This is a wrapper for Acephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:
  1. Because it works offline, the app must be compiled with the voice model included. This increases the app size by 30-40 MB.
  2. The accuracy depends on the voice model. You can train your own voice model. For more details check the models download link below.
  3. Remember to add RECORD_AUDIO permission.
How to use:
  1. Download the required voice model from here.
  2. Change the file name to a simple one like "model.zip".
  3. Copy it to the Files folder of your project.
  4. To use the model, see the attached example.

SpeechToText

Author:
@Biswajit
Version: 1.2
  • SpeechToText
    • Events:
      • Error (message As String)
      • FinalResult (text As String)
      • MicrophoneBuffer (buffer() As Byte) (new in version 1.2)
      • PartialResult (text As String)
      • Paused (paused As Boolean)
      • ReadyToListen
      • ReadyToRead
      • Restarted
      • Result (text As String)
    • Functions:
      • cancel As Boolean
        Cancel microphone recognition. Do not post any new events, simply cancel processing.
        Does nothing if recognition is not active.
        Returns: True if recognition was actually stopped.
      • Initialize (eventName As String, modelPath As String)
        Initialize the object.
        eventName: The event name prefix.
        modelPath: The model folder path.
      • pause (pause As Boolean)
        Pause microphone recognition.
        pause: Pass true to pause and false to continue.
      • prepareAudioFile (audioPath As String, predefinedWords As String)
        Prepare the audio file for recognition. On success, the Eventname_ReadyToRead event is raised.
        Call startReading to start reading the file.
        audioPath: Audio file path.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • prepareMicrophone (predefinedWords As String)
        Prepare the microphone for listening. On success, the Eventname_ReadyToListen event is raised.
        Call startListening to start listening.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • reset
        Resets the microphone recognizer in a thread and starts microphone recognition over again.
      • shutdown
        Shutdown the microphone recognizer and release the recorder.
        Call this on activity or service closing event.
      • startListening (timeout As Int) As Boolean
        Starts microphone recognition. After the specified timeout, listening stops and
        endOfSpeech signals that. Does nothing if recognition is already active.
        timeout: Timeout in milliseconds to listen. -1 = infinite.
        Returns: True if recognition was actually started.
      • startReading (timeout As Int) As Boolean
        Starts file recognition. After the specified timeout, listening stops and
        endOfSpeech signals that. Does nothing if recognition is already active.
        timeout: Timeout in milliseconds to listen. -1 = infinite.
        Returns: True if recognition was actually started.
      • stop As Boolean
        Stops microphone/file recognition. The listener should receive the final result, if there is
        one. Does nothing if recognition is not active.
        Returns: True if recognition was actually stopped.
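
The events and functions listed above might fit together roughly as follows for microphone recognition. This is only a minimal sketch, not the attached example: the variable name STT, the model folder "model" under File.DirInternal, and the use of the RuntimePermissions library to request RECORD_AUDIO are assumptions, and the exact content of the text passed to the events is not specified in the list above.

```b4x
Sub Globals
	Private STT As SpeechToText
	Private rp As RuntimePermissions
End Sub

Sub Activity_Create(FirstTime As Boolean)
	' RECORD_AUDIO must be granted at runtime (and declared in the manifest).
	rp.CheckAndRequest(rp.PERMISSION_RECORD_AUDIO)
	Wait For Activity_PermissionResult (Permission As String, Result As Boolean)
	If Result = False Then Return
	' Assumption: the voice model was already unzipped to DirInternal/model.
	STT.Initialize("STT", File.Combine(File.DirInternal, "model"))
	' Blank string = no predefined words.
	STT.prepareMicrophone("")
End Sub

' Raised when prepareMicrophone succeeds.
Sub STT_ReadyToListen
	' -1 = listen without a timeout.
	STT.startListening(-1)
End Sub

Sub STT_PartialResult (text As String)
	Log("Partial: " & text)
End Sub

Sub STT_Result (text As String)
	Log("Result: " & text)
End Sub

Sub STT_Error (message As String)
	Log("Error: " & message)
End Sub

Sub Activity_Pause (UserClosed As Boolean)
	' Release the recognizer and the recorder when the activity closes.
	If UserClosed Then STT.shutdown
End Sub
```

File recognition would presumably follow the same pattern with prepareAudioFile, the ReadyToRead event, and startReading.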
Downloads:
  1. Library
  2. Example
  3. Voice Model
  4. Test app
Update:
  • Version 1.1:
    1. Added audio file to text functionality. (For now only WAV format is supported)
    2. Added predefined word/phrase detection functionality.
    3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
  • Version 1.2:
    1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.

If you like my work, please donate. Your donations will encourage me to add more features in the future.

 

JohnC

Expert
Licensed User
Do you know if there is any way to tap into the microphone audio data buffer so that an audio recording can be made at the same time as this speech-to-text is working?

Or, if this simultaneous operation (record + SR) is not possible, is there a way to pass the voice data from an audio recording file into this SR engine so that the speech-to-text can be performed on an audio recording?
 

Dave O

Well-Known Member
Licensed User
I'm also looking to capture both the original audio and the resulting text, for a dictation app I'm working on.

Normally the text is good enough to recognise (later) what was said, but sometimes it's really wrong, so the user could ideally play back the audio recording to hear what they really said.
 

Biswajit

Active Member
Licensed User
Yes. That's the next item on my to-do list. I am trying to capture the audio data and also trying to use speaker audio for voice recognition. I will post the update once it is done.
 

JohnC

Expert
Licensed User
If you can get both text and audio working, I would definitely make a nice donation :)
 

AnandGupta

Well-Known Member
Licensed User
@Biswajit your work is really inspirational.

My knowledge is primitive by comparison, and I am lucky to be a member here, reaping all the great fruits of the masters' hard work.
Thanks for your continuous contributions.
 

Biswajit

Active Member
Licensed User
@JohnC @Dave O Check the latest update. You can download the test app to check the functionality. Also check the example for audio recording.

Version 1.2: Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
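
As a rough illustration of how the MicrophoneBuffer event might be used to keep the audio while recognition runs, the sketch below appends each buffer to a file. It is not the attached example. The names RawOut, Recording, StartRecording, StopRecording, and "capture.raw" are hypothetical, and the audio format of the buffer (sample rate, bit depth, channels) is not stated in this thread, so the result is raw data that would still need a suitable header before a player could open it.

```b4x
Sub Globals
	Private STT As SpeechToText
	Private RawOut As OutputStream
	Private Recording As Boolean
End Sub

' Hypothetical helper: call before startListening.
Sub StartRecording
	RawOut = File.OpenOutput(File.DirInternal, "capture.raw", False)
	Recording = True
End Sub

' Raised by the library with the microphone audio while recognition is running.
Sub STT_MicrophoneBuffer (buffer() As Byte)
	If Recording Then RawOut.WriteBytes(buffer, 0, buffer.Length)
End Sub

' Hypothetical helper: call after stop or cancel.
Sub StopRecording
	Recording = False
	RawOut.Close
End Sub
```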
 
Dear Mr. Biswajit,
Elite programmers like you are always the kind of people needed in projects like the B4A language. Your deep knowledge of Java and Android programming allows you to bring the users of this forum a variety of excellent B4A libraries, which are always very stable, light on memory, and easy to use. Very well done. I have decided to try your example, and I will do my best to become thoroughly familiar with your library. Unfortunately, a Czech voice model is not supported, but I have opened an issue on GitHub and will try to create one. It is a difficult job that takes a lot of patience, but I love such work. Thank you again for your libraries; this wrapper is really outstanding. You have shown B4A users that it is even possible to call native .so libraries from B4A, with no need to include them in the .jar file. So, very well done again.
 
I am getting a strange error from the B4A compiler. This line of your example
Dim ar As Archiver

produces an error, even though I have all the required libraries available. I am using the latest stable B4A from the download site.
It is very strange that this data type cannot be declared.
 
Thank you for your help. I am very sorry; I had not found a reference to it inside the .b4a example file. It is wonderful that this library is available. Thank you again.
 
You are the friendliest developer community I have ever found in my life. Thanks to your useful answer, I can now compile the example. Next I will test it with the latest available Czech speech model. Here is the download link for this model, for anyone who wants to use it:

https://github.com/rhasspy/cs_kaldi-rhasspy
I am very pleasantly surprised that everyone can use something other than Google search, and I am very glad that VOSK voice recognition is so fast and reliable. The credit for that certainly goes to the author of the B4A library. I am very glad that I can use this voice model, because thanks to the Green Deal it will be necessary to migrate from a notebook to an Android mobile device, where AC adapters should not draw more than 40 watts while charging.
 

gezueb

Active Member
Licensed User
This may help: the example app copies the speech model into the DirInternal folder on the device. Files in DirInternal are left unchanged when you recompile and update an existing app. If you change the model, you have to uninstall the app from the device before recompiling; otherwise the model will not change and the code will keep using the old one. Using one of the speech models downloaded from the VOSK page requires some steps to rename the model directories. Windows Explorer lets you rename a zipped directory, but to change the folder called "model_zip_name" inside it, the whole structure must be unzipped into a temporary folder, renamed, zipped again with a tool such as 7-Zip, and copied back into the DirAssets folder. Last but not least: Android Java treats file names as case-sensitive, and not all characters that are acceptable in the Windows file system are allowed in Android Java. So simple directory and file names like "model" are recommended. Good luck!
 
Thank you for your excellent advice. I also know that I may have to modify the included source code of the example app so that it supports the Czech voice model. I am very pleasantly surprised by the voice recognition speed. I will test the Czech voice model. The best results come when I use my headset with a microphone. I am also very lucky to have a mobile phone with an 8-core CPU and 4 GB of RAM.
 

Derek Johnson

Active Member
Licensed User
I've tried this and it is very good. One thing that I have noticed is that numbers are spelt out, e.g. one, two, three, twelve, etc. Is there a way to change this, especially for 0-9?

I suppose I could do a find/replace after the text has arrived.
 

Biswajit

Active Member
Licensed User
Not sure, but try passing 0-9 as a JSON string in the predefinedWords parameter of the prepareMicrophone method.

I just checked the official website and it seems this is not possible. You have to use your own method to replace those words with numeric digits.
 

Derek Johnson

Active Member
Licensed User
I fixed my problem with numbers by doing some processing of the resulting text. I noticed that you originally referred to a prepareMicrophone method. That sounds very useful if it would give preference to certain words.

This format was accepted:

Dim jsontext As String =$"["oh","zero","one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve","thirty","forty","fifty","sixty","seventy","eighty","ninety"]"$
STT.prepareMicrophone(jsontext)
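
The post-processing of the recognised text is not shown in this thread. One way it might be done for the single digits 0-9 is a simple word-by-word lookup, sketched below. WordsToDigits is a hypothetical helper, not part of the library, and it assumes the recognised text is plain words separated by single spaces.

```b4x
' Hypothetical helper: replaces spelt-out single digits with numerals.
' Note that mapping "oh" to 0 will also change the interjection "oh".
Sub WordsToDigits (text As String) As String
	Dim digits As Map = CreateMap("zero": "0", "oh": "0", "one": "1", "two": "2", _
		"three": "3", "four": "4", "five": "5", "six": "6", "seven": "7", _
		"eight": "8", "nine": "9")
	Dim sb As StringBuilder
	sb.Initialize
	For Each word As String In Regex.Split(" ", text)
		If sb.Length > 0 Then sb.Append(" ")
		' Keep the word unchanged when it is not a known digit name.
		Dim replacement As String = digits.GetDefault(word.ToLowerCase, word)
		sb.Append(replacement)
	Next
	Return sb.ToString
End Sub
```

For example, WordsToDigits("call me at five five five") would return "call me at 5 5 5". Larger numbers such as "twelve" or "thirty one" would need extra logic to combine words.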
 