This is a wrapper for Acephei VOSK. With it, you can add continuous offline speech recognition to your application.

NOTE:
  1. Because it works offline, the app must be compiled with the voice model. This increases the app size by 30-40 MB.
  2. The accuracy depends on the voice model. You can train your own voice model. For more details, check the models download link below.
  3. Remember to add RECORD_AUDIO permission.
How to use:
  1. Download the required voice model from here.
  2. Rename the file to something simple, like "model.zip".
  3. Copy it to the Files folder of your project.
  4. To use the model, see the attached example.

SpeechToText

Author:
@Biswajit
Version: 1.2
  • SpeechToText
    • Events:
      • Error (message As String)
      • FinalResult (text As String)
      • MicrophoneBuffer (buffer() As Byte) (new in 1.2)
      • PartialResult (text As String)
      • Paused (paused As Boolean)
      • ReadyToListen
      • ReadyToRead
      • Restarted
      • Result (text As String)
    • Functions:
      • cancel As Boolean
        Cancels microphone recognition. No new events are posted; processing is simply cancelled.
        Does nothing if recognition is not active.
        Returns: True if recognition was actually stopped.
      • Initialize (eventName As String, modelPath As String)
        Initialize the object.
        eventName: The event name prefix.
        modelPath: The model folder path.
      • pause (pause As Boolean)
        Pause microphone recognition.
        pause: Pass true to pause and false to continue.
      • prepareAudioFile (audioPath As String, predefinedWords As String)
        Prepares the audio file for recognition. On success, the Eventname_ReadyToRead event is raised.
        Call startReading to start reading the file.
        audioPath: Audio file path.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • prepareMicrophone (predefinedWords As String)
        Prepares the microphone for listening. On success, the Eventname_ReadyToListen event is raised.
        Call startListening to start listening.
        predefinedWords: Predefined words/phrases as a JSON string. Can be blank.
      • reset
        Resets the microphone recognizer in a thread and starts microphone recognition over again.
      • shutdown
        Shutdown the microphone recognizer and release the recorder.
        Call this on activity or service closing event.
      • startListening (timeout As Int) As Boolean
        Starts microphone recognition. After the specified timeout, listening stops and
        endOfSpeech is signaled. Does nothing if recognition is already active.
        timeout: Listening timeout in milliseconds. Pass -1 for no timeout.
        Returns: True if recognition was actually started.
      • startReading (timeout As Int) As Boolean
        Starts file recognition. After the specified timeout, recognition stops and
        endOfSpeech is signaled. Does nothing if recognition is already active.
        timeout: Recognition timeout in milliseconds. Pass -1 for no timeout.
        Returns: True if recognition was actually started.
      • stop As Boolean
        Stops microphone/file recognition. The listener should receive the final result, if there is
        any. Does nothing if recognition is not active.
        Returns: True if recognition was actually stopped.
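
A minimal sketch of the microphone flow described by the functions above. It is not the attached example: the variable name stt, the event prefix "stt", and the folder "model" under File.DirInternal are assumptions made for illustration, and the model is assumed to have been copied and unzipped there already. The RECORD_AUDIO permission is also assumed to be granted before this code runs.
B4X:
Sub Globals
    Private stt As SpeechToText
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Assumption: the model was already unzipped to <DirInternal>/model
    stt.Initialize("stt", File.Combine(File.DirInternal, "model"))
    'Blank string = no predefined words. Raises stt_ReadyToListen on success.
    stt.prepareMicrophone("")
End Sub

Sub stt_ReadyToListen
    '-1 = listen without a timeout
    If stt.startListening(-1) = False Then Log("Recognition was not started")
End Sub

Sub stt_PartialResult (text As String)
    Log("Partial: " & text)
End Sub

Sub stt_Result (text As String)
    Log("Result: " & text)
End Sub

Sub stt_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub stt_Error (message As String)
    Log("Error: " & message)
End Sub

Sub Activity_Pause (UserClosed As Boolean)
    'Release the recognizer and the recorder when the activity is closing
    If UserClosed Then stt.shutdown
End Sub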
Downloads:
  1. Library
  2. Example
  3. Voice Model
  4. Test app
Update:
  • Version 1.1:
    1. Added audio file to text functionality. (For now only WAV format is supported)
    2. Added predefined word/phrase detection functionality.
    3. Merged startListening and startListening2 together. Pass -1 for continuous recognition.
  • Version 1.2:
    1. Added MicrophoneBuffer event where you will receive the microphone audio buffer while using voice recognition.
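
The audio file functionality added in version 1.1 follows the same prepare/start pattern as the microphone. The sketch below is only an illustration under assumptions: the names stt and test.wav and the use of File.DirInternal are hypothetical, and both the unzipped model folder and the WAV file are assumed to exist already.
B4X:
Sub Globals
    Private stt As SpeechToText
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Assumption: the model was already unzipped to <DirInternal>/model
    stt.Initialize("stt", File.Combine(File.DirInternal, "model"))
    'Assumption: test.wav was copied to DirInternal beforehand (only WAV is supported)
    stt.prepareAudioFile(File.Combine(File.DirInternal, "test.wav"), "")
End Sub

Sub stt_ReadyToRead
    '-1 = no timeout
    If stt.startReading(-1) = False Then Log("File recognition was not started")
End Sub

Sub stt_FinalResult (text As String)
    Log("Final: " & text)
End Sub

Sub stt_Error (message As String)
    Log("Error: " & message)
End Sub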

If you like my work, please donate. Your donations will encourage me to add more features in the future.

 
Dear specialists,
I have a serious issue. I have done my best to copy the files from the folders of the GitHub tree of the Czech language model, but unfortunately I cannot find the required model.conf file. I cannot simply combine this file from the English model with the Czech model. I have contacted the author of the Czech model on GitHub, but what if the author does not respond? How can I generate the file with the correct values? Or is that impossible?

Or is it not a rule that every language model must contain this file?

Here is the link to the corresponding GitHub tree again.

https://github.com/rhasspy/cs_kaldi-rhasspy
I am very sad, because if the author does not provide this file and it cannot be generated, the model will be unusable, and there is no other trained model on the Internet.
 

Biswajit

Active Member
Licensed User
I don't think the Kaldi voice model will work with VOSK. I will check.
 

The author of the recognition engine has provided a link for me. It may be that some files must be renamed and that model.conf has a different name. The other files are here. It will be a surprise. Thank you for your time and for your analysis.
 

gezueb

Active Member
Licensed User
Dialog: I would like to program a dialog in which the device asks questions and the user answers. The device uses Text to Speech (the TTS library) for the questions. While the device speaks, VOSK speech recognition must be paused, because otherwise the device's own voice output is wrongly recognized as the user's answer. Several functions are available to disable recognition: pause (True or False), stop, and cancel. However, I am a bit confused about how to use them to stop and restart recognition in a timed sequence. A further problem is that the TTS library raises no event when the text has actually been spoken completely (queue empty). So far I have had to use Sleep with some delay. Thanks for any advice!
 

Biswajit

Active Member
Licensed User
It's simple: just check whether TTS is still speaking. See the example below.
B4X:
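Sub Process_Globals
    Private tts As TTS  'requires the TTS library
    Private t As Timer  'timers must be declared in Process_Globals
End Sub

Sub Activity_Create(FirstTime As Boolean)
    Activity.LoadLayout("Layout")
    tts.Initialize("tts")
    t.Initialize("timer",100) 'check every 100 ms
End Sub

Sub Button1_Click
    tts.Speak("your text",True)
    t.Enabled = True 'start polling the TTS state
End Sub

Sub timer_Tick
    'isSpeaking is called on the underlying TextToSpeech object (requires the JavaObject library)
    If Not(tts.As(JavaObject).RunMethod("isSpeaking",Null)) Then
        t.Enabled = False
        'tts done now you can run speech recognition
    End If
End Sub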
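
Applied to the dialog question, the same polling check can be combined with the library's pause function. This is only a sketch under assumptions: it presumes a SpeechToText object named stt that has already been initialized and is listening, and the helper name AskQuestion is hypothetical.
B4X:
Sub Process_Globals
    Private tts As TTS
    Private t As Timer
End Sub

Sub Globals
    Private stt As SpeechToText 'assumed to be initialized and listening elsewhere
End Sub

Sub Activity_Create(FirstTime As Boolean)
    tts.Initialize("tts")
    t.Initialize("timer", 100)
End Sub

'Hypothetical helper: speaks a question while recognition is paused
Sub AskQuestion (question As String)
    stt.pause(True)            'pause recognition so the device's own voice is not recognized
    tts.Speak(question, True)
    t.Enabled = True           'poll until TTS has finished
End Sub

Sub timer_Tick
    Dim speaking As Boolean = tts.As(JavaObject).RunMethod("isSpeaking", Null)
    If speaking = False Then
        t.Enabled = False
        stt.pause(False)       'continue recognition to capture the user's answer
    End If
End Sub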
 

gezueb

Active Member
Licensed User
I would just like to add: the example uses async methods to copy and unzip files. While this is certainly OK, it requires handling the completion of all async functions properly in a resumable sub with Wait For calls. I find it much safer to use the blocking versions. The model can then be loaded in a normal sub, not a resumable one, which makes the code flow independent of the device's performance.
 