Voice Recognition and Text To Speech


New Member
Licensed User
We're building an app that has the following requirements:

1.- The app speaks a short phrase, about 5 words.
2.- The app waits for the user to speak a specific number coming from another source. This wait might be several minutes long. If the number is correct, we go to the next step. If not, we repeat the original phrase.
3.- The app speaks a number. It should then wait for the user to repeat this number. If correct, we go to the next step. If not, the app repeats the number.
4.- Here we have a long wait again until the user speaks "Ready". Then we go back to step 1.
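
For anyone picking this up, the four requirements above can be read as a small state machine. The following is only a minimal sketch of that flow in plain Python, independent of any speech engine. The names `Step`, `matches`, and `next_step` are hypothetical, and the exact-match comparison of transcripts is an assumption, since a real recognizer may return variants such as "forty two" versus "42".

```python
from enum import Enum, auto


class Step(Enum):
    SPEAK_PHRASE = auto()   # requirement 1: speak the short phrase
    AWAIT_NUMBER = auto()   # requirement 2: long wait for the user's number
    SPEAK_NUMBER = auto()   # requirement 3: the app speaks a number
    AWAIT_REPEAT = auto()   # requirement 3: wait for the user to repeat it
    AWAIT_READY = auto()    # requirement 4: long wait for "Ready"


def matches(heard, expected):
    """Case- and whitespace-insensitive comparison (an assumed matching rule)."""
    if heard is None or expected is None:
        return False
    return " ".join(heard.lower().split()) == " ".join(expected.lower().split())


def next_step(step, heard=None, expected=None):
    """Return the step that follows `step`, given what was recognized."""
    if step is Step.SPEAK_PHRASE:
        return Step.AWAIT_NUMBER
    if step is Step.AWAIT_NUMBER:
        # Wrong number: repeat the original phrase.
        return Step.SPEAK_NUMBER if matches(heard, expected) else Step.SPEAK_PHRASE
    if step is Step.SPEAK_NUMBER:
        return Step.AWAIT_REPEAT
    if step is Step.AWAIT_REPEAT:
        # Wrong repetition: the app speaks the number again.
        return Step.AWAIT_READY if matches(heard, expected) else Step.SPEAK_NUMBER
    # AWAIT_READY: keep waiting until the user says "Ready".
    return Step.SPEAK_PHRASE if matches(heard, "ready") else Step.AWAIT_READY
```

Keeping the flow in one pure function like this means the transitions can be tested without a microphone, and the recognizer and TTS calls only need to feed it transcripts.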

We've built a few test apps based on the many voice recognition examples here on the forum. We've run into several problems:

1.- The voice recognition is not reliable, and the problem seems to be mostly related to the timer that starts and stops the voice recognition. If the user speaks in the middle of this cycle, the app often fails to recognize the speech. We're getting about a 40% hit rate, which is far too low.
2.- The Text To Speech cuts off constantly.
3.- We've used the suggested code to suppress beeps coming from Android, but some still get through. This is a relatively minor issue compared to the two above.

We're interested in hiring someone who can resolve these issues and build a sample application that can do what's described here. The app must work offline.

Please get in touch if you can build this app.

Marcus Araujo

Licensed User

Perhaps once the app starts waiting for the user to speak, you could enable two timers (Timer1 = 1 second, Timer2 = 15 seconds) and start recording the sound into a variable/stream.

Timer1 could be used to analyse what is recorded so far (the stream): checking if the user successfully started speaking and stopped speaking.
Timer2 could be used as a timeout timer: if 15 seconds pass, the recording stops regardless and whatever audio has been captured is transcribed.

If the analysis in Timer1's tick finds that the user has started and finished speaking, you would stop Timer2 and transcribe the recording.

Voice recognition should only be run during the text transcription phase.
I think that should address issues 1 and 2.
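
To make the two-timer idea above a bit more concrete, here is a rough sketch of just the decision logic in plain Python. It assumes each Timer1 tick yields one loudness value for the audio recorded during that second. The function name `detect_end_of_speech`, the `threshold` value, and the idea of judging speech by loudness alone are all assumptions for illustration, not something taken from an existing library.

```python
def detect_end_of_speech(levels, threshold=0.1, timeout_ticks=15):
    """Decide when to stop recording and hand the audio to the recognizer.

    levels        -- one loudness value per Timer1 tick (assumed 0.0 to 1.0)
    threshold     -- loudness at or above this counts as speech (assumed value)
    timeout_ticks -- Timer2 expressed in Timer1 ticks (15 ticks of 1 second)

    Returns (reason, tick), where reason is "spoken", "timeout" or "pending".
    """
    started = False
    for tick, level in enumerate(levels, start=1):
        if level >= threshold:
            started = True            # the user has started speaking
        elif started:
            return ("spoken", tick)   # speech followed by silence: transcribe now
        if tick >= timeout_ticks:
            return ("timeout", tick)  # Timer2 fired: transcribe what we have
    return ("pending", len(levels))   # keep recording


# Silence, two seconds of speech, then silence again.
print(detect_end_of_speech([0.0, 0.0, 0.5, 0.6, 0.0]))
```

In a real app the loudness values would come from the recorded stream, and a one-second silence check is probably too coarse for short answers such as a single number, so a shorter Timer1 interval may be worth trying.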