Android Question Continuous Real-time Speech Recognition

canalrun

Hello,
I currently have an app that works with two phones. Person one presses a button and speaks into phone one; the speech is recognized and displayed as text. Person one then presses a Send button, and the text is sent to the second phone.

I use this as an aid when speaking to the Deaf.

The ultimate solution would be to dispense with the Send button and allow continuous, real-time speech to be recognized and transmitted.

I think this should be possible. Has anyone done this or can anyone point me in the right direction?

I'm currently using the "Android Speech Recognition API Wrapper" from stevel05.

It is working well. I'll have to investigate if it can be used in this continuous, real-time mode.

Thanks,
Barry.
 

canalrun

I envision the following software algorithm scenario (a rough B4A sketch follows the list).
  • The person using phone 1 presses a Start capture button to begin capturing buffers of audio data.
  • I believe Android can return interim buffers, say a few seconds' worth of audio data along with the current amplitude, via a callback event while the capture continues.
  • Each interim audio buffer would be sent off to Google for speech recognition.
  • The second buffer would probably be ready before the first results are returned, but it would be sent off to Google as well.
  • As results are received from Google, they are displayed as text and sent to the second phone.
  • This continues until person one presses the Stop capture button.
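In B4A terms, the flow would look something like the sketch below. Every name in it is hypothetical: StartAudioCapture and AudioCapture_BufferReady stand in for whatever ends up delivering interim buffers (that capture side is the missing piece, as noted below), Recognizer_ResultReady for the VR service's callback, and SendBufferToRecognizer, SendTextToPhone2, and lblCaption for the plumbing around them.

B4X:
Sub Process_Globals
    Private pending As Int   'buffers sent for recognition, results not yet back
End Sub

Sub btnStartCapture_Click
    StartAudioCapture   'hypothetical - begin capturing interim buffers
End Sub

'Hypothetical callback: fires every few seconds with the latest audio buffer
'while capture continues.
Sub AudioCapture_BufferReady (Data() As Byte)
    pending = pending + 1
    SendBufferToRecognizer(Data)   'send immediately - don't wait for earlier results
End Sub

'Hypothetical callback from whichever cloud VR service is used.
Sub Recognizer_ResultReady (ResultText As String)
    pending = pending - 1
    lblCaption.Text = lblCaption.Text & " " & ResultText   'show locally
    SendTextToPhone2(ResultText)                           'relay to the other phone
End Sub

Sub btnStopCapture_Click
    StopAudioCapture   'hypothetical
End Sub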

Of course, unfortunately, it may not be that simple :(.

I haven't looked very hard, but I didn't see any obvious way to get ongoing, interim audio data buffers from Android.

Google has a cloud recognition service, and Microsoft has Bing Speech Recognition. I may have to use one of those.

Can anyone offer any pointers from their experience?

Thanks,
Barry.


Added later:
Beautiful - I just found Erel's B4A AudioStreamer.
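For anyone following along, the usage pattern from Erel's AudioStreamer example looks roughly like this (sample rate, parameter order, and event names as I understand them from his forum post; worth double-checking against the library itself):

B4X:
Sub Process_Globals
    Private streamer As AudioStreamer
End Sub

Sub Activity_Create (FirstTime As Boolean)
    '11025 Hz, mono input, 16 bits per sample
    streamer.Initialize("streamer", 11025, True, 16, streamer.VOLUME_MUSIC)
End Sub

Sub btnStart_Click
    streamer.StartRecording   'RecordBuffer events begin arriving
End Sub

Sub btnStop_Click
    streamer.StopRecording
End Sub

Sub Streamer_RecordBuffer (Data() As Byte)
    'Data holds the latest chunk of raw PCM audio - exactly the kind of
    'ongoing, interim buffer the scenario above needs.
    Log("Captured " & Data.Length & " bytes")
End Sub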
 

canalrun

I approach the problem in a slightly different way.

Suppose Phone 1 is used by the Hearing person and Phone 2 is used by the Deaf person.

Person 1 speaks into Phone 1. Phone 1 captures the audio, uses some method (Google, Microsoft, or CMUSphinx) to recognize the speech, and gets the text back. Phone 1 then transmits the text via Wi-Fi or Bluetooth to Phone 2. The Deaf person on Phone 2 can reply by typing text, or by speaking into his phone, and send that text back to Phone 1.
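The Wi-Fi leg is straightforward with the Network library and AsyncStreams. Here is a minimal sketch of the Phone 1 side, assuming Phone 2 runs a ServerSocket on port 5500 at a known LAN address (the IP address, port, and lblChat label are placeholders):

B4X:
Sub Process_Globals
    Private client As Socket
    Private astream As AsyncStreams
End Sub

Sub ConnectToPhone2
    client.Initialize("client")
    client.Connect("192.168.1.23", 5500, 10000)   'placeholder address and port
End Sub

Sub Client_Connected (Successful As Boolean)
    If Successful Then
        astream.InitializePrefix(client.InputStream, False, client.OutputStream, "astream")
    End If
End Sub

'Call this with each recognition result.
Sub SendTextToPhone2 (ResultText As String)
    astream.Write(ResultText.GetBytes("UTF8"))
End Sub

'On Phone 2, the matching AsyncStreams event would be:
Sub AStream_NewData (Buffer() As Byte)
    Dim received As String = BytesToString(Buffer, 0, Buffer.Length, "UTF8")
    lblChat.Text = lblChat.Text & CRLF & received
End Sub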

Right now my problem is at the VR stage. Microsoft Bing and Google offer cloud VR that transcribes audio and returns text. Both are free for limited use, but could become costly – especially in a continuous, real-time scenario. I also thought they had a mode that would accept continuous audio packets and return ongoing, interim results, but I have not been able to find it again.
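Whichever service it turns out to be, sending one audio chunk from B4A would look something like this with OkHttpUtils2. The URL, content type, and Authorization header below are placeholders; each service defines its own endpoint, audio format, and authentication:

B4X:
Sub RecognizeChunk (Data() As Byte)
    Dim j As HttpJob
    j.Initialize("recognize", Me)
    j.PostBytes("https://example.com/speech/recognize", Data)   'placeholder URL
    j.GetRequest.SetContentType("audio/l16; rate=16000")        'placeholder format
    j.GetRequest.SetHeader("Authorization", "Bearer <key>")     'placeholder key
End Sub

Sub JobDone (Job As HttpJob)
    If Job.Success Then
        'The response format depends on the service; JSON parsing omitted here.
        Log(Job.GetString)
    End If
    Job.Release
End Sub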

Another VR solution might be based on the CMUSphinx Alpha demo posted by stevel05. This seems to require a "grammar" that restricts recognition to a specific set of words, and I have not found any grammar for general, unrestricted dictation.
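For what it's worth, a Sphinx grammar is typically a small JSGF file along these lines (a made-up example). Every utterance must match one of the listed phrases, which is exactly why it cannot handle open-ended dictation:

Code:
#JSGF V1.0;
grammar chat;
public <phrase> = hello | yes | no | thank you | please wait;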

Barry.
 

canalrun

Yes, thanks. Very helpful.

I also chose Wi-Fi for the same reasons you mention.

True, it does not work over the Internet, but I have found that it does work in some stores (for example, grocery stores and hardware stores) where the two people are at opposite corners of the store, well over 30 feet apart. The two devices get IP addresses on different subnets, but the in-store routers must have routes defined between the subnets, because it still works.

I found the same thing with Google's cloud solution. Microsoft Bing allows 5,000 transactions for free, but continuous VR will eat through that in no time: at one recognition request every few seconds, an hour of conversation is already on the order of a thousand transactions.

Thanks for the CMU Sphinx info. I had just played around with that a little bit. I think the latest version at the CMU website is 5 Alpha. It's probably not quite ready for prime time.

If you are interested in taking a look at the first version of my solution, search Google Play for Deaf Chat. It's from CanalRun. I think Google Play, at least in the USA, gives you a generous amount of time to play with it, if you don't want to keep it.

Barry.
 