Hello @Erel,
Thanks for the fast answer, as always. Unfortunately, this solution isn't adequate, because sometimes the user can't play the sound and wants to read the message instead (for example, in a noisy environment or in a meeting).
On the other hand, text transcribed from the audio could be easily indexed and even "parsed" at a glance by the user (one of my problems with WhatsApp, for example, is when somebody sends a long audio message with the important parts in the middle or at the end).
If this direct conversion is not possible in Android, I'll try to implement it through a web service from some API provider, but I feel that the phone itself has all the CPU power and resources needed to do it locally...