VoiceRecognition - isn't there a way to limit it to a dictionary?

FrankR

Member
Licensed User
Longtime User
I'm looking at the VoiceRecognition module now.

Isn't there a way to limit it to a dictionary?

For example, if I want the user to say one of Apple, Orange, Banana, Grapefruit, wouldn't the module be a whole lot more accurate if I gave it those choices in advance? That would keep nuisance responses like Bandana from coming back for Banana. :)

There are at least two common uses for this technology:
1. General dictation
2. Data entry for specific choices

#2, to be really usable, needs a limiting dictionary.

Off to go check if this is in Android or not.
 

FrankR

I did some reading on this today. I was pretty amazed at what I learned.
[Folks, these are statements about Android, not about B4A]

- The current support is online only. It actually communicates with a back-end service. No connection, no reco.

- There doesn't appear to be any commercial/viable add-on for doing limited vocabulary offline reco in the Android world.

I'm very surprised by that. I know dictation reco takes significant CPU and data/pattern resources, but for a limited set of commands I would have expected something to be available in this environment. Frankly, we have been discussing voice reco with customers, and if we can't do offline, limited-vocab reco, we have to punt on that idea and stop talking about it. :)

Sounds like an untapped opportunity for a company - to build a limited vocab, local offline voice reco engine for Android.
 

FrankR

A partial solution to the dictionary issue is to compare the retrieved text to the set of words and choose the closest match.

Yeah - but how do you do a closest match with stuff like this? Unless maybe also try to find a Soundex Library?
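
For a short, fixed word list, one plausible way to do the closest match is plain edit distance (Levenshtein): count the single-letter inserts, deletes, and substitutions needed to turn the recognized text into each allowed word, then take the smallest. What follows is only a minimal sketch, written in Python rather than B4A. The names `edit_distance` and `closest_match` and the `max_distance` cutoff are made up for illustration, and the cutoff of 2 is a guess that would need tuning against real recognizer output.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def closest_match(heard, vocabulary, max_distance=2):
    """Return the vocabulary word nearest to `heard`, or None if none is close enough."""
    heard = heard.strip().lower()
    best_word, best_distance = None, max_distance + 1
    for word in vocabulary:
        distance = edit_distance(heard, word.lower())
        if distance < best_distance:  # strict "<" keeps the first word on a tie
            best_word, best_distance = word, distance
    return best_word


fruits = ["Apple", "Orange", "Banana", "Grapefruit"]
print(closest_match("Bandana", fruits))    # Banana
print(closest_match("Telephone", fruits))  # None
```

With the fruit list, Bandana is one deletion away from Banana, so it maps back to Banana, while an unrelated word falls outside the cutoff and returns nothing. Edit distance compares spelling, not sound, so a phonetic code could still help with homophones.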
 

nfordbscndrd

Well-Known Member
Yeah - but how do you do a closest match with stuff like this? Unless maybe also try to find a Soundex Library?

I have a Windows (VB6) program for natural language processing I've been working on that finds suggestions for misspelled words. I was converting it to B4A but got off on another project.

If you want a SQLite database with about 125,000 words and their Soundex codes in it, you can find it at http://www.aeyec.com/wordsdb.zip

The routine I used for creating the Soundex codes is a little nonstandard, so if you want to use Soundex to find possible matches in the database, you have to use something like the following VB6 code to compute a Soundex code for the input text:

B4X:
Public Function Soundex(ByVal WordStr As String) As String
    Dim str_Word As String
    Dim i As Long
    Dim c As String
    Dim s As String
    Dim d As String
    Dim f As String
    Dim wrd As String
 
    ' 1. Encode the letters, starting with the 2nd letter.    '
    ' 2. If two adjacent letters have the same soundex code,  '
    '      treat them as one.                                 '
    ' 3. If two consonants are separated by a vowel or Y, use '
    '      both consonants.                                   '
    ' 4. Unlike standard Soundex, two consonants separated by '
    '      H are both used (H is coded like a vowel, so rule  '
    '      3 applies). A W before a consonant is dropped.     '
 
    str_Word = UCase$(Trim$(WordStr))
 
    If str_Word = "" Then Exit Function
 
    For i = 1 To Len(str_Word)
        c = Mid(str_Word, i, 1)
        f = c
 
        ' Convert non-US characters:
        If InStr("ÀÁÂÃÄÅ«", c) > 0 Then
            f = "A"
        ElseIf c = "Ç" Then
            f = "C"
        ElseIf InStr("ÈÉÊË", c) > 0 Then
            f = "E"
        ElseIf InStr("ÌÍÎÏ", c) > 0 Then
            f = "I"
        ElseIf c = "Ñ" Then
            f = "N"
        ElseIf InStr("ÒÓÔÕÖ", c) > 0 Then
            f = "O"
        ElseIf InStr("ÙÚÛÜ", c) > 0 Then
            f = "U"
        ElseIf c = "Ý" Then
            f = "Y"
        End If
        If c <> f Then Mid$(str_Word, i, 1) = f
 
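        ' Get rid of non-alpha characters: '
        ' (e.g.:  o'clock  ->  oclock)     '
        ' Test the converted letter f, not c, so a repeated     '
        ' accented letter is not blanked out before conversion. '
        If Not (f Like "[A-Z]") Then
            str_Word = Replace(str_Word, c, " ")
        End If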
    Next i
 
    str_Word = Replace$(str_Word, " ", "")
 
    If str_Word = "" Then Exit Function
 
    ' Change starting letters to actual sounds: '
    s = Left$(str_Word, 2)
    If s = "PS" Or s = "PN" Or s = "KN" Or s = "GN" Or s = "WR" Then
        str_Word = Mid(str_Word, 2)
    ElseIf Left$(str_Word, 3) = "WHO" Then
        str_Word = "H" & Mid$(str_Word, 3)
    ElseIf s = "WH" Then
        str_Word = "W" & Mid$(str_Word, 3)
    ElseIf s = "PH" Then
        str_Word = "F" & Mid$(str_Word, 3)
    ElseIf Left$(str_Word, 1) = "X" Then
        str_Word = "Z" & Mid$(str_Word, 2)
    End If
 
    ' Metaphone changes: '
    ' When swapping letters for numbers, "c" '
    ' is treated as a sibilant, not as a "k".'
    str_Word = Replace$(str_Word, "STLE", "SEL") ' whistle - the t is silent '
    str_Word = Replace$(str_Word, "SCLE", "SEL") ' muscle - the c is silent '
    str_Word = Replace$(str_Word, "CK", "K")   ' rack, flock - the c is silent '
    str_Word = Replace$(str_Word, "CT", "KT")  ' doctor - the c is hard '
    str_Word = Replace$(str_Word, "SCIE", "SIE") ' science - the c is silent '
    str_Word = Replace$(str_Word, "SCE", "SE") ' scene - the c is silent '
    str_Word = Replace$(str_Word, "SCY", "SY")
    str_Word = Replace$(str_Word, "SC", "SK")  ' scary - the c is hard '
    str_Word = Replace$(str_Word, "DGE", "J")  ' edge '
    str_Word = Replace$(str_Word, "DGY", "JE") ' edgy '
    str_Word = Replace$(str_Word, "DGI", "JI") ' edginess '
    str_Word = Replace$(str_Word, "TIA", "SHA")
    str_Word = Replace$(str_Word, "TIO", "SHO")
    str_Word = Replace$(str_Word, "TCH", "CH")
 
    If Right$(str_Word, 2) = "GN" Then str_Word = Left$(str_Word, Len(str_Word) - 2) & "N"  ' sign '
    If Right$(str_Word, 4) = "GNED" Then str_Word = Left$(str_Word, Len(str_Word) - 4) & "ND"
    If Right$(str_Word, 5) = "GNING" Then str_Word = Left$(str_Word, Len(str_Word) - 5) & "NING"
    If Right$(str_Word, 2) = "IC" Then str_Word = Left$(str_Word, Len(str_Word) - 2) & "IK"  ' rustic, fantastic '
 
    i = InStr(str_Word, "W")
    Do While i > 0
        If InStr("AEIOU", Mid$(str_Word & " ", i + 1, 1)) = 0 Then
            ' W not followed by a vowel is silent. '
            If i > 1 Then
                str_Word = RTrim$(Left$(str_Word, i - 1) & _
                                   Mid$(str_Word & " ", i + 1))
            Else
                str_Word = Mid$(str_Word, 2)
            End If
        End If
        i = InStr(i + 1, str_Word, "W")
    Loop
 
    ' Metaphone says that G is silent in GH, '
    ' but the GH becomes F in LAUGHTER.      '
 
    ' Change letters to codes,  '
    ' starting with 2nd letter: '
    For i = 2 To Len(str_Word)
        d = Mid$(str_Word, i, 1)
        If InStr("AEIOUHWY", d) > 0 Then
            d = "0"  ' zero, not uppercase "o". '
        ElseIf InStr("BFPV", d) > 0 Then
            d = "1"
        ElseIf InStr("GJKQ", d) > 0 Then
            d = "2"
        ElseIf InStr("CSXZ", d) > 0 Then
            ' Split these from last line because '
            ' GJKQ are hard sounds and CSXZ are  '
            ' sibilants ("s" sounding), other    '
            ' than the adjustments to C combos   '
            ' made above.                        '
            d = "7"
        ElseIf InStr("DT", d) > 0 Then
            d = "3"
        ElseIf InStr("L", d) > 0 Then
            d = "4"
        ElseIf InStr("MN", d) > 0 Then
            d = "5"
        ElseIf InStr("R", d) > 0 Then
            d = "6"
        Else
            If d <> "-" Then Stop
            d = ""
        End If
        Mid(str_Word, i, 1) = d
    Next
 
    wrd = ""
    ' Remove repeating codes:  50773 -> 5073 '
    For i = 1 To Len(str_Word)
        If Mid(str_Word, i, 1) <> _
            Mid(str_Word & " ", i + 1, 1) _
        Then
            wrd = wrd + Mid(str_Word, i, 1)
        End If
    Next
    If str_Word = "" Then Exit Function
    ' Get rid of vowels (which have been replaced by 0's): '
    str_Word = Replace(wrd, "0", "")
    str_Word = Left$(str_Word & "0000", 4)
 
    Soundex = str_Word
End Function
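
To show where the code ends up being used, here is a rough sketch of the database lookup, written in Python rather than VB6 or B4A. The table layout inside wordsdb.zip is not described in this thread, so the table name `words` and the columns `word` and `soundex` are assumptions. The sample rows are stand-ins whose codes were worked out by hand from the routine above. It uses an in-memory database so it runs on its own.

```python
import sqlite3


def words_with_code(conn, code):
    """Return every word stored under the given Soundex code, alphabetically."""
    rows = conn.execute(
        "SELECT word FROM words WHERE soundex = ? ORDER BY word", (code,)
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in for the real database: hypothetical schema, hand-derived sample codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, soundex TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("banana", "B550"), ("bunion", "B550"), ("bandana", "B535")],
)
# An index on the code column keeps lookups fast on a large word list.
conn.execute("CREATE INDEX idx_words_soundex ON words (soundex)")

# In practice `code` would come from the Soundex routine run on the recognized text.
print(words_with_code(conn, "B550"))  # ['banana', 'bunion']
print(words_with_code(conn, "B535"))  # ['bandana']
```

If the hand calculation is right, the routine gives Banana and Bandana different codes (B550 and B535), so a code lookup alone would not catch that particular mix-up. It would have to be combined with some other closeness measure.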

I also have VB6 routines for finding suggested spellings of misspelled words, but the routines are too long to post here. I can put them on my web site if you are interested.
 