Anagram solving by Regex or SQL

RichardN

Well-Known Member
Licensed User
Longtime User
Here is a brainteaser for any B4X insommiacs out there.....

I have a language database in SQLite with around 30k words. Given a string comprising 4 to 12 characters, is it possible to devise an SQL query that extracts all matching words for every possible permutation of those exact characters.

I started writing a query in SQLlite Expert Pro but it appears to be an awful lot more complicated than you might imagine.

Any experiences to share?
 

JohnC

Expert
Licensed User
Longtime User
ChatGPT says...

Creating an SQL query to match all permutations of a given string against a database of words in SQLite is not straightforward because SQL is not inherently designed for combinatorial tasks like generating permutations. However, it can be approached by combining SQL with B4X code (e.g., B4A or B4i) for pre-processing.

Here's a plan:

  1. Generate Permutations in B4X Code: Use B4X to generate all permutations of the input string.
  2. Store Permutations in a Temporary Table: Insert the permutations into a temporary table in SQLite.
  3. Query the Database for Matches: Use a JOIN or IN clause to find matches in the language database.

Step-by-Step Solution:

1. Generate Permutations in B4X

Use a function in B4X to generate all permutations of a given string. Here's an example function in B4X:
B4X:
Sub GeneratePermutations(s As String) As List
    Dim res As List
    res.Initialize
    If s.Length = 1 Then
        res.Add(s)
        Return res
    End If
    For i = 0 To s.Length - 1
        Dim ch As String = s.CharAt(i)
        Dim rem As String = s.SubString2(0, i) & s.SubString(i + 1)
        Dim perms As List = GeneratePermutations(rem)
        For Each perm As String In perms
            res.Add(ch & perm)
        Next
    Next
    Return res
End Sub

2. Insert Permutations into a Temporary Table

Create a temporary table in your SQLite database to hold the permutations:
B4X:
CREATE TEMP TABLE IF NOT EXISTS temp_permutations (word TEXT);
DELETE FROM temp_permutations;
Insert the permutations into this table using B4X:
B4X:
Dim permutations As List = GeneratePermutations("yourstring")
For Each perm As String In permutations
    DB.ExecNonQuery2("INSERT INTO temp_permutations (word) VALUES (?)", Array As Object(perm))
Next

3. Query the Language Database

Join the temporary table with your language database to find matches:

B4X:
SELECT lang.word
FROM language_table lang
JOIN temp_permutations temp ON lang.word = temp.word;

Full Example in B4X

B4X:
Sub GeneratePermutations(s As String) As List
    Dim res As List
    res.Initialize
    If s.Length = 1 Then
        res.Add(s)
        Return res
    End If
    For i = 0 To s.Length - 1
        Dim ch As String = s.CharAt(i)
        Dim rem As String = s.SubString2(0, i) & s.SubString(i + 1)
        Dim perms As List = GeneratePermutations(rem)
        For Each perm As String In perms
            res.Add(ch & perm)
        Next
    Next
    Return res
End Sub

Sub QueryPermutations(db As SQL, input As String) As List
    ' Create temporary table
    db.ExecNonQuery("CREATE TEMP TABLE IF NOT EXISTS temp_permutations (word TEXT);")
    db.ExecNonQuery("DELETE FROM temp_permutations;")
  
    ' Generate permutations
    Dim permutations As List = GeneratePermutations(input)
    For Each perm As String In permutations
        db.ExecNonQuery2("INSERT INTO temp_permutations (word) VALUES (?)", Array As Object(perm))
    Next
  
    ' Query matching words
    Dim result As List
    result.Initialize
    Dim rs As ResultSet = db.ExecQuery("SELECT lang.word FROM language_table lang JOIN temp_permutations temp ON lang.word = temp.word;")
    Do While rs.NextRow
        result.Add(rs.GetString("word"))
    Loop
    rs.Close
  
    Return result
End Sub
Use this function in your B4A or B4i application to find all matching words for any given input string from your language database.
 
Last edited:

RichardN

Well-Known Member
Licensed User
Longtime User
Thanks @ ChatGPT!

I wonder if it works? I will try it out with B4J first I think
 

Sandman

Expert
Licensed User
Longtime User
Any experiences to share?
I'm going to be aggressive and say: Sure, it's possible! Not that I know how to do it, obviously. It's just that it seems SQL can do anything.

Evidence:
 

LucaMs

Expert
Licensed User
Longtime User
A friend and I resolved the issue (although in this case he demonstrates less memory than mine and I think he wants all the credit ?).

I think he doesn't want to divulge the "trick", but he probably does share the library (probably not for free).

I'll point this thread out to him.
 

hatzisn

Expert
Licensed User
Longtime User
Here is a brainteaser for any B4X insommiacs out there.....

I have a language database in SQLite with around 30k words. Given a string comprising 4 to 12 characters, is it possible to devise an SQL query that extracts all matching words for every possible permutation of those exact characters.

I started writing a query in SQLlite Expert Pro but it appears to be an awful lot more complicated than you might imagine.

Any experiences to share?

Exact match or partly?
 

RichardN

Well-Known Member
Licensed User
Longtime User
@LordZenzo Congratulazioni per aver trovato una soluzione. È qualcosa che desideri condividere o ti stai semplicemente vantando?

Congratulations on finding a solution. Is this something you wish to share or are you just bragging?
 
Last edited:

LordZenzo

Well-Known Member
Licensed User
Longtime User
@LordZenzo Congratulazioni per aver trovato una soluzione. È qualcosa che desideri condividere o ti stai semplicemente vantando?

Congratulations on finding a solution. Is this something you wish to share or are you just bragging?
Lo condividerò sono solo oberato dal lavoro ma prometto che in pochi giorni metterò qui la libreria che ho creato proprio per questo ed altri problemi legati alle parole
 

LordZenzo

Well-Known Member
Licensed User
Longtime User
this is my library updated to version 1.8


donations are welcome
large donations are more welcome
 

RichardN

Well-Known Member
Licensed User
Longtime User
Thanks for the inputs. I have tested different approaches with B4J. Bearing in mind that the GeneratePermutations Sub is recursive, that bit actually runs very fast.
Generate Permutations:
Sub GeneratePermutations(s As String) As List
    
    Dim Possibilities As List
    Possibilities.Initialize
    
    If s.Length = 1 Then
        
        Possibilities.Add(s)
        Return Possibilities
        
    End If
    
    For i = 0 To s.Length - 1
        Dim ch As String = s.CharAt(i)
        Dim rem As String = s.SubString2(0, i) & s.SubString(i + 1)
        Dim perms As List = GeneratePermutations(rem)
        
        For Each perm As String In perms
            Possibilities.Add(ch & perm)
        Next
    Next
    
    Return Possibilities
    
End Sub

The problem arises when you test each possibility against an SQL dictionary for being a valid word. If it is valid the it is added to a list of the final result(s). Testing each one quickly becomes very processor intensive. For a 5 character word there are 5! permutations = 120 words, a 9 character word has 362,880 possible permutations. Using SQL the simple way is slow.....
Test words for dictionary presence:
Sub Button1_Click
    
    Dim Input As String = Dialog.TextInputDialog("Anagrams","Enter Letter Jumble","","")
    If Input.Length = 0 Then Return
    
    Dim Permutations, Results As List
    Permutations.Initialize
    Results.Initialize
    
    Permutations = GeneratePermutations(Input)
    MainForm.Title = Input & " " & Permutations.Size
    
    For Each Word As String In Permutations
        
        Dim found As String
        found = SQL.ExecQuerySingleResult2("SELECT Word FROM English WHERE Word = ? " ,Array As String(Word))
        
        If found <> Null Then
            Results.Add(found)
        End If
        
    Next

End Sub

Normally one would consider Transaction to speed up queries but here we are only reading data, not writing it. Even with a word of 8 characters my 20 core PC takes 100 seconds or so to produce the results in release, I don't see a handheld device doing it any quicker. The alternatives might be:

1. Creating a coded list in memory of dictionary words of matching input length. That may be the faster solution, but big memory overhead.

2. Alternatively creating a temporary SQL Table or View in memory (does B4X support that?) of matching input length and operating on that. I'm not sure that querying a table of 30,000 words is much faster than querying a table of 5000 ???

Any other suggestions?
 

LordZenzo

Well-Known Member
Licensed User
Longtime User
Altri suggerimenti?
usa la mia libreria, la puoi scaricare dal link e puoi fare una donazione, oltre a gli anagrammi ha altre funzioni interessanti come la distanza di levensthain e il suggerrimento di correzzioni, devi solo avere un csv con il dizionario, io ne propongo per 3 lingue, ma puoi creare il tuo personale al primo utilizzo impiega qualche secondo (o piu) per caricare le parole e generare una tabella sqlite, ma poi è velocissimo

use my library, you can download it from the link and you can make a donation, in addition to anagrams it has other interesting functions such as levensthain distance and suggestion of corrections, you just need a csv with the dictionary, I offer it for 3 languages, but you can create your own at first use it takes a few seconds (or more) to load the words and generate a sqlite table, but then it is very fast
 

Peter Meares

Member
Licensed User
Longtime User
I wrote an anagram program many years ago and use the following method. I did use lists but a rewrite would move to SQL.
If one "Normalizes" each word into a string with each letter and a count as shown below, one can easily find anagrams by searching for the identical normailzed strings.
[A1C1E1L1N1R1U1]=NUCLEAR would also give UNCLEAR, etc.

Easy search in SQL with no extra non-words. A 300,000 word database I have is basically instantaneous.
 

LucaMs

Expert
Licensed User
Longtime User
There is a trick to obtain anagrams very quickly, instantly.
I can't reveal it;

I wrote an anagram program many years ago and use the following method. I did use lists but a rewrite would move to SQL.
If one "Normalizes" each word into a string with each letter and a count as shown below, one can easily find anagrams by searching for the identical normailzed strings.
[A1C1E1L1N1R1U1]=NUCLEAR would also give UNCLEAR, etc.

Easy search in SQL with no extra non-words. A 300,000 word database I have is basically instantaneous.

Since you're almost there with the "solution", I'll reveal the trick, simple and powerful.

You will have a table with two fields. The first field is the word, the second will contain a string made up of the same letters as the first, in alphabetical order.

For example:

[WORK] - [KEY]
more - emor
nuclear - eaclnru

When you have N letters to find the anagrams of, just sort its letters and look for the string thus composed in the [KEY] field (any name of the field, "key" is not a good choice).

Something like this:
B4X:
    Dim str As String = SortLetters(Letters)
    
    Query = $"SELECT Word FROM Words WHERE KeyField = ?"$

    Dim RS As ResultSet
    RS = DB.ExecQuery2(Query, Array As String(str))


B4X:
Private Sub SortLetters(Letters As String) As String
    Dim Result As String

    Dim lstLetters As List
    lstLetters.Initialize
    
    Dim bytChars() As Byte = Letters.GetBytes("UTF8")
    For i = 0 To bytChars.Length - 1
        lstLetters.Add(bytChars(i))
    Next
    lstLetters.Sort(True)
    
    For i = 0 To lstLetters.Size - 1
        Result = Result & Chr(lstLetters.Get(i))
    Next

    Return Result
End Sub
 
Last edited:

Daestrum

Expert
Licensed User
Longtime User
Further to @LucaMs example, by adding a word length column you can use a where clause to ignore words of the wrong length. Also index the length and the sorted letters column to gain a faster response.
 
Top