Android Question Dictionary application - where is a good start?

bsnqt

Active Member
Licensed User
Dear Erel and all,

I am starting a dictionary project and need your advice on how to begin. Any tips would be much appreciated. The application should work with around 380,000 definitions (words), and I am using SQLite to manage the database. I have a few questions, as I am not sure whether I have the right solution.

1) When the user opens the dictionary: I load the word list (380,000 words) into SearchView for the user to search. Is that a good solution for such a huge list? The biggest challenge is that the app should be responsive almost immediately after the user opens it (in less than 1000 ms), so that the user can type the word he wants. I can use a splash screen, but that should not last more than 1 s either (many other dictionaries on the Play Store manage this). With SearchView (Erel) I managed to load a 180,000-word list in 1000 ms, but a list of 380,000 words takes 7000 - 8000 ms. I can load the list asynchronously, but even then the data has not finished loading in the background when the user presses the Enter key to look up a definition.

2) When the user clicks a word (a definition page pops up, similar to the picture): is it a good solution to load the information into a LabelEx (I tried the custom WebViewEx, but without success)? See the red circles in the picture: how can we make those 3 "speakerphone symbols" clickable with RegEx and aligned on the same line as other text such as the pronunciation (spelling)? Please note that some spellings can be long enough to "push" (wrap) the symbols down to the next line.

I searched the forum but did not find much information (https://www.b4x.com/android/forum/pages/results/?query=dictionary). Thank you for your help.
 

Attachments

  • dicitonary (example).jpg

Erel

Administrator
Staff member
Licensed User
1) Building the index when the program starts will be too slow. Instead, build the index with B4J, serialize it with B4XSerializator, and load it asynchronously when the program starts.
Example: https://www.b4x.com/android/forum/t...ith-searchview-b4xserializator.61872/#content
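
A minimal sketch of that two-step flow, not the code from the linked example. It assumes the index is an object that B4XSerializator can handle, that the SearchView class in the project exposes SetIndex, and that sv is an initialized SearchView instance; the file name index.dat is a placeholder.

B4X:
' B4J side (run once on the desktop): Index is whatever the indexing code produced
Sub SaveIndex (Index As Object)
    Dim ser As B4XSerializator
    File.WriteBytes(File.DirApp, "index.dat", ser.ConvertObjectToBytes(Index))
End Sub

' B4A side: index.dat (placeholder name) was added to the Files folder of the app
Sub LoadIndexAsync
    Dim ser As B4XSerializator
    ser.ConvertBytesToObjectAsync(File.ReadBytes(File.DirAssets, "index.dat"), "ser")
    Wait For ser_BytesToObject (Success As Boolean, NewObject As Object)
    If Success Then
        sv.SetIndex(NewObject) ' sv: the SearchView instance (assumed to exist)
    Else
        Log(LastException)
    End If
End Sub

The UI stays responsive while the bytes are converted, but searches only return results once SetIndex has been called.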

Also consider whether you need non-prefix matching. Without it, the index will be much smaller.

2) I'm not familiar with LabelEx. Check BBCodeView.
 

bsnqt

Active Member
Licensed User
Thank you so much for your prompt reply, I really appreciate it. I wonder if you have more than 48 hours a day ;)

Erel said:
1) Building the index when the program starts will be too slow. You should instead build the index with B4J, serialize it with B4XSerializator and async load it when the program starts.
Example: https://www.b4x.com/android/forum/t...ith-searchview-b4xserializator.61872/#content

I have already spent a whole week learning and understanding your guideline and demo project from that link, trying many different options. I finally made it work for 180,000 words with B4XSerializator (with the two "built-in" separate lists, prefixList and substringList). However, it still takes around 3-6 s to index (in the "background"), and during that time the user still cannot do anything useful: he can type a word into the EditText, since the UI is no longer unresponsive, but no items appear in the ListView because the index is not ready yet. For a list of 380,000 words it is even worse, around 7-9 s (with MIN_LIMIT = 1 and MAX_LIMIT only 2), as shown below.

B4X:
Public MIN_LIMIT = 1, MAX_LIMIT = 2 As Int

Only prefix matching: yes, you are totally right. I may need to consider matching only the prefix, which would save a lot of time (almost 60 - 70%), and that is very good advice, but then I cannot benefit from your SearchView. The substring search is a really cool feature (I see that other dictionaries cannot search "inside" a word as ours can). Sacrificing this "strong point" would be a big regret :)

Erel said:
2) I'm not familiar with LabelEx. Check BBCodeView.

2) Sorry, I meant LabelExtras. I will look at BBCodeView as you advise.

Once again thanks!
 

bsnqt

Active Member
Licensed User
The substring search is very, very powerful, by the way. It lets us find proverbs, idioms, word combinations and so on, which a plain prefix search (as in other "ordinary" dictionaries) cannot do. Please see the example in the picture.

screen shot dict.jpg
 

Mahares

Expert
Licensed User
bsnqt said:
The substring search is very and very powerful

Have you considered looking into Full Text Search (FTS) in SQLite? It is quite powerful and has an extremely fast search engine. Here is the SQLite documentation link:
https://www.sqlite.org/fts5.html
And here is a code snippet I wrote a few years ago about FTS that demonstrates some of the search queries and different tests. I have not used it since because I did not have the need, but it is worth taking a look:
I would love to get hold of your dictionary database and give it a try.
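
For readers who want a starting point, here is a minimal FTS sketch. It is not the snippet referred to above. It assumes the B4A SQL library, a database file named dict.db and a one-column table named dict (all placeholders), and it uses FTS4 on the assumption that it is more widely available in the SQLite builds shipped with Android than FTS5; check your target devices. Note that FTS matches whole tokens or token prefixes (term*), not arbitrary substrings inside a word.

B4X:
Sub Process_Globals
    Private SQL1 As SQL
End Sub

' Build the FTS table once (for example on first run)
Sub CreateFtsTable (Words As List)
    SQL1.Initialize(File.DirInternal, "dict.db", True)
    SQL1.ExecNonQuery("CREATE VIRTUAL TABLE IF NOT EXISTS dict USING fts4(word)")
    SQL1.BeginTransaction
    Try
        For Each w As String In Words
            SQL1.ExecNonQuery2("INSERT INTO dict (word) VALUES (?)", Array As Object(w))
        Next
        SQL1.TransactionSuccessful
    Catch
        Log(LastException)
    End Try
    SQL1.EndTransaction
End Sub

' Returns up to 50 entries that contain a token starting with Term
Sub SearchFts (Term As String) As List
    Dim res As List
    res.Initialize
    Dim rs As ResultSet = SQL1.ExecQuery2("SELECT word FROM dict WHERE word MATCH ? LIMIT 50", Array As String(Term & "*"))
    Do While rs.NextRow
        res.Add(rs.GetString("word"))
    Loop
    rs.Close
    Return res
End Sub

Because the table is queried on demand, nothing has to be loaded into memory at startup, which is the part of the original question that the in-memory index struggles with.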
 

agraham

Expert
Licensed User
I have built a dictionary app containing about 210,000 definitions and examples from English WordNet:
https://github.com/globalwordnet/english-wordnet
I have pre-processed their data files into 211,862 individual strings, with word, definition and synonyms saved as lines in a single text file 24 MB in size.

'dictionary (n)| A reference book containing an alphabetical list of words with information about them|lexicon'

I do three types of search

Starts with - locate individual words or words with a common starting substring
Contains - full text search for a literal string
Regex - full text search using Regex.

I didn't use a database, as I don't think it adds anything but overhead to what is just a text searching app. The lack of database overhead makes full text searching very fast, and there is no problem running searches on the main thread. I keep it simple: I pre-load the strings into a process globals List in Activity_Create and search the strings in the List as needed. It really is quite simple and extremely fast (less than one second), even though the entire List is searched when only a single definition is expected to be found.

B4X:
Sub edtWord_EnterPressed
    Disable
    Sleep(0) ' let the UI update before the search starts
    Dim idx, count As Int = 0
    Dim Stopped As Boolean = False
    FoundList.Clear
    btnNext.Enabled = False
    Shown = 0
    lblFound.Text = ""
    If edtWord.Text.Trim = "" Then
        Enable
        Return
    End If
    ' Prepare the search term once instead of once per entry
    Dim term As String = edtWord.Text.ToLowerCase.Trim
    Dim pattern As String = edtWord.Text.Trim
    If Not(chkSearchAll.Checked) Then
        pattern = "^" & pattern ' "starts with": anchor the regex at the start of the entry
    End If
    For Each word As String In WordList
        If chkRegex.Checked Then
            idx = -1
            Matcher1 = Regex.Matcher2(pattern, Regex.CASE_INSENSITIVE, word)
            If Matcher1.Find Then
                idx = 0
            End If
        Else
            idx = word.ToLowerCase.IndexOf(term)
        End If
        ' idx = 0: match at the start; idx > 0 counts only when searching the whole entry
        If idx = 0 Or (chkSearchAll.Checked And idx > 0) Then
            FoundList.Add(word)
            count = count + 1
            If count = MaxResults Then
                Stopped = True
                Exit
            End If
        End If
    Next
    If Stopped Then
        Dim msg As String = "Search halted. More than " & MaxResults & " matches found"
        MsgboxAsync(msg, "Search Halted")
        Wait For Msgbox_Result (Result As Int)
    End If
    If FoundList.Size > 0 Then
        ShowWord ' show single entry found
    End If
    If FoundList.Size > 1 Then
        EnableShow ' show choice of multiple entries found
    End If
    Enable
End Sub
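
The pre-loading step described above the search code can be sketched as follows. This is an illustration only: the file name wordnet.txt and the helper ParseEntry are hypothetical, and the entry layout (word|definition|synonyms) is taken from the sample line shown earlier in this post.

B4X:
Sub Process_Globals
    Public WordList As List
End Sub

Sub Activity_Create(FirstTime As Boolean)
    If FirstTime Then
        ' One entry per line; the file ships in the Files folder of the app
        WordList = File.ReadList(File.DirAssets, "wordnet.txt")
    End If
End Sub

' Splits one entry into its fields: word, definition, synonyms
Sub ParseEntry (Line As String) As String()
    Return Regex.Split("\|", Line)
End Sub

Splitting is deferred until an entry is actually displayed, so startup only pays for reading the lines.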
 

bsnqt

Active Member
Licensed User
Hi Mahares, your advice is really interesting. I will surely look into Full Text Search (FTS) in SQLite, of which I have only a little knowledge. My word list is a 389,000-line text file, 6.52 MB in size. How can I send it to you? Thank you.
 

bsnqt

Active Member
Licensed User
Hi agraham, that is really cool. I will dig into that option and let you know if it works, or ask further questions if I fail. Btw, my data also has 2 columns (one is "word", the other is "definition / word's explanation"). The second column is displayed only later, so time is not a big issue there (and we can use different methods to process the string). But my full text file is extremely big: (1) the file with only the first column (word list) is 6.52 MB, as I mentioned in my previous post; (2) the full file (both columns) is a 50.6 MB text file.

I will try the text file option for the word list. I will study your code and then try it. Thank you so much.
 

bsnqt

Active Member
Licensed User
Dear All,

I have done a test project with the approach advised by agraham (using a text file). Loading the 380,000-word list now takes only 600 ms or less! 😃 I think it is successful so far (it meets my expectations), though I also have to sacrifice some features (such as searching within a word by character, etc.). I took the idea from agraham's code above and combined it with other things: the SearchView algorithm explained by Erel (though I changed the searching "mechanism"), as well as the idea of the "lazy" loading CustomListView (also from Erel), and so on. Kindly have a look and give me your advice, as I think my test project is still far from workable. The loading of the list view still lags.

Download: Test project with text file


B4X:
Private WordList As List
Private FoundList As Map
Private searchWord As String
Type CardData (Content As String, highlightIndex As Int, wordPosition As Int)

Private Sub edtWord_TextChanged (Old As String, New As String)

    Sleep(0)
    If edtWord.Text = "" Then
        CLV1.Clear
        Return
    End If

    Dim startTime As Long = DateTime.Now
    searchWord = New.ToLowerCase.Trim
    FoundList.Clear
    CLV1.Clear

    For i = 0 To WordList.Size - 1

        Dim word As String = WordList.Get(i) ' entry i of the word list
        Dim mArray() As String = Regex.Split(" ", word)
        Dim idx1 As Int = 0 ' character offset of the current token inside word

        For j = 0 To mArray.Length - 1
            If mArray(j).ToLowerCase.CompareTo(searchWord) = 0 Then
                ' This token matches searchWord exactly; idx1 is where it starts.
                ' Store it so CSBuilder can highlight the match later.
                FoundList.Put(word, idx1)

                ' Add an empty panel to the CustomListView; its content is loaded lazily
                Dim cd As CardData
                cd.Initialize
                cd.Content = word
                cd.highlightIndex = idx1
                cd.wordPosition = i
                Dim p As B4XView = xui.CreatePanel("")
                p.SetLayoutAnimated(0, 0, 0, CLV1.AsView.Width, 60dip)
                CLV1.Add(p, cd)

                ' Only the first match in each entry is needed
                Exit
            End If
            ' Skip this token and the space that follows it
            idx1 = idx1 + mArray(j).Length + 1
        Next

    Next

    LogColor("Searching time = " & (DateTime.Now - startTime) & " Foundlist.Size  = " & FoundList.Size & " items", Colors.Blue)

End Sub
 

bsnqt

Active Member
Licensed User
Note: The text file (>380,000 words) is included in the project, in File.DirAssets.

Please teach me how to improve the CustomListView loading. Thanks a lot.
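
One possible direction, sketched under assumptions rather than taken from the test project: wait for a pause in typing before searching, and cap the number of panels added to the list. RunSearch, SearchIndex and the limit of 200 are hypothetical names and values, and the matching rule is simplified to Contains for brevity.

B4X:
Private SearchIndex As Int ' declare together with the other globals

Private Sub edtWord_TextChanged (Old As String, New As String)
    SearchIndex = SearchIndex + 1
    Dim MyIndex As Int = SearchIndex
    Sleep(300) ' wait for a short pause in typing
    If MyIndex <> SearchIndex Then Return ' a newer keystroke arrived, skip this search
    RunSearch(New.ToLowerCase.Trim, 200)
End Sub

' Hypothetical helper: fills CLV1 with at most MaxResults matching entries
Private Sub RunSearch (Term As String, MaxResults As Int)
    CLV1.Clear
    If Term = "" Then Return
    Dim count As Int = 0
    For i = 0 To WordList.Size - 1
        Dim word As String = WordList.Get(i)
        If word.ToLowerCase.Contains(Term) Then
            Dim p As B4XView = xui.CreatePanel("")
            p.SetLayoutAnimated(0, 0, 0, CLV1.AsView.Width, 60dip)
            CLV1.Add(p, word)
            count = count + 1
            If count >= MaxResults Then Exit
        End If
    Next
End Sub

With this pattern a full scan runs only once the user stops typing for 300 ms, and a short search term can no longer create thousands of panels.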
 

bsnqt

Active Member
Licensed User
Thank you Erel, I will try it together with the advice about FTS from Mahares and will let you know. I appreciate your help.
 

bsnqt

Active Member
Licensed User
Mahares said:
You can put it in your Google Drive or Dropbox and post a link here in the forum for everyone to download and check out too. This way you get different opinions, which the forum is not short of. Maybe Graham can also test using his text file technique.

Mahares,
I have read your thread https://www.b4x.com/android/forum/t...large-database-and-display-on-an-xclv.133517/.
I will study the project, but in the meantime I have run it and see that it is very fast and accurate. The execution time is very impressive! Thank you so much.
 

Dey

Active Member
Licensed User
bsnqt said:
Download: Test project with text file

Hello,
I can't download the example:

The item does not exist or is no longer available
 

bsnqt

Active Member
Licensed User
Erel said:
Two new options that might be useful for such cases:

2. Trie based search (prefix only and also requires the items list to be sorted): https://www.b4x.com/android/forum/threads/b4x-trie-based-search-dialog.134220/#content

Thanks Erel. If I had known this approach (idea) one month ago, my life would have been much easier 😭. Although it allows only prefix search and requires a sorted list, that is acceptable for my project. The only thing I need to investigate is whether, for a dictionary that mixes English and a local language in UTF-8, the sorted list is good enough to implement the prefix search. If it is, I will need to consider rewriting my project 😰
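
To test whether a sorted list is good enough, a plain binary search can stand in for the trie. The sketch below is not Erel's trie implementation. It assumes the entries are lowercased and sorted in the same order that CompareTo uses (List.Sort(True) is assumed to do this); LowerBound and PrefixSearch are hypothetical names. CompareTo compares UTF-16 character values, not dictionary collation, so local-language words with accented letters are worth checking explicitly.

B4X:
' Returns the index of the first entry that is >= Prefix
Sub LowerBound (Sorted As List, Prefix As String) As Int
    Dim low As Int = 0
    Dim high As Int = Sorted.Size
    Do While low < high
        Dim middle As Int = (low + high) / 2
        Dim s As String = Sorted.Get(middle)
        If s.CompareTo(Prefix) < 0 Then
            low = middle + 1
        Else
            high = middle
        End If
    Loop
    Return low
End Sub

' Collects up to MaxResults entries that start with Prefix
Sub PrefixSearch (Sorted As List, Prefix As String, MaxResults As Int) As List
    Dim res As List
    res.Initialize
    Dim i As Int = LowerBound(Sorted, Prefix)
    Do While i < Sorted.Size And res.Size < MaxResults
        Dim s As String = Sorted.Get(i)
        If s.StartsWith(Prefix) = False Then Exit
        res.Add(s)
        i = i + 1
    Loop
    Return res
End Sub

All entries sharing a prefix sit next to each other in a list sorted this way, so each lookup takes about 19 comparisons for 380,000 entries plus the entries returned.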
 