WMSpellChecker library

Discussion in 'Additional Libraries' started by moster67, Mar 15, 2009.

  1. moster67

    moster67 Expert Licensed User

    Last year I started a thread here on the forum in which I published some early versions of a spellchecker. At some point it made more sense to turn the code I had written into a library, which I am now posting here.

    Here is some information about the spellchecker-library:

    Overview - WMSpellChecker

    Basically, a spell checker customarily consists of two parts:

    1) A set of routines for scanning text and extracting words, and
    2) An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).
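
    As a rough illustration of these two parts (a minimal Python sketch, not the code used inside WMSpellChecker; the function names and the tiny word set are made up for the example):

```python
import re

def extract_words(text):
    """Part 1: scan the text and yield (word, start_offset) pairs."""
    for match in re.finditer(r"[A-Za-z']+", text):
        yield match.group(0), match.start()

def find_unknown_words(text, dictionary):
    """Part 2: compare each extracted word against the known word list."""
    return [(word, pos) for word, pos in extract_words(text)
            if word.lower() not in dictionary]

# Tiny illustrative dictionary; a real one holds tens of thousands of words.
dictionary = {"the", "quick", "brown", "fox", "jumped", "over"}
print(find_unknown_words("The quuik brown fox jummped over", dictionary))
```

    Keeping the start offset of each word is what would later allow a calling application to mark or replace the word in place.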

    However, what is described above is only "half" a spell checker, since these days spell checkers also suggest replacements/corrections for misspelled words (among other things, such as synonyms and grammar hints). These suggestions can be produced by the program using various techniques:

    - phonetic algorithms such as "Soundex", among others
    - word lists containing commonly misspelled words and commonly transposed letters
    - the "near miss strategy", introduced by Ispell for UNIX, one of the first spell checkers, with roots dating back to 1971
    - "edit distance" algorithms, which measure the amount of difference between two sequences. A famous one is the "Levenshtein distance".
    - and other techniques
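
    To make the edit-distance idea concrete, here is a minimal Python sketch of the Levenshtein distance and of ranking candidate words by it. It only illustrates the technique; the helper names and the max_distance cut-off are assumptions for the example, and nothing here says which techniques WMSpellChecker combines internally.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def suggest(word, candidates, max_distance=2):
    """Return the candidates within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance]

print(suggest("caar", ["car", "care", "cat", "dog"]))
```

    Comparing a word against every entry of a large dictionary this way is slow, which is one reason spell checkers usually narrow down the candidates first (for example with a phonetic key).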

    I am aware that (at least) WM6 already offers spelling suggestions and a spellchecker if Word (Office) has been installed, but I still liked the idea, so I decided to make a library. In any case, as far as I know, only the dictionary corresponding to the language of the WM6 installation is installed, so you cannot spell check words in other languages.

    The way it works.....

    First of all, apart from referencing the library itself, you need to add two objects to your application, namely Dictionary and ComputeDetection.
    Then you need to load the dictionary-files by using "LoadDict". Currently they consist of four separate files, although I may change this in a future release. The dictionary-files must be located in the application-directory, although you can create sub-folders. This first release only supports English, and the dictionaries distributed with the library must not be tampered with. The next release will bring support for other languages and will also include a separate program for handling dictionaries.

    Once the dictionaries have been loaded, you can start spellchecking by calling "ComputeDetection", which passes your textbox-control to the library for parsing. You can tell the library to ignore short words by setting the property "SetMinimumWordLength" before executing "ComputeDetection". If a word is not present in the dictionary, a set of suggestions is returned to the calling application and, at the same time, the word which was not found is shown in the textbox in capital letters. The suggestions produced by the library can be obtained using "ReturnSuggestions", which returns a string-array.

    Once you have shown the suggestions returned by the library, your application can let the user decide what to do: ignore the wrong word ("IgnoreWord"), replace it with a word of their own ("AddWord"), or replace it with one of the suggestions ("ReplaceWord").
    At this point, you tell the library to continue spellchecking by using "ContinueDetection". You should also check whether spellchecking has finished by using "IsSpellingFinished".
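
    The check / decide / continue cycle described above can be pictured with the following toy Python model. It is only a simulation of the flow: the class, its method names and their signatures are hypothetical and are NOT the WMSpellChecker API (see the help-file and the sample programs for the real calls).

```python
class SpellSession:
    """Toy stand-in for the detect / decide / continue cycle.
    Hypothetical names; not the WMSpellChecker API."""

    def __init__(self, text, dictionary):
        self.words = text.split()
        self.dictionary = dictionary
        self.index = -1  # position of the word currently under review

    def continue_detection(self):
        """Advance to the next unknown word; return it, or None when done."""
        self.index += 1
        while self.index < len(self.words):
            if self.words[self.index].lower() not in self.dictionary:
                return self.words[self.index]
            self.index += 1
        return None

    def replace_word(self, replacement):
        """Replace the word currently under review."""
        self.words[self.index] = replacement

    def is_finished(self):
        return self.index >= len(self.words)

    def text(self):
        return " ".join(self.words)

session = SpellSession("The quuik brown fox", {"the", "quick", "brown", "fox"})
user_choices = {"quuik": "quick"}  # stands in for what the user picks

word = session.continue_detection()
while word is not None:
    if word in user_choices:
        session.replace_word(user_choices[word])  # "replace" decision
    # otherwise: "ignore" decision, nothing to do
    word = session.continue_detection()

print(session.text(), session.is_finished())
```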

    At any time, you can interrupt spellchecking by using "UnloadDict". This will be useful in a future release of the library, when you will be able to unload an English dictionary and replace it with, for instance, a French dictionary without exiting your own application. However, before unloading the dictionary, you should check whether a dictionary has been loaded by using "IsDictionaryLoaded".

    In the help-file, you can find more (important) information about the available commands. Please also check out the two enclosed sample-programs (one using a classic spellchecking-interface and another one using context-menus), whose source-code has been commented.

    Other comments....

    This first release has some limitations, such as support only for English and the need for a textbox-control. However, I will add other features in the future, for instance:

    -support for other languages
    -dictionary-tools (for creating dictionaries) - will be an external program
    -possibility to add a user-dictionary
    -possibility to limit the number of suggestions produced by the library (by using a "ranking-system")
    -no further need for a textbox-control in your application. Your application will be able to pass on to the library only the word(s) you wish to spellcheck and the library will only return the suggestion(s) in a string-array. In this way, the spellchecker-library will not "interfere" with your application and you can use whichever control you prefer such as WebBrowser.
    -spellchecking "on the fly"
    -extended error-handling

    A few notes regarding dictionaries....

    The English dictionary supplied with the library is composed of nearly 70,000 words. Dictionaries to be used with the library must be sorted, with one word per line and LF = chr(10) as the line-ending. In addition, the dictionary should be saved as UTF-8.
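
    A plausible reason for requiring a sorted word list is that it allows a fast binary search, although this post does not say how the library actually looks words up. The Python sketch below checks the three format requirements (UTF-8, LF line-endings, sorted) and then searches the list. The function names are made up, and the sort order assumed here is plain code-point order, which may differ from the order the library expects.

```python
from bisect import bisect_left

def parse_dictionary(raw_bytes):
    """Decode a UTF-8, LF-separated word list and verify that it is sorted."""
    text = raw_bytes.decode("utf-8")
    if "\r" in text:
        raise ValueError("dictionary must use LF line-endings only")
    words = [line for line in text.split("\n") if line]
    if words != sorted(words):
        raise ValueError("dictionary must be sorted")
    return words

def contains(words, word):
    """Binary search in the sorted word list."""
    i = bisect_left(words, word)
    return i < len(words) and words[i] == word

raw = "apple\nbanana\ncherry\n".encode("utf-8")
words = parse_dictionary(raw)
print(contains(words, "banana"), contains(words, "blueberry"))
```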

    From the dictionary, a KeyMap is created using either a Soundex or a Double Metaphone algorithm. At the moment, the KeyMap is supplied with the library and loaded as an external file, but future releases might create it on the fly (or at least offer an option to do so). With the next release, I will add a utility, to be run on the desktop, which will let you create your own dictionary and the corresponding KeyMap, both compatible with WMSpellChecker.
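
    The file format of the KeyMap is not described here, so the following Python sketch only shows the general idea under an assumption: that a key map groups dictionary words by their phonetic key, so that a misspelled word can be matched against words that "sound" the same. It uses the classic American Soundex rules; build_key_map is a made-up helper name.

```python
from collections import defaultdict

# Classic American Soundex digit for each consonant.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit

def soundex(word):
    """Return the 4-character Soundex key (first letter + three digits)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]

def build_key_map(words):
    """Group words by phonetic key (assumed shape of a key map)."""
    key_map = defaultdict(list)
    for word in words:
        key_map[soundex(word)].append(word)
    return key_map

key_map = build_key_map(["car", "care", "cat", "core"])
print(key_map[soundex("caar")])  # candidates that share the key of "caar"
```

    The candidates found this way could then be ranked, for instance by edit distance, before being shown to the user.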

    Unlike English and Scandinavian dictionaries, dictionaries for German and for Romance languages such as Spanish, Italian and French will probably be rather large. This is because these languages use a lot of suffixes, for instance when conjugating verbs. To overcome this, certain spellcheckers such as ASpell, ISpell and HunSpell (used by OpenOffice) use dictionaries which mostly contain only the base form of each word. An additional file called "affix" contains the grammar rules for deriving the other forms, and this file together with the reduced dictionary overcomes the problem of large dictionaries. I believe, though, that this system is probably rather memory- and performance-hungry and might not be the best solution for Windows Mobile and PPC. Maybe I will look into it in the future.
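
    To give an idea of what such affix rules do, here is a deliberately simplified Python sketch. The (strip, add) rule format is invented for the example and is much simpler than the real affix-file formats used by ISpell or HunSpell.

```python
def expand(base, rules):
    """Apply (strip, add) suffix rules to a base form and return all
    resulting word forms, including the base form itself."""
    forms = {base}
    for strip, add in rules:
        if base.endswith(strip):
            forms.add(base[: len(base) - len(strip)] + add)
    return forms

# A few present-tense endings for Italian verbs in -are (illustrative only).
are_rules = [("are", "o"), ("are", "i"), ("are", "a"),
             ("are", "iamo"), ("are", "ate"), ("are", "ano")]

print(sorted(expand("parlare", are_rules)))
```

    One dictionary entry plus a shared rule set stands in for seven stored words here, which is where the space saving comes from; the price is that the rules have to be applied at lookup time or when the dictionary is loaded.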

    Another negative side-effect of using an overly large dictionary is that it may include more obscure words, which increases the risk that the spelling-engine will "miss" real-word errors. The word wether illustrates this. The word is, arguably, so obscure that any occurrence of wether in a passage is more likely to be a misspelling of weather or whether than a genuine occurrence of wether, so a spellchecker that did not have the word in its dictionary would do better than one that did.

    Conclusion....

    The library can be used with projects developed with Basic4ppc (PPC and Desktop) but should also work with projects created in Visual Studio and SharpDevelop (using VB.NET and C#). The library has been compiled targeting Framework Version 2.0.

    Library-version: 1.01
    Helpfile-version: 1.01

    Change log:
    20/03/2009 - added the property "SetMinimumWordLength" as per request

    PS: I am attaching four zip-files as follows:

    1) WMSpellCheckerLibrary.zip (library and helpfile)
    2) DictionaryFiles.zip (in English and needed for the below examples)
    3) TextEditor.zip (this is Erel's texteditor to which I added spellchecking and context-menus)
    4) SpellingForm.zip (classic interface of a spellchecker)
    Note: Use the above dictionary-files with the sample-applications. Please place the files in the application-directory.

    Please check and test the library and let me know if it works.

    Feedback would be highly appreciated.

    Regards,

    \moster67
    Italy - March 15, 2009
     

    Last edited: Mar 20, 2009
  2. tsteward

    tsteward Active Member Licensed User

    Unfortunately the help file is empty :(
     
  3. moster67

    moster67 Expert Licensed User

    Are you sure? I just downloaded the library-file again and opened the help-file without any problems (stand-alone and from Basic4ppc).

    Try to download the file again - maybe something went wrong downloading it.

    rgds,
    moster67

     
  4. tsteward

    tsteward Active Member Licensed User

    Dunno what's going on then.
    I just downloaded WMSpellCheckerLibrary.zip again and still the help file has no contents for me.

    Not sure what's wrong. I'm using Vista; the file opens but has no contents.
     
  5. moster67

    moster67 Expert Licensed User

    Weird! I am using XP and I have no problems.

    I recall that Chm-files could not be opened in Vista when Vista was first released - then MS released a patch which would let you open Chm-files even under Vista. See if you can find it - it should be under MS-downloads.

    Are you able to open other helpfiles from the menu Help in Basic4ppc?

    Maybe there's an option in the program I used for creating the help-file to render it compatible with Vista. I will verify that.

    Anyone else having problems opening the help-file?

    rgds,
    moster67


     
  6. tsteward

    tsteward Active Member Licensed User

    Yes I can open any chm file in B4PPC directory.
     
  7. tsteward

    tsteward Active Member Licensed User

    Solution

    For the education of Vista users. Right click on the chm file and select "Unblock" as per the attached pic.
    [​IMG]
     
  8. moster67

    moster67 Expert Licensed User

    I'm glad you sorted it out and thanks for posting the solution.:)

    rgds,
    moster67

     
  9. tsteward

    tsteward Active Member Licensed User

    Is it possible, instead of (or as well as) changing the word to upper case, to select the word?
    As in:
    textbox1.SelectionStart=2 ' position where the word starts (2 is just an example)
    textbox1.SelectionLength=strlength(word) ' select the whole word

    This would make it easier to see which word is being flagged as wrong.

    Also, is it possible to check only words > 3 characters?
     
  10. moster67

    moster67 Expert Licensed User

    Hi,

    It's a good idea - the word becomes highlighted and is easier to distinguish.

    I did consider it when writing the library, but I decided not to do it that way because if one touches a key on the keyboard by accident, the selected word will be deleted and replaced by the letter pressed. If the textbox has been set as disabled, then of course there is no harm.

    Maybe I can add it as an option. I will think about it and, if feasible, I will include it in the next release.

    At the moment it's not supported, although it shouldn't be that difficult to add an option to do so. I will put this suggestion on the "To Do"-list for the next release.

    Thanks for your feedback.

    Rgds,
    moster67
     
    Last edited: Mar 17, 2009
  11. moster67

    moster67 Expert Licensed User

    Minor update to the library and helpfile.

    Added the property "SetMinimumWordLength" as suggested by tsteward. See the helpfile for further information. Please let me know if it works as intended.

    rgds,
    moster67

     
  12. tsteward

    tsteward Active Member Licensed User

    Is it possible to get the word currently selected as incorrect?
     
  13. tsteward

    tsteward Active Member Licensed User

    Do you have an example using SetMinimumWordLength?
    I can't get it to work so I must be doing something wrong.

    I have a line
    SetMinimumWordLength(3)
    Which fails
     
  14. moster67

    moster67 Expert Licensed User

    You must assign the value as follows:

    [YourInstance].SetMinimumWordLength=4

    With this setting, the spellchecking-engine will only verify words that are composed of at least 4 characters. For instance: the word ths will be ignored but caar will be verified.
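
    In other words, the setting acts as a simple length filter applied before the dictionary lookup. As a tiny Python illustration of that rule (the function name is made up; this is not how the property is used from Basic4ppc):

```python
def words_to_verify(words, minimum_word_length):
    """Keep only the words that are long enough to be spellchecked."""
    return [w for w in words if len(w) >= minimum_word_length]

print(words_to_verify(["ths", "caar", "a", "word"], 4))
```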

    I would suggest that you set this value after loading the dictionary-files.

    Make sure that you are also using the latest library in the file WMSpellCheckerLibrary.zip. The downloads with the sample-applications do NOT include the latest version of the library.

    Please let me know if this sorted out your problem.

    In any case, I will soon be posting a further update.


     
  15. moster67

    moster67 Expert Licensed User

    Please remember that it is the spellchecking-engine that parses your text and indicates which words are not correct (not present in the dictionary).

    However, the new version of the library, which I will post shortly, lets you pass a string of your choice to the library, which will verify it and return suggestions if the word is not correct. In this case, your application is responsible for parsing the text and selecting the word(s) to verify.

     
    Last edited: Apr 10, 2009
  16. tsteward

    tsteward Active Member Licensed User

    This is an excellent library.
    My only complaint is that it shows the word being flagged as incorrect in upper case in the textbox. When the text is already in upper case, I cannot see which word is being flagged as wrong.

    Other than that I love it.

    Yes, my bad, your previous post solved my problem with min word length.

    Thanks
     
  17. moster67

    moster67 Expert Licensed User

    You are right about the uppercase-issue. Unfortunately the textbox-control won't let us apply bold, underline etc. to individual characters, and the only thing I could come up with was to put the incorrect word in uppercase. However, I will keep this in mind and try to implement the "selection-method" as previously suggested by you.

    Apart from that, I am glad that you liked it.

    PS: what about speed? Does it run fast enough on your device?

    Rgds,
    moster67

     
    Last edited: Apr 12, 2009
  18. tsteward

    tsteward Active Member Licensed User

    So far I haven't tested it on the device.

    The reason I was asking if I could get the word being flagged as incorrect was so I could put it into a text box and show it next to the list of suggested corrections.

    It would also be handy to simply edit that word if it's not found in the dictionary.
     
  19. moster67

    moster67 Expert Licensed User

    Maybe I didn't understand what you meant...

    Do you want the library to pass back to the calling application a string containing the word which the library considered "not correct"? If this is what you meant, then I guess I could implement this quite easily.

    Please let me know.

    rgds,
    moster67

     
  20. tsteward

    tsteward Active Member Licensed User

    Yes, if the text box contains "The quuik brown fox jummped over"

    At the moment the first problem that will be found is "quuik" and this word is shown in uppercase.

    That's all fine, but if I could also get the word that is currently converted to uppercase, it could then be placed into a text box where you might type your own correction etc., thus showing what word we are on. Obviously highlighting would be better, but even if you never figure out highlighting this would be a good feature. As said, it could then be easily edited to add to the dictionary.
     