OpenNLP can do many things that are relevant when you want to analyze text programmatically: https://opennlp.apache.org/docs/1.9.3/manual/opennlp.html
Analyzing text is a very complicated task. The language detection demonstrated above is, on its own, a task that we couldn't solve without a framework such as OpenNLP.
Tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.
Named entity extraction will be useful too.
Of course, all of the features can help. For example, sentence segmentation will be useful for education apps (learning English, Greek, etc.).
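To get a quick feel for what sentence segmentation (and word tokenization) does without downloading an OpenNLP model, here is a minimal Java sketch that uses the JDK's built-in `java.text.BreakIterator` instead. This is a swapped-in, rule-based technique, not OpenNLP's trained `SentenceDetectorME`, and the names `SentenceSplitDemo`, `splitSentences` and `splitWords` are hypothetical, chosen only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: rule-based splitting with the JDK's BreakIterator,
// NOT OpenNLP's model-based SentenceDetectorME / TokenizerME.
public class SentenceSplitDemo {

    // Collects the segments between consecutive boundaries, trimming whitespace
    // and keeping only segments accepted by the filter.
    private static List<String> segments(BreakIterator it, String text, boolean wordsOnly) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (s.isEmpty()) {
                continue;
            }
            // For words, drop pure punctuation segments such as "?" or ",".
            if (wordsOnly && !Character.isLetterOrDigit(s.codePointAt(0))) {
                continue;
            }
            out.add(s);
        }
        return out;
    }

    // Splits text into sentences using locale-aware sentence boundaries.
    static List<String> splitSentences(String text, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), text, false);
    }

    // Splits text into words using locale-aware word boundaries.
    static List<String> splitWords(String text, Locale locale) {
        return segments(BreakIterator.getWordInstance(locale), text, true);
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? I am fine.";
        for (String sentence : splitSentences(text, Locale.ENGLISH)) {
            System.out.println(sentence + " -> " + splitWords(sentence, Locale.ENGLISH));
        }
    }
}
```

Rule-based splitters like this can stumble on abbreviations such as "Mr." or "e.g.", which is one reason a trained model is attractive for language-learning apps that need reliable sentence boundaries.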
If only B4X with such features had existed 20 years ago... We had to write NLP from scratch for a CAT (Computer Assisted Translation) and MT (Machine Translation) tool. Its Fuzzy Matching engine was one of the best and fastest available at that time. I remember that writing the sentence and word "split" alone took us weeks to get right for all languages. But the engine could also align/match sentences from two texts in different languages with 98% accuracy. It used our NLP to find which sentence was the best translation in the other text. This was very helpful for building Translation Memories that could feed the Machine Translation engine, which in turn assisted the human translators using the CAT tool.
The tool could extract text from about 70 different file formats (ranging from software source code files to word processor documents, PDFs and PowerPoint presentations) and then rebuild the translations back into the original format, preserving the entire layout.
Dear Erel!
We are currently trying to wrap these libraries for text analysis, but we could not find a clear guide on how to wrap a very large library.
Could you take a look and advise us on this, or is it something you would find suitable for text analysis?
Thanks!
Dear Erel!
Can your text processing functions be customized to work with the Russian language? This is very important for us, since the Russian-speaking audience in the CIS countries is more than 300 million people, and the IT VTSZ sector and its specialists are developing very rapidly.
Will Russian be supported?
Thanks!
I'm wrapping OpenNLP, so it is not really "my" word processor.
It can handle any language; however, you need a "model" file that is relevant to your domain. You can also create one yourself. This is called training.
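To illustrate what "model" and "training" mean here, the following is a toy Java sketch: it "trains" by counting character trigrams in sample text for each language and "predicts" by scoring new text against those counts. This is not OpenNLP's algorithm or model format (OpenNLP has its own trainers and model files); `ToyLanguageModel`, `train` and `predict` are hypothetical names. The only point is that the model is learned from example text, so Russian works just like English once you have Russian training data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model, NOT OpenNLP's implementation:
// language code -> (character trigram -> count) learned from sample text.
public class ToyLanguageModel {

    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();
    // Total number of trigrams seen per language, used to normalize the counts.
    private final Map<String, Integer> totals = new HashMap<>();

    // Lowercases and pads with spaces so word starts/ends form trigrams too.
    private static String pad(String text) {
        return " " + text.toLowerCase(Locale.ROOT) + " ";
    }

    // "Training": count overlapping character trigrams in the sample text.
    public void train(String lang, String sampleText) {
        Map<String, Integer> counts = profiles.computeIfAbsent(lang, k -> new HashMap<>());
        String t = pad(sampleText);
        for (int i = 0; i + 3 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 3), 1, Integer::sum);
            totals.merge(lang, 1, Integer::sum);
        }
    }

    // "Prediction": sum the relative frequency of the input's trigrams in each
    // language profile and return the best language (null if nothing was trained).
    public String predict(String text) {
        String t = pad(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int total = totals.getOrDefault(e.getKey(), 0);
            if (total == 0) {
                continue; // language was "trained" on text too short to have trigrams
            }
            double score = 0.0;
            for (int i = 0; i + 3 <= t.length(); i++) {
                score += (double) e.getValue().getOrDefault(t.substring(i, i + 3), 0) / total;
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyLanguageModel model = new ToyLanguageModel();
        model.train("en", "the quick brown fox jumps over the lazy dog and the cat sleeps on the table");
        model.train("ru", "быстрая коричневая лиса прыгает через ленивую собаку и кошка спит на столе");
        System.out.println(model.predict("the cat is on the table"));
        System.out.println(model.predict("кошка спит"));
    }
}
```

With one-sentence training samples like these, predictions are only dependable for clearly distinct inputs; a usable model needs much more text from the target language and domain, which is exactly what training a real OpenNLP model involves.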