New feature is coming: OpenNLP - Text analysis

Erel

B4X founder
Staff member
Licensed User
Longtime User
DAVUFrpjRa.gif


(there is an encoding issue at the end of the gif)

More information: https://opennlp.apache.org/
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
It can do all kinds of things that can be relevant when you want to programmatically analyze text: https://opennlp.apache.org/docs/1.9.3/manual/opennlp.html
Analyzing text is a very complicated task. Even the language detection demonstrated above is a task that we can't solve without a framework such as OpenNLP.
 

Magma

Expert
Licensed User
Longtime User
Will it support all the features of OpenNLP?

tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution

Named entity extraction will be useful too.
Of course, all the features can help. For example, sentence segmentation will be useful for education apps (learning English, Greek, etc.).
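
For a rough feel of what two of these tasks involve, here is a minimal Java sketch of sentence segmentation and tokenization. It is not OpenNLP: it uses the JDK's built-in `java.text.BreakIterator`, which is rule-based and can trip over abbreviations, whereas OpenNLP relies on trained models. The class and method names (`SegmentationSketch`, `sentences`, `words`) are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rule-based stand-in for two of the listed tasks; this is NOT OpenNLP.
class SegmentationSketch {

    // Splits text into sentences using the JDK's locale-aware boundary rules.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    // Splits text into word tokens, dropping whitespace and punctuation segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only segments that begin with a letter or a digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Text analysis is hard. Can a library help? I hope so!";
        for (String sentence : sentences(text, Locale.ENGLISH)) {
            System.out.println(sentence + " -> " + words(sentence, Locale.ENGLISH));
        }
    }
}
```

A trained sentence detector is mainly worth the extra setup when the text contains abbreviations, quotations, or languages where simple boundary rules fall short.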
 

alwaysbusy

Expert
Licensed User
Longtime User
If only B4X with such features existed 20 years ago... We had to write NLP from scratch for a CAT (Computer Assisted Translation) and MT (Machine Translation) tool. The Fuzzy Matching engine was one of the best and fastest available at that time. I remember writing the "split" on sentences and words alone took us weeks to get it right for all languages. But the engine could also align/match sentences from two texts in different languages with 98% accuracy. It used our NLP to find which sentence was the best translation in the other text. This was very helpful to build Translation Memories that could feed the Machine Translation, which then assisted the human translators using the CAT tool.
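
As an illustration only, fuzzy matching against a translation memory can be sketched with a plain edit-distance score. This is an assumption about the general technique, not the engine described above; the names (`FuzzyMatchSketch`, `similarity`, `bestMatch`) and the 0.7 threshold are hypothetical.

```java
import java.util.List;

// Illustrative only: character-level fuzzy matching, not the engine from the post.
class FuzzyMatchSketch {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means the two strings are identical.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / max;
    }

    // Index of the stored segment most similar to the query,
    // or -1 when no segment reaches the threshold.
    static int bestMatch(String query, List<String> memory, double threshold) {
        int best = -1;
        double bestScore = 0.0;
        for (int i = 0; i < memory.size(); i++) {
            double score = similarity(query, memory.get(i));
            if (score >= threshold && (best < 0 || score > bestScore)) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> memory = List.of("Open the file menu.", "Save the document.", "Close the window.");
        int hit = bestMatch("Save this document.", memory, 0.7);
        System.out.println(hit < 0 ? "no match" : memory.get(hit));
    }
}
```

A production engine would likely compare at the word level and index the memory so that not every stored segment has to be scored.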

The tool could extract text from about 70 different file formats (from software source code files and word processor documents to PDFs and PowerPoint presentations) and then rebuild the translations into their original format, preserving all the layout.

Fun times...

Align/Match tool:
1628678413940.png


CAT tool:
1628679000983.png


Alwaysbusy
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
"I wonder where we will see this functionality show up as used in B4X. Is this something you needed for B4X Pleroma, Erel?"
No, though I might use it to extend that app at some point.

I wanted to push B4X boundaries and after testing several directions, decided on text analysis.
 

Hamied Abou Hulaikah

Well-Known Member
Licensed User
Longtime User
Great addition after IoT. Waiting...
My next project will depend on it: medical diagnosis using machine learning.
 

All

Member
Dear Erel!
We are currently trying to wrap these libraries for text analysis, but we could not find a precise guide on how to wrap a very large library.
Could you take a look and tell us how to do this, or whether these libraries would suit you for text analysis?
Thanks!
 

All

Member
Dear Erel!
Can your text processing functions work with the Russian language, or be customized to do so? This is very important for us, since the Russian audience in the CIS countries is more than 300 million people, and a very rapidly developing sector of IT VTSZ and specialists.
Will Russian be supported?
Thanks
 

Erel

B4X founder
Staff member
Licensed User
Longtime User