B4A Library OCR with Tesseract

The purpose of this exercise was to see if OCR via Tesseract performs better/worse than the Vision API.

I have created a Jar for the Tesseract API (com.googlecode.tesseract.android.TessBaseAPI) by making use of this Github project. It took "ndk-build" about 30 minutes to create the .so files for the 4 CPU architectures (armeabi, armeabi-v7a, mips, x86). The .so files are included in the TessTwo.jar that I have created.

I then created a shortcut wrapper for this Github project. It brings nothing back to the B4A project - it reports the OCR result in an EditText view of the "shortcut" wrapped project. So, see for yourself how successful its OCR is compared with the Vision API. In my humble opinion:
1. It makes for a massive APK, although it allows for support of different languages. I have only included English (see /Files/tessdata/eng.traineddata in the B4A project)
2. It seems to be slower than the Vision API in performing OCR
3. Accuracy also seems to be worse than the Vision API when making use of the camera to capture an image with text.

I have not yet tried to pass a bitmap with text to the TessBaseAPI to see if it performs better or worse than the Vision API. That will be my next exercise - probably by making use of some inline Java code so that I don't have to create another wrapper for Tesseract OCR.
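For anyone who wants to experiment with that route already, here is a minimal sketch of what such an inline-Java call to TessBaseAPI could look like. It assumes the eng.traineddata file has already been copied to a "tessdata" folder on the device; the class and method names are illustrative only and are not part of the posted library:

```java
import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

public class TessOcrHelper {
    // dataPath must be the parent folder that CONTAINS a "tessdata" subfolder,
    // e.g. /storage/emulated/0/MyApp (with /MyApp/tessdata/eng.traineddata in place).
    public static String recognize(String dataPath, Bitmap bitmap) {
        TessBaseAPI tess = new TessBaseAPI();
        try {
            if (!tess.init(dataPath, "eng")) {   // load the English trained data
                return "";
            }
            tess.setImage(bitmap);               // hand the bitmap to Tesseract
            return tess.getUTF8Text();           // blocking call: run OCR and return the text
        } finally {
            tess.end();                          // release native resources
        }
    }
}
```

Note that getUTF8Text() is a blocking call, so in a real app it should run off the main thread.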

You can download the complete B4A project and library files from here (a folder containing the complete B4A project and all the Jar and XML files). Copy the Jar and XML files to your additional library folder. Make sure that you clean the project prior to compiling it (in B4A go Tools --> Clean Project).

Link to the folder (or else click on "here" in the paragraph above):
https://drive.google.com/open?id=0B4g9tud5lvjgLXFZLThVVjFNaWs
 

bluedude

Well-Known Member
Licensed User
Longtime User
Interesting test. I'm investigating how to get OCR recognition for our custom font type, and it seems that this is possible with Tesseract and a custom training file. Not sure if that will work with the Vision API. Any idea?
 

Swissmade

Well-Known Member
Licensed User
Longtime User
Nice Job
Is this also possible for B4J??
 

Johan Schoeman

Expert
Licensed User
Longtime User
Nice Job
Is this also possible for B4J??
I guess it is possible, but I have not looked into the "pure" Java version of Tesseract. Maybe someone would like to do so.
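For anyone who wants to look into it: the "pure" Java route on the desktop usually goes through the Tess4J wrapper, which calls the native Tesseract library via JNA. A minimal sketch, assuming Tess4J is on the classpath and a tessdata folder with eng.traineddata is available, could look like this (the file names are placeholders):

```java
import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class DesktopOcr {
    public static void main(String[] args) throws TesseractException {
        ITesseract tess = new Tesseract();
        tess.setDatapath("tessdata");               // folder containing eng.traineddata
        tess.setLanguage("eng");
        String text = tess.doOCR(new File("sample.png"));
        System.out.println(text);
    }
}
```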
 

bluedude

Well-Known Member
Licensed User
Longtime User
Can Google Vision be trained for a specific custom character type? Currently our custom font (we use it for art) is not recognized.
 

Johan Schoeman

Expert
Licensed User
Longtime User
Good to see that you are still experimenting with OCR.
In the past days I have been working with Don Manfred's Mobile Vision library.
The results are amazing! After some headaches I can now even read multi-column magazines and newspapers.

Before I try this Tesseract library, what is your own impression when comparing it with Google Vision?

Syd, I find it slow and "bulky". Some images with text need to be "upsized", else Tesseract won't extract the correct text from a Bitmap. Some kind of preprocessing of the Bitmaps will probably assist in improving accuracy.

Although there are trained data files for a vast number of different languages, the "overhead" that this brings to an app is significant. I downloaded the trained language files (all of them) and the download was in excess of 300MB.
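To illustrate the "upsized" point, a small bitmap can simply be scaled up before it is handed to TessBaseAPI. This is only a rough sketch; the scale factor is an arbitrary example, not a value tested in this thread:

```java
import android.graphics.Bitmap;

public class OcrPrep {
    // Scale a bitmap up by a fixed factor so that small text is rendered
    // with enough pixels for Tesseract to segment the characters.
    public static Bitmap upscale(Bitmap src, float factor) {
        int w = Math.round(src.getWidth() * factor);
        int h = Math.round(src.getHeight() * factor);
        // 'true' enables bilinear filtering for a smoother result
        return Bitmap.createScaledBitmap(src, w, h, true);
    }
}
```

Whether the larger bitmap actually helps depends on the source image; the point is simply that Tesseract gets more pixels per character to work with.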
 

bluedude

Well-Known Member
Licensed User
Longtime User
The stuff we do is not language-specific but font/character-specific. We want to train a system to understand the letter art we use. These are pretty unreadable characters that need to be recognized by an app to make them readable.
 

drgottjr

Expert
Licensed User
Longtime User
while fast (really fast) and accurate (under good conditions), google's computer vision api is google's and is available on a "limited trial" - whose terms may or may not affect us - which means once google has figured out how to monetize it (and has gathered enough information from people using it), it may disappear. this could happen in the wild where the app is out of your hands. and if you think you can change the timeout buried in the library and go merrily on your way, you might want to think again. this is, after all, the all-knowing google. you should believe they know who's using it. just sayin'.

oh, and there is a very high likelihood that the text recognition part of the api is tesseract (for some time now, tesseract has been, to all intents and purposes, google's ocr engine). why they would spend years continuing its development and then use some other system borders on the incredible. in addition, the choice of the types of output (block, line, word) is identical to tesseract's. but who knows? it could have been "fake" development all these years.
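as an aside, that block/line/word hierarchy is directly visible in the mobile vision text recognizer itself. the following is a rough java sketch using the standard play-services-vision classes (nothing here comes from the b4a wrappers discussed in this thread):

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.util.SparseArray;
import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.text.Line;
import com.google.android.gms.vision.text.Text;
import com.google.android.gms.vision.text.TextBlock;
import com.google.android.gms.vision.text.TextRecognizer;

public class VisionLevels {
    // Walk the block -> line -> word hierarchy that the text recognizer returns.
    public static void dump(Context context, Bitmap bitmap) {
        TextRecognizer recognizer = new TextRecognizer.Builder(context).build();
        try {
            if (!recognizer.isOperational()) return;   // native files not yet downloaded
            Frame frame = new Frame.Builder().setBitmap(bitmap).build();
            SparseArray<TextBlock> blocks = recognizer.detect(frame);
            for (int i = 0; i < blocks.size(); i++) {
                TextBlock block = blocks.valueAt(i);                  // block
                for (Text line : block.getComponents()) {             // line
                    for (Text word : ((Line) line).getComponents()) { // word (Element)
                        System.out.println(word.getValue());
                    }
                }
            }
        } finally {
            recognizer.release();
        }
    }
}
```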

here's the thing: tesseract - whether standalone or as part of google's computer vision api (or anyone else's) - is the least of your worries. without proper pre-processing, tesseract can be very disappointing at times in extracting the text accurately. i've done some testing with the libraries that have appeared recently, and i can say that there is no pre-processing. the bitmaps are being handed off to tesseract (or google's text recognizer, if you prefer) directly. since you are likely to try your best to take clear pictures of aligned printed text, this may not matter. it won't matter even if you fail slightly. this is due, for the most part, to tesseract's ability these days to handle misaligned text. but it was pretty easy to cause the api to fail in very typical - but not optimal - situations where an app that pre-processes the images before invoking tesseract would succeed (eg, textfairy or - imho - my own work in progress).

i can't imagine that it would be a difficult matter to add, eg, jordicp's opencv wrapper to an app with donmanfred's wrapper to handle the pre-processing before calling the computer vision api's text recognizer. whether or not you know how to pre-process an image as well as is done in textfairy (a very high bar) is another matter. but there are a few minimum steps that can be taken with opencv to assist the text recognition after a little online research. there is also leptonica (which is pretty much bundled with tesseract), but it is bypassed by google (i say this as i can see no, or very little, pre-processing of the original bitmap).
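to illustrate the kind of "minimum steps" meant here, a very rough pre-processing pass with the opencv android bindings could convert the image to grayscale and apply an adaptive threshold before the bitmap reaches the text recognizer. this is only a sketch, assuming opencv has already been initialised in the app; the threshold parameters are arbitrary starting values:

```java
import android.graphics.Bitmap;
import org.opencv.android.Utils;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

public class PreProcess {
    // Grayscale + adaptive threshold: a very basic cleanup pass before OCR.
    public static Bitmap binarize(Bitmap src) {
        Mat rgba = new Mat();
        Utils.bitmapToMat(src, rgba);                       // Bitmap -> Mat (RGBA)
        Mat gray = new Mat();
        Imgproc.cvtColor(rgba, gray, Imgproc.COLOR_RGBA2GRAY);
        Mat bw = new Mat();
        // blockSize 31 and constant 15 are arbitrary starting values; tune per use case
        Imgproc.adaptiveThreshold(gray, bw, 255,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
                Imgproc.THRESH_BINARY, 31, 15);
        Bitmap out = Bitmap.createBitmap(src.getWidth(), src.getHeight(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(bw, out);                         // Mat -> Bitmap
        return out;
    }
}
```

deskewing and noise removal would be the obvious next steps, but even this much tends to give the recognizer a cleaner image than the raw camera frame.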
 

DonManfred

Expert
Licensed User
Longtime User
how much has been integrated in the Github project
NOTHING. The code is inside the Google Play Services in the Maven repos and not open to anyone.

Additionally, this is the wrong thread for this question.
 