B4A Library OCR with Tesseract

The purpose of this exercise was to see whether OCR via Tesseract performs better or worse than the Vision API.

I have created a Jar for the Tesseract API (com.googlecode.tesseract.android.TessBaseAPI) by making use of this GitHub project. It took "ndk-build" about 30 minutes to create the .so files for the four CPU architectures (armeabi, armeabi-v7a, mips, x86). The .so files are included in the TessTwo.jar that I created.

I then created a shortcut wrapper for this GitHub project. It brings nothing back to the B4A project - it reports the OCR result in an EditText view of the "shortcut"-wrapped project - so you can see how successful it is at OCR compared with the Vision API. In my humble opinion:
1. It makes for a massive APK, although it allows for support of different languages. I have only allowed for English (see /Files/tessdata/eng.traineddata in the B4A project)
2. It seems to be slower than the Vision API in performing OCR
3. When making use of the camera to capture an image with text, accuracy seems to be worse than with the Vision API.

I have not yet tried to pass a bitmap with text to the TessBaseAPI to see if it performs better or worse than the Vision API. That will be my next exercise - probably by making use of some inline Java code so that I don't have to create another wrapper for Tesseract OCR.
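For anyone wanting to try that route, the direct TessBaseAPI calls from tess-two look roughly like the sketch below. This is an assumption-laden illustration, not code from the project: the class name `TessOcr`, the method `recognize` and the `dataPath` parameter are my own, and `dataPath` must point at a folder containing tessdata/eng.traineddata.

```java
import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

public class TessOcr {
    // Sketch: run OCR on an in-memory bitmap instead of a camera capture.
    // dataPath must contain a "tessdata" folder with eng.traineddata in it.
    public static String recognize(Bitmap bmp, String dataPath) {
        TessBaseAPI api = new TessBaseAPI();
        String text = "";
        if (api.init(dataPath, "eng")) { // load the English trained data
            api.setImage(bmp);           // hand the bitmap to Tesseract
            text = api.getUTF8Text();    // blocking call; runs the OCR
            api.end();                   // release the native resources
        }
        return text;
    }
}
```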

You can download the complete B4A project and lib files from here (a folder containing the complete B4A project and all the Jar and XML files). Copy the Jars and XML files to your additional library folder. Make sure that you clean the project prior to compiling it (in B4A, go Tools --> Clean project).

Link to the folder (or else click on "here" in the paragraph above):
https://drive.google.com/open?id=0B4g9tud5lvjgLXFZLThVVjFNaWs
 

bluedude

Well-Known Member
Licensed User
Interesting test. I'm investigating how to get OCR recognition for our custom font type and it seems that is possible with Tesseract and a custom training file. Not sure if that will work with the Vision API. Any idea?
 

swissmade

Well-Known Member
Licensed User
Nice job!
Is this also possible for B4J?
 

Syd Wright

Well-Known Member
Licensed User
Good to see that you are still experimenting with OCR.
In the past days I have been working with Don Manfred's Mobile Vision library.
The results are amazing! After some headaches I can now even read multi-column magazines and newspapers.

Before I try this Tesseract library, what is your own impression when comparing it with Google Vision?
 

bluedude

Well-Known Member
Licensed User
Can Google Vision be trained for a specific custom character type? Currently our custom font (we use it for art) is not recognized.
 

Johan Schoeman

Expert
Licensed User

Good to see that you are still experimenting with OCR.
In the past days I have been working with Don Manfred's Mobile Vision library.
The results are amazing! After some headaches I can now even read multi-column magazines and newspapers.

Before I try this Tesseract library, what is your own impression when comparing it with Google Vision?

Syd, I find it slow and "bulky". Some images with text need to be upscaled, else Tesseract won't extract the correct text from a bitmap. Some kind of preprocessing of bitmaps would probably help improve accuracy.

Although there are trained data files for a vast number of different languages, the "overhead" they bring to an app is significant. I downloaded all of the trained language files and the download was in excess of 300MB.
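As a rough illustration of the upscaling step mentioned above, here is a sketch in plain desktop Java with BufferedImage (not an Android Bitmap); the class and method names are my own:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class Upscale {
    // Sketch: enlarge a bitmap by an integer factor before handing it to
    // Tesseract. Bicubic interpolation keeps glyph edges smoother than
    // nearest-neighbour, which helps recognition on small source images.
    public static BufferedImage scale(BufferedImage src, int factor) {
        BufferedImage dst = new BufferedImage(
                src.getWidth() * factor, src.getHeight() * factor, src.getType());
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, dst.getWidth(), dst.getHeight(), null);
        g.dispose();
        return dst;
    }
}
```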
 

Syd Wright

Well-Known Member
Licensed User
Sounds like a non-starter to me (as I already expected, based on my disappointments with Tesseract for Windows over many years). Google Vision has far more potential, so it is probably better to ditch Tesseract and continue with Google Vision. Both you and Don Manfred were/are making good progress. Don finished his version 1.51 today with excellent features; subtitle scanning results are also not bad. This library does (almost) everything that one would need and is very fast. The only missing part is how to parse barcodes into product information (plus I have not figured out the face detection part yet).

I have now also made a lot of progress with my speaking "Wat zie ik" app for (Dutch) visually impaired users. One of my challenges was to get all the text blocks of a scanned multi-column magazine in the right order, but I seem to have succeeded (using various sorting tricks).
 

bluedude

Well-Known Member
Licensed User
The stuff we do is not language specific but font/character specific. We want to train a system to understand the letter art we use. These are pretty unreadable characters that need to be recognized by an app to make them readable.
 

drgottjr

Well-Known Member
Licensed User
while fast (really fast) and accurate (under good conditions), google's computer vision api is google's and is available on a "limited trial" - whose terms may or may not affect us - which means once google has figured out how to monetize it (and has gathered enough information from people using it), it may disappear. this could happen in the wild where the app is out of your hands. and if you think you can change the timeout buried in the library and go merrily on your way, you might want to think again. this is, after all, the all-knowing google. you should believe they know who's using it. just sayin'.

oh, and there is a very high likelihood that the text recognition part of the api is tesseract (for some time now, tesseract has been, to all intents and purposes, google's ocr engine). why they would spend years continuing its development and then use some other system borders on the incredible. in addition, the choice of the types of output (block, line, word) is identical to tesseract's. but who knows? it could have been "fake" development all these years.

here's the thing: tesseract - whether standalone or as part of google's computer vision api (or anyone else's) - is the least of your worries. without proper pre-processing, tesseract can be very disappointing at extracting text accurately. i've done some testing with the libraries that have appeared recently, and i can say that there is no pre-processing: the bitmaps are handed off to tesseract (or google's text recognizer, if you prefer) directly. since you are likely to try your best to take clear pictures of aligned printed text, this may not matter. it won't matter even if you fail slightly, due, for the most part, to tesseract's ability these days to handle mis-aligned text. but it was pretty easy to make the api fail in very typical - but not optimal - situations where an app that pre-processes the images before invoking tesseract would succeed (eg, textfairy or - imho - my own work in progress).

i can't imagine that it would be a difficult matter to add, eg, jordicp's opencv wrapper to an app with donmanfred's wrapper to handle the pre-processing before calling the computer vision api's text recognizer. whether or not you know how to pre-process an image as well as is done in textfairy (a very high bar) is another matter, but there are a few minimum steps that can be taken with opencv to assist the text recognition after a little online research. there is also leptonica (which is pretty much bundled with tesseract), but it is bypassed by google (i say this as i can see no, or very little, pre-processing of the original bitmap).
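as a rough sketch of what one of those minimum pre-processing steps could look like - plain java with BufferedImage rather than opencv or leptonica, and the class/method names are my own invention, not from any of the wrappers mentioned:

```java
import java.awt.image.BufferedImage;

public class Binarize {
    // Sketch of a minimal pre-processing pass: convert to grayscale and
    // apply a global threshold so the OCR engine sees clean black-on-white
    // text. (a real pipeline would add deskewing and adaptive thresholding.)
    public static BufferedImage threshold(BufferedImage src, int cutoff) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int lum = (r * 299 + g * 587 + b * 114) / 1000; // perceived brightness
                int v = lum < cutoff ? 0x000000 : 0xFFFFFF;     // force black or white
                dst.setRGB(x, y, v);
            }
        }
        return dst;
    }
}
```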
 

Syd Wright

Well-Known Member
Licensed User
Interesting reading material. I have noticed that Google Vision (fortunately) also works without an Internet connection.
This would imply that the source code is stored somewhere on the device and bitmaps are not sent to Google servers for OCR processing.

It is unclear to me whether Google Vision might stop functioning in the future or could become a paid service.
The question is how much has been integrated in the Github project, for which Don Manfred subsequently made a wrapper. Also, what happens if Google issues an update: will Don have to make a new wrapper?

I hear what you say about Tesseract. However, I have never seen an application that performs well with Tesseract (neither in Windows nor in Android). It is clunky, slow and over-complicated. Google Vision responds within less than a second whereas Tesseract takes from 10 seconds to a minute. I can hardly believe that both have the same origins.

I have also thought about bitmap pre-processing, especially because I have noticed that Google Vision does not do such a good job on multi-column magazine and newspaper articles. Often Google grabs parts of the text that really belong in the next or previous column. Widening the white gaps between blocks/columns of text (or splitting the bitmap into multiple bitmaps, each with a block of text) might improve this.
Blank areas between text are fairly easy to detect (and also a good reference for rotating bitmaps to the correct orientation).
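The blank-gap detection could be sketched as a vertical projection profile, counting dark pixels per pixel column (plain Java with BufferedImage; the class and method names are my own, not from any existing library):

```java
import java.awt.image.BufferedImage;

public class ColumnGaps {
    // Sketch: count dark pixels in each pixel column. Columns whose count
    // is zero (or near zero) are the blank gutters between text columns,
    // so the bitmap could be split there before running OCR on each part.
    public static int[] darkPixelsPerColumn(BufferedImage img, int darkCutoff) {
        int[] profile = new int[img.getWidth()];
        for (int x = 0; x < img.getWidth(); x++) {
            for (int y = 0; y < img.getHeight(); y++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                if ((r + g + b) / 3 < darkCutoff) profile[x]++;
            }
        }
        return profile;
    }
}
```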
 

DonManfred

Expert
Licensed User
is how much has been integrated in the Github project
NOTHING. The code is inside the Google Play Services in the Maven repos and is not open to anyone.

Additionally this is the wrong thread for this question.
 