ok, so
here is a small wrapper and a demo. the purpose of both is to show one possible way to process an image for ocr with tesseract. it is not an exhaustive analysis of the thousands of different functions, properties and parameters you might need for a production model. there are some other serious stumbling blocks that warrant further attention but are beyond the scope of the demo. i apologize in advance for the graphic "design".
the wrapper envelops the entirety of tesseract and leptonica. tesseract uses leptonica's internal structures, which is why they come as a package. you could use opencv instead of or in addition to leptonica. it's not perfectly seamless, but it's an option.
although all of tesseract/leptonica is wrapped, only a few things are exposed to the b4a programmer. again, it's a demo of a particular issue.
the demo should run right out of the box. copy the tinytess.jar and .xml files to your additional libraries directory, plug your device into the ide, compile and deploy the demo. the demo comes with two images: one a .jpg, the other a .png. one is easy (for tesseract), the other is harder. it can be run on an emulator, but adding new images quickly becomes a hassle (plus you'd have to add a little code to the demo to copy your images from assets to somewhere else, not to mention repointing the flocation variable).
from the menu, select "files" and tap on the image you want to work with. wait for tesseract and see what you get. you are free to add more images, but i'll tell you that what's missing is a camera activity with cropping, or some kind of cropping activity. the demo does not crop. that doesn't guarantee tesseract will fail outright on an uncropped image, but, in general, it's not the way things are done.
the pre-processing that i do for my single-purpose project consists of converting the image to grayscale, binarization, some "enhancement", deskewing and rotation. after that has finished, the image is passed in memory to tesseract for text extraction. as a courtesy, i write a copy of the processed image to the device (it's called workingimage.png) in the same location as the images you keep for ocr. it is there for post-mortem; it isn't visible from the demo.
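just to make those steps concrete, here's a rough sketch of the same kind of pipeline written against the open-source tess-two bindings (com.googlecode.leptonica.android), not my tinytess wrapper. the single-argument sauvolaBinarizeTiled call, the default parameters and the sign of the deskew angle are assumptions about that library, so treat it as an illustration rather than a drop-in:

```java
import android.graphics.Bitmap;

import com.googlecode.leptonica.android.Binarize;
import com.googlecode.leptonica.android.Convert;
import com.googlecode.leptonica.android.Pix;
import com.googlecode.leptonica.android.ReadFile;
import com.googlecode.leptonica.android.Rotate;
import com.googlecode.leptonica.android.Skew;
import com.googlecode.leptonica.android.WriteFile;

public class PreprocessSketch {

    // grayscale -> binarize -> deskew, then hand back a Bitmap the caller can
    // save as workingimage.png and pass to tesseract
    public static Bitmap prepare(Bitmap source) {
        Pix pix = ReadFile.readBitmap(source);          // Android Bitmap -> Leptonica Pix
        Pix gray = Convert.convertTo8(pix);             // to 8-bit grayscale
        Pix bin = Binarize.sauvolaBinarizeTiled(gray);  // sauvola binarization ("version I");
                                                        // older builds: Binarize.otsuAdaptiveThreshold(gray)
        float skew = Skew.findSkew(bin);                // estimated skew angle in degrees
        Pix straight = Rotate.rotate(bin, skew);        // rotate to correct it (check the sign convention)
        Bitmap out = WriteFile.writeBitmap(straight);   // back to an Android Bitmap

        // free the native leptonica memory
        pix.recycle();
        gray.recycle();
        bin.recycle();
        straight.recycle();
        return out;
    }
}
```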
on the main "screen" you'll see 3 labels: version I, version II and text is black. version I uses the sauvola method for binarization.
version II uses background normalization followed by binarization. the results (both in terms of image manipulation and text extraction) can be very similar or strikingly different. i tend to prefer version II, but just when i get ready to phase out sauvola, it saves the day. text is black is the usual condition, but if the text you're looking at is white (on a black background), tap the label to change it. inverting an image is easy; knowing programmatically when it applies is not so easy. tesseract prefers black text on white, but sometimes it will nail the white text just to show you who's boss. for the moment, i tap on the label when appropriate.
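for what it's worth, the inversion itself (once you know the text is white on black) is just a color-matrix multiply on the bitmap; this little sketch is plain android, nothing from the wrapper:

```java
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.ColorMatrix;
import android.graphics.ColorMatrixColorFilter;
import android.graphics.Paint;

public class InvertSketch {

    // turn white-on-black text into the black-on-white form tesseract prefers
    public static Bitmap invert(Bitmap src) {
        // 4x5 color matrix that maps every channel c to 255 - c, alpha untouched
        ColorMatrix negative = new ColorMatrix(new float[] {
                -1,  0,  0, 0, 255,
                 0, -1,  0, 0, 255,
                 0,  0, -1, 0, 255,
                 0,  0,  0, 1,   0
        });
        Bitmap out = Bitmap.createBitmap(src.getWidth(), src.getHeight(), Bitmap.Config.ARGB_8888);
        Paint paint = new Paint();
        paint.setColorFilter(new ColorMatrixColorFilter(negative));
        new Canvas(out).drawBitmap(src, 0, 0, paint);
        return out;
    }
}
```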
all of the preprocessing functions use "default" or "standard" parameters. for your app you will almost certainly need to tweak them after lengthy experimentation. the functions i use are a mere handful of the hundreds available (not counting the hundreds more available through opencv).
on the tesseract side, you should know that it offers 3 "ocr engines" (used to be 4; you can choose the one you want), 13 page segmentation modes (used to be 11; again, you choose), 4 different ways of setting up how it divvies up the text it finds (you choose), and dozens of configuration settings. in addition, it requires a "language" file, that is, a file which tells it what the text is supposed to look like, some tricky juxtapositions of characters that might occur in that language, and - if desired - a dictionary (in that language) which it consults when it's in doubt. it rates its own performance (in terms of confidence), but even when it's very confident, it can be wrong, so don't count on that. the usefulness of the language file depends on the number of different fonts it has incorporated. italic, bold, tall, thin, serif, sans serif, mixed (a nightmare) - all can cause tesseract to choke.
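to give you an idea of what those choices look like in code, here's how you'd pick an engine mode, a page segmentation mode and a configuration setting through the tess-two java binding (again, not the tinytess wrapper; "eng", the whitelist and the modes are just placeholder choices):

```java
import android.graphics.Bitmap;

import com.googlecode.tesseract.android.TessBaseAPI;

public class TessSetupSketch {

    // dataPath must contain a "tessdata" folder holding eng.traineddata (or your language file)
    public static String recognize(String dataPath, Bitmap processed) {
        TessBaseAPI api = new TessBaseAPI();

        // ocr engine: chosen at init time
        api.init(dataPath, "eng", TessBaseAPI.OEM_TESSERACT_ONLY);

        // page segmentation mode: how tesseract divvies up the page
        api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);

        // one of the many configuration settings: restrict the characters it may report
        api.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST,
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

        api.setImage(processed);
        String text = api.getUTF8Text();
        int confidence = api.meanConfidence(); // 0-100; treat with the skepticism described above
        api.end();
        return text + " (confidence " + confidence + ")";
    }
}
```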
what i have done is apply certain "defaults" which have worked for me and my single purpose application (reading some text). in some cases it is possible to pass an un-pre-processed image directly to tesseract (not through the demo). if you crop tight with your camera and hold it steady and make sure there is no skew or wrap or mottled background, it can do ok.
the only pre-processing tesseract does is to convert an image to 1bpp if this has not already been done. letting tesseract do your binarization for you might be enough in some cases. most people roll their own. apart from that conversion, the only thing tesseract does - beyond actual text extraction - is to hunt for areas within the image which might be construed as text. it will do this regardless of what you may have done with the image, but the easier you make it for tesseract to find these candidates, the faster you'll be in and out and, hopefully, with the text. the longer you're in, the more trouble you're asking for.
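and this is what the "no pre-processing" path mentioned above looks like with the same binding: hand tesseract the raw bitmap and let its internal conversion to 1bpp do the binarization (the path and the language file are placeholders):

```java
import android.graphics.Bitmap;

import com.googlecode.tesseract.android.TessBaseAPI;

public class RawPassSketch {

    public static String ocrUnprocessed(String dataPath, Bitmap cameraShot) {
        TessBaseAPI api = new TessBaseAPI();
        api.init(dataPath, "eng");        // engine mode and page segmentation left at their defaults
        api.setImage(cameraShot);         // no grayscale/binarize/deskew step of our own
        String text = api.getUTF8Text();  // tesseract thresholds the image internally before extraction
        api.end();
        return text;
    }
}
```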
there is some setup required by tesseract. it's already done for you in the demo. i won't get into it here. if you start customizing the demo, you could easily cause
tesseract to fail to initialize.
for the demo, i added an english language file. i myself don't use the english file. in fact, i use a modified version of whichever language file i do use since i don't
use a dictionary. this has repercussions. one, the size of the language file goes from big (20-30MB) down to less big (3MB). two, tesseract doesn't set aside something it considers a miss for later processing with the dictionary. and three, the confidence factor is even more meaningless. so you understand, someone writing a license plate recognition app would tend not to use a dictionary; it just slows things down. and because files without dictionaries are so much smaller, it's easier to use more than one at the same time (which tesseract allows). and since tesseract is trainable (not done in the demo), your own "language" file could easily be a well-schooled font file - nothing more. but, again, these matters are all beyond the purpose of the demo.
you could add your own language file to the demo. a small change to the .jar and to the demo will do it. in any case, the modified english language file included with the demo does not contain a dictionary. i forget whether you'll have a problem if you change the demo and turn the dictionary on where there is none. if you want to download the full english file from tesseract on github (the tesseract-ocr/tessdata repository), be my guest. i'll tell you how to incorporate it (if you don't already know). then you could modify the demo to turn the dictionary on.
i've been following the ocr thread here for some time. as far as i know, this is the only "bundle" that allows you to pre-process an image and extract (or attempt to extract) text. it's not a start, it's not an end; it's somewhere along the starting edge. my project (not the demo) stands up pretty well in a number of cases to textfairy (which i hold as the gold standard). my project did better than google mobile vision in the tests that i ran (while that wrapper was still available), although mobile vision was blindingly faster. i've tried to use google's cloud vision (not so much to compare but just to see what it does); i never got it to respond. there is a web page where you can drag an image and, presumably, cloud vision does its thing. i tried several times but never got it to do anything beyond accept my images.