Android Question: Image processing for OCR

Eme Fibonacci

Do you have any example code for doing image processing for OCR?

I have already tried many methods but my results were not good. What I need:

1) Fix the illumination of the image (e.g. no dark parts of the image)

2) Binarize and de-noise


Thank you very much.
 

JordiCP

A 'normal' (global) binarization will only work some of the time. Otsu is better, since it adapts its threshold to the image itself.
But many real-world pictures have local lighting differences across the same picture, which makes them more difficult to binarize. Adaptive thresholding can give better results in such cases.

You can play with OpenCV to preprocess the image as much as you want before OCR. I've adapted an example to include adaptive thresholding (image and a bit of explanation taken from HERE)

[Screenshot: the imageManipulations example with adaptive thresholding applied]

Play with the Blur option (needed) together with one of 'adaptive mean' or 'adaptive gaussian' on the third example picture.
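For reference, the core of it is a single OpenCV call. Here is a minimal plain-Java sketch of the same idea (desktop OpenCV bindings; the file names and the 31/10 parameters are placeholders to experiment with, not values taken from the attached example):

Code:
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class AdaptiveThresholdSketch {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat src = Imgcodecs.imread("page.jpg", Imgcodecs.IMREAD_GRAYSCALE);

        // A slight blur first (the "needed" part): it suppresses sensor
        // noise that adaptive thresholding would otherwise amplify into
        // salt-and-pepper speckle.
        Mat blurred = new Mat();
        Imgproc.GaussianBlur(src, blurred, new Size(5, 5), 0);

        // Each pixel is compared against a Gaussian-weighted mean of its
        // 31x31 neighbourhood minus a constant (10), so local lighting
        // differences are compensated automatically.
        Mat binary = new Mat();
        Imgproc.adaptiveThreshold(blurred, binary, 255,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
                Imgproc.THRESH_BINARY, 31, 10);

        Imgcodecs.imwrite("page_binarized.png", binary);
    }
}

As a rule of thumb, the block size (31 here) should be larger than the stroke width of the characters but smaller than the scale of the lighting gradient; that is usually what needs tuning per image source.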
 

Attachments

  • imageManipulations_adaptiveThreshold.zip
    307.9 KB · Views: 289

Eme Fibonacci

I agree with Johan: during my experiments with Johan's (and Don Manfred's) libraries (that are based on Google Mobile Vision), any attempts to enhance the bitmap actually made the OCR results worse. My experiments involved editing the bitmap with Paintshop Pro:
- sharpening, dithering, among others.
- playing with contrast and brightness.
- image inversion.
- reduction of the bit depth (from 16M colours to 65k, to 16k, to 256 and to 16 colours).
- making a greyscale image from a colour image, and other tricks.

OCR works best when the characters are as crisp as possible, using the best possible cameras (5 megapixels or more).
I don't see why blurring could give any benefit...

I have a feeling that Google Vision already uses a large number of pre-processing tricks before doing the OCR.

There is actually no need to do pre-processing because Google Vision performs extremely well.
Only when trying to read TV subtitles might pre-processing have some effect, but I still have not found a suitable method, especially because TV subtitles on light backgrounds are the main problem.

I've tested Google Vision exhaustively and did not get the results I need.
I've used Google Cloud Vision and that's simply amazing, but it is a paid service.

I'm currently working on my own OCR, so I need to do the image processing myself.

Yes, I know. It is a long and hard road.
 

drgottjr

although there is no one answer/solution, sauvola works wonders for me. why not post a troublesome image?
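for reference, sauvola's threshold is just the local mean adjusted by the local standard deviation, so text survives even when the background brightness drifts across the page. a minimal, dependency-free java sketch of the formula (my illustration; the window size and k passed in are typical starting points, nothing more):

Code:
public final class SauvolaSketch {
    // gray: 8-bit grayscale pixels, row-major; returns true where ink
    public static boolean[] binarize(int[] gray, int w, int h,
                                     int window, double k) {
        int W = w + 1;
        // integral images of values and squared values give O(1) local stats
        long[] sum = new long[W * (h + 1)];
        long[] sq = new long[W * (h + 1)];
        for (int y = 1; y <= h; y++) {
            for (int x = 1; x <= w; x++) {
                int p = gray[(y - 1) * w + (x - 1)];
                sum[y * W + x] = p + sum[(y - 1) * W + x]
                        + sum[y * W + x - 1] - sum[(y - 1) * W + x - 1];
                sq[y * W + x] = (long) p * p + sq[(y - 1) * W + x]
                        + sq[y * W + x - 1] - sq[(y - 1) * W + x - 1];
            }
        }
        boolean[] ink = new boolean[w * h];
        int r = window / 2;
        final double R = 128.0; // assumed dynamic range of the std deviation
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - r), y0 = Math.max(0, y - r);
                int x1 = Math.min(w, x + r + 1), y1 = Math.min(h, y + r + 1);
                long n = (long) (x1 - x0) * (y1 - y0);
                long s = sum[y1 * W + x1] - sum[y0 * W + x1]
                        - sum[y1 * W + x0] + sum[y0 * W + x0];
                long s2 = sq[y1 * W + x1] - sq[y0 * W + x1]
                        - sq[y1 * W + x0] + sq[y0 * W + x0];
                double mean = (double) s / n;
                double std = Math.sqrt(Math.max(0, (double) s2 / n - mean * mean));
                // sauvola: threshold follows the local mean, pulled lower
                // where local contrast (std) is weak, i.e. plain background
                double t = mean * (1 + k * (std / R - 1));
                ink[y * w + x] = gray[y * w + x] < t;
            }
        }
        return ink;
    }
}

typical starting values are a window of 15-31 pixels and k around 0.3; the integral images are what keep this usable on phone-sized pictures.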
 

drgottjr

it doesn't quite work that way, i'm afraid. b4x code would just facilitate calling some code from a library (which doesn't exactly exist) or via inline java (to build a specific-use app by exposing a few relevant calls from some available jars). unless you have the resources of a google, a specific-use app is the best a single developer can shoot for. it isn't very difficult to employ many of the image manipulation functions available; choosing the right one and working with it is the problem.

you should post a representative image from your project. some of the suggested methods may work for you. if your idea is to build a better google cloud vision, you may not want to quit your day job just yet. let's see what you're looking at.

and if you can figure out how to contact me directly (maybe you just click on my avatar, i don't know), i would be happy to discuss it in more detail and to post any examples here that might have been useful to you. there is no general solution. you have to narrow things down unless you have the capacity to step through literally hundreds of available image processing functions until you find something that decodes your image. implementing, e.g., otsu binarization before calling tesseract is not difficult at all (not that doing the one and then the other is all that might be required). you just can't do it with b4a alone. the tools out there are a little cumbersome - not very. they have been referred to here in the forum over the last few years. but you need to be comfortable on the fringes of the b4x cocoon.
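to give a sense of scale, the otsu step by itself can be this small with opencv's java bindings (my sketch, with placeholder file names - not code from any existing wrapper):

Code:
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class OtsuSketch {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        Mat gray = Imgcodecs.imread("input.jpg", Imgcodecs.IMREAD_GRAYSCALE);
        Mat bw = new Mat();
        // with THRESH_OTSU set, opencv ignores the threshold argument (0)
        // and picks the value that best separates the two intensity classes
        Imgproc.threshold(gray, bw, 0, 255,
                Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
        Imgcodecs.imwrite("input_otsu.png", bw); // hand this to tesseract
    }
}

the work, as i said, is not the call itself; it's knowing whether a single global threshold like otsu is the right tool for your images in the first place.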
 

drgottjr

ok, so here is a small wrapper and a demo. the purpose of both is to show one possible way to process an image for ocr with tesseract. the purpose is not an exhaustive analysis of the literally thousands of different functions, properties and parameters you might need for a production model. there are some other serious stumbling blocks that warrant further attention but are beyond the scope of the demo. i apologize in advance for the graphic "design".

the wrapper envelops the entirety of tesseract and leptonica. tesseract uses leptonica's internal structures, which is why they come as a package. you could use opencv instead of or in addition to leptonica. it's not perfectly seamless, but it's an option.

although all of tesseract/leptonica is wrapped, only a few things are exposed to the b4a programmer. again, it's a demo of a particular issue.

the demo should run right out of the box. copy the tinytess.jar and .xml files to your add'l libraries directory, plug your device into the ide, compile and deploy the demo. the demo comes with 2 demo images: one a .jpg, the other a .png. one is easy (for tesseract), the other is harder. it can be run on an emulator, but it quickly becomes a hassle to add new images (plus you'd have to add a little code to the demo to copy your images from assets to somewhere else, not to mention repointing the flocation variable).

from the menu, you select "files" and tap on the image you want to work with. wait for tesseract and see what you get. you are free to add more images, but i'll tell you that what's missing is a camera activity with cropping, or some kind of cropping activity. the demo does not crop. that doesn't guarantee tesseract will fail completely, but, in general, cropping tightly around the text is the way things are done.
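if you want a quick stand-in until you have a proper cropping activity, a programmatic crop on android is one call (my illustration; the rectangle would come from wherever your ui produces it):

Code:
import android.graphics.Bitmap;

public class CropSketch {
    // copies the given sub-rectangle of src into a new bitmap;
    // left/top/width/height must lie inside the source bounds
    public static Bitmap cropToText(Bitmap src, int left, int top,
                                    int width, int height) {
        return Bitmap.createBitmap(src, left, top, width, height);
    }
}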

the pre-processing that i do for my single-purpose project consists of converting the image to grayscale, binarization, some "enhancement", deskewing and rotation. after that has finished, the image is passed in memory to tesseract for text extraction. as a courtesy, i write a copy of the processed image to the device (it's called workingimage.png) in the same location as the images you keep for ocr. it is there for post mortem. it isn't visible from the demo.
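for the curious, the rough shape of such a pipeline through the javacpp presets (which is the plumbing under the wrapper) is below. this is a sketch of the idea, not the wrapper's actual code; i'm sticking to leptonica calls i know the presets expose, and letting tesseract do its own 1bpp conversion, so the "enhancement" and binarization steps are omitted:

Code:
import org.bytedeco.javacpp.BytePointer;

import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;

public class PipelineSketch {
    public static void main(String[] args) {
        TessBaseAPI api = new TessBaseAPI();
        // "tessdata" must contain eng.traineddata; the path is a placeholder
        if (api.Init("tessdata", "eng") != 0) {
            System.err.println("could not initialize tesseract");
            return;
        }
        PIX pix = pixRead("photo.jpg");           // placeholder image
        PIX gray = pixConvertTo8(pix, 0);         // to 8-bit grayscale
        PIX deskewed = pixDeskew(gray, 0);        // estimate and undo skew
        pixWrite("workingimage.png", deskewed, IFF_PNG); // post-mortem copy
        api.SetImage(deskewed);                   // hand over in memory
        BytePointer text = api.GetUTF8Text();
        System.out.println(text.getString());
        text.deallocate();
        api.End();
        pixDestroy(pix);
        pixDestroy(gray);
        pixDestroy(deskewed);
    }
}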

on the main "screen" you'll see 3 labels: version I, version II and text is black. version I uses the sauvola method for binarization.
version II uses a background normalization followed by binarization. the results (both in terms of image manipulation and text extraction can be very similar or
strikingly different. i tend to prefer version II, but i just when i get ready to phase out sauvola, it saves the day.) text is black is the usual condition, but if the text you're looking at is white (on a black background), tap the label to change it. inverting an image is easy, knowing programatically when it applies is not so easy. tesseract prefers black text on white, but sometimes it will nail the white text just to show you who's boss. for the moment, i tap on the label, when appropriate.

all of the preprocessing functions use "default" or "standard" parameters. for your app you will almost certainly need to tweak them after lengthy experimentation. the functions i use are a mere handful of the hundreds available (not counting the hundreds more available through opencv).

on the tesseract side, you should know that it has 3 "ocr engines" (used to be 4; you choose the one you want), 13 page segmentation modes (used to be 11; again, you choose), 4 different ways of setting up how it divvies up the text it finds (you choose), and dozens of configuration settings. in addition it requires a "language" file, that is, a file which tells it what the text is supposed to look like, some tricky juxtapositions of characters that might occur in that language, and - if desired - a dictionary (in that language) which it consults when it's in doubt. it rates its own performance (in terms of confidence), but even when it's very confident, it can be wrong. so don't count on that. the usefulness of the language file depends on the number of different fonts it has incorporated. italic, bold, tall, thin, serif, sans serif, mixed (a nightmare) - all can cause tesseract to choke.
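to make that concrete, here is roughly where those knobs live in the javacpp preset's api (my illustration, with placeholder values - not the demo's defaults):

Code:
import static org.bytedeco.javacpp.tesseract.*;

public class ConfigSketch {
    public static void main(String[] args) {
        TessBaseAPI api = new TessBaseAPI();
        // the engine mode is chosen at Init time ("tessdata" is a placeholder path)
        if (api.Init("tessdata", "eng", OEM_TESSERACT_ONLY) != 0) return;
        // one of the 13 page segmentation modes
        api.SetPageSegMode(PSM_SINGLE_LINE);
        // one of the dozens of config variables: restricting the character
        // set, the sort of thing a license plate reader would do
        api.SetVariable("tessedit_char_whitelist",
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
        api.End();
    }
}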

what i have done is apply certain "defaults" which have worked for me and my single purpose application (reading some text). in some cases it is possible to pass an un-pre-processed image directly to tesseract (not through the demo). if you crop tight with your camera and hold it steady and make sure there is no skew or wrap or mottled background, it can do ok.

the only pre-processing tesseract does is to convert an image to 1bpp if this has not already been done. letting tesseract do your binarization for you might be enough in some cases. most people roll their own. apart from that conversion, the only thing tesseract does - beyond actual text extraction - is to hunt for areas within the image which might be construed as text. it will do this regardless of what you may have done with the image, but the easier you make it for tesseract to find these candidates, the faster you'll be in and out and, hopefully, with the text. the longer you're in, the more trouble you're asking for.

there is some setup required by tesseract. it's already done for you in the demo. i won't get into it here. if you start customizing the demo, you could easily cause tesseract to fail to initialize.

for the demo, i added an english language file. i myself don't use the english file. in fact, i use a modified version of whichever language file i do use, since i don't use a dictionary. this has repercussions. one, the size of the language file goes from big (20-30MB) down to less big (3MB). two, tesseract doesn't set aside something it considers a miss for later processing with the dictionary. and three, the confidence factor is even more meaningless. so you understand, someone writing a license plate recognition app would tend not to use a dictionary; it just slows things down. and because files without dictionaries are so much smaller, it's easier to use more than 1 at the same time (which tesseract allows). and since tesseract is trainable (not done in the demo), your own "language" file could easily be a well-schooled font file - nothing more. but, again, these matters are all beyond the purpose of the demo.

you could add your own language file for the demo. a small change to the .jar and to the demo will do it. in any case, the modified english language file included with the demo does not contain a dictionary. i forget whether you'll have a problem if you change the demo and turn the dictionary on where there is none. if you want to download the full english file from tesseract on github, be my guest. i'll tell you how to incorporate it (if you don't already know). then you could modify the demo to turn the dictionary on.

i've been following the ocr thread here for some time. as far as i know, this is the only "bundle" that allows you to pre-process an image and extract (or attempt to extract) text. it's not a start, it's not an end. it's somewhere along the starting edge. my project (not the demo) stands up pretty well in a number of cases to textfairy (which i hold as the gold standard). my project did better than google mobile vision in the tests that i ran (while that wrapper was still available), although mobile vision was blindingly faster. i've tried to use google's cloud vision (not so much to compare but just to see what it does). i never got it to respond. there is a web page where you can drag an image and, presumably, cloud vision does its thing. i tried several times but never got it to do anything beyond accept my images.
 

drgottjr

it's the quotation marks in the link. sorry - try this; it should be the same as before, minus the offending quotes.
 

drgottjr

not quite ready for prime time. putting a correction up in a couple minutes.
 

drgottjr

a corrected version has been uploaded. same link as above, i assume. i never use google drive, so all the icons and choices have thrown me. it seems to be this
 

drgottjr

a friendly fellow forum user suggested a change in the demo's listview to safely handle your dcim folder's being populated already with many hi-res images. that has been implemented, and the new .zip uploaded to the same google drive link as above. google seems to feel the need to refer to it as version 3. you may or may not see it as such. let's humor them.
just fyi: i wanted to use a folder for images that i thought everyone would have on her device, the dcim. it makes it easy to put test images in it, either directly from your camera or by dragging from the desktop. of course, as was pointed out to me, not only were you likely to have the folder, but it was probably already full of images. the listview might choke trying to load so many. hence the change to loadsample instead of just load.
 

Eme Fibonacci

I always get this error:

main_lv_itemclick (java line: 563)
java.lang.NoSuchMethodError: No direct method <init>(J)V in class Lorg/bytedeco/javacpp/FloatPointer; or its super classes (declaration of 'org.bytedeco.javacpp.FloatPointer' appears in /data/app/com.georgieapps.wrapper-2/base.apk)
at com.georgieapps.tinytess.TessieWrapper.skewRot(TessieWrapper.java:284)
 

drgottjr

sorry, i don't get the error. i have just unarchived the .zip, compiled and deployed on my nexus 4, android 5.x. i have also just tried on a nexus 5, android 7.x. and i know someone else who has tried (after downloading) with success. while i'm trying to track it down, remove the demo from the device, kill the project folder on your desktop, go back to google drive and start over (including re-copying the .jar and .xml files to the add'l libraries folder), just to make sure we're on the same page.

which device are you running? do you know how much rom you have? devices using the arm chipset were targeted. when you downloaded, did you get a .zip archive? when i click on the .zip file in google drive, it shows a breakdown of what's in the archive (which is not at all what i wanted, but google does what google wants), plus it seems to store the elements that make up the .zip separately. did you have to download a number of files individually? i'll work through the night. who needs sleep?
 

drgottjr

to my (limited) knowledge, you shouldn't see "wrapper-2.apk" (as mentioned in the error you reported); it's "wrapper.apk". remove the demo folder from the desktop (or wherever your ide is), remove the demo from the device, go back and download the archive. i have just done that, and there was no error on my test devices. text was even extracted from a file that wasn't included in the demo. i am sorry for the trouble. keep me updated, please.
 

drgottjr

excellent. and post a representative image. it's possible that a small adjustment could make the difference. if you understand the origins of tesseract, you will see that it's designed to attack specific targets. that it might be used in a universal environment doesn't alter that. you have to help it to see what it likes to see.
 