Android Example: Offline OCR with Tesseract

I'm working on a project that needs offline OCR. I've made some progress and decided to share it here to get some feedback.
I searched this forum and Google and found options for online OCR (NJDUDE's library or Erel's example). The same project also needs some image manipulation, so I used DrewG's example of inline Java code with JavaCV/OpenCV. That led me to test Tesseract OCR the same way (inline code).
I downloaded the library from this link: https://repo1.maven.org/maven2/org/bytedeco/javacpp-presets/1.0/javacpp-presets-1.0-bin.zip
More details about this can be found here.

I unzipped it and copied the files I needed to my Additional Libraries folder.
The files I used: javacpp.jar, tesseract-android-arm.jar, leptonica-android-arm.jar, tesseract.jar, leptonica.jar

I coded the Basic Example from the bytedeco page, changing it to take the file path of an image saved on the phone, and got the recognized text back.

Note: I had to download the tessdata files to my phone. I tried to bundle them with the app, but they were too big and I got an error deploying the app to the phone (I need to look into this more carefully). The files for many languages can be found here or on the Google project page. I downloaded this one for my test example.
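Since Init just returns -1 when the language file cannot be loaded, it can help to check the file before calling it. The sketch below is a hypothetical helper (TessDataCheck and dataPathFor are my names, not part of the example project or of Tesseract). It assumes the Tesseract 3.x convention, as I understand it from the baseapi.h comments, that the datapath passed to Init is the parent directory of the tessdata folder and ends with "/".

```java
import java.io.File;

// Hypothetical helper, not part of the example project or of Tesseract.
class TessDataCheck {

    /**
     * Returns the datapath to pass to TessBaseAPI.Init (baseDir with a
     * trailing "/"), or null if <baseDir>/tessdata/<lang>.traineddata
     * is missing or empty.
     */
    static String dataPathFor(String baseDir, String lang) {
        File trained = new File(new File(baseDir, "tessdata"), lang + ".traineddata");
        // A missing or zero-length language file is a common reason for Init returning -1
        if (!trained.isFile() || trained.length() == 0) {
            return null;
        }
        // Assumed Tesseract 3.x convention: parent of tessdata, ending in "/"
        return baseDir.endsWith("/") ? baseDir : baseDir + "/";
    }
}
```

Calling it before Init lets the app report which .traineddata file is missing instead of a bare RETCODE = -1.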

Here is the code I used. My test phone is an S4.

Hope it helps.
 

Attachments

  • tessTest.zip (11.9 KB)

Urishev

Hello! I'm a doctor and a newbie programmer. I want to create an application that scans a standard blood test report, recognizes its text, and produces a computer-generated conclusion.
The problem is the text recognition.
Where should I start?
 

lemonisdead

Quote: "The problem is in the recognition of text."
Hello,
What do you mean by that? Have you tried the example provided in the first post of this thread?
 

Urishev

Thanks for the reply.
I downloaded "javacpp-presets-1.0-bin", but failed to install the library for the "tessTest" application.
How do I install the library?
Log: "java.io.FileNotFoundException: /tesseract-ocr-3.02.eng.tar.gz (Read-only file system)"
Where should I put "tesseract-ocr-3.02.eng.tar.gz"?
 

lemonisdead

As I understand the first post, you should unzip the downloaded file and put the .jar files in your Additional Libraries folder:
The files I used: javacpp.jar, tesseract-android-arm.jar, leptonica-android-arm.jar, tesseract.jar, leptonica.jar

The required .jar files are referenced in lines 69 to 73 of the example code:
B4X:
#AdditionalJar: javacpp
#AdditionalJar: tesseract-android-arm
#AdditionalJar: leptonica-android-arm
#AdditionalJar: tesseract
#AdditionalJar: leptonica
Did you do it like that? I will try it this way and report back.

Edit: it works as expected. The only error I got was a crash on install on a device with an Intel CPU. In that case you also have to copy tesseract-android-x86 and leptonica-android-x86 to the Additional Libraries folder.
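If you are not sure which CPU a test device has, logging System.getProperty("os.arch") at startup tells you which pair of native jars it needs. Below is a small hypothetical helper for that mapping (NativeJarPicker and jarSuffix are made-up names). The architecture strings are the typical values I would expect on Android, and mapping 64-bit values to the 32-bit arm/x86 jars named in this thread is an assumption.

```java
import java.util.Locale;

// Hypothetical helper: picks the native jar pair for an os.arch value.
class NativeJarPicker {

    /** Returns "x86", "arm", or null when the architecture is not recognized. */
    static String jarSuffix(String osArch) {
        String a = osArch.toLowerCase(Locale.ROOT);
        // i686, x86, x86_64 (64-bit mapped to the 32-bit jars: an assumption)
        if (a.contains("86") || a.equals("amd64")) {
            return "x86";
        }
        // armv7l, aarch64 (same assumption for 64-bit ARM)
        if (a.startsWith("arm") || a.startsWith("aarch")) {
            return "arm";
        }
        return null;
    }
}
```

For example, "tesseract-android-" + NativeJarPicker.jarSuffix(System.getProperty("os.arch")) gives the jar name to look for.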
 

Urishev

I did the same as you. Log:
** Activity (main) Create, isFirst = true **
Here - getText()
Before Init
RETCODE =-1
Could not initialize tesseract.
** Activity (main) Pause, UserClosed = false **

Log: "java.io.FileNotFoundException: /tesseract-ocr-3.02.eng.tar.gz (Read-only file system)"
Where should I put "tesseract-ocr-3.02.eng.tar.gz"?
 

DonManfred

Try unchecking FILTERED to get the unfiltered log, and see whether it shows more information.
 

joilts

Hello, sorry for taking so long to answer (I was on vacation).
Here is what I did to install the library. Sorry if it's too "rookie", but that's what I am in B4X.
1- Downloaded the library from this link: https://repo1.maven.org/maven2/org/bytedeco/javacpp-presets/1.0/javacpp-presets-1.0-bin.zip
2- Unzipped the files to any folder. The files I used: javacpp.jar, tesseract-android-arm.jar, leptonica-android-arm.jar, tesseract.jar, leptonica.jar (on Intel CPUs you also need the -x86 versions, as lemonisdead said in the post above).
3- Opened B4A -> Tools -> Configure Paths.
4- In the Additional Libraries field, entered the folder used to unzip the files in step 2, for example "C:\Program Files (x86)\Anywhere Software\Basic4android\AdditionalLibs".
5- Compiled the program.

That's all I did.

I don't have the original project here and can't download the example from the post right now, but I have a modified project (for ANPR) that uses these libraries:


B4X:
#AdditionalJar: opencv
#AdditionalJar: opencv-android-arm
#AdditionalJar: javacv
#AdditionalJar: javacpp
#AdditionalJar: tesseract-android-arm
#AdditionalJar: leptonica-android-arm
#AdditionalJar: tesseract
#AdditionalJar: leptonica


OpenCV is used for image processing, and I don't think you need it. Anyway, here is a link to all the additional libraries I currently have in my path:

https://drive.google.com/file/d/0B-i5U_B2M-ETaWxWNktYdi1FNmc/view?usp=sharing

Hope it helps.
 

Urishev

Thank you for looking into my problem. What did I do wrong? The unfiltered log shows more info:
onReceive
widget onReceive ->InfoAlarmWidget.action.widget.news.scroll
Starting: Intent { act=android.intent.action.MAIN flg=0x30000000 cmp=b4a.example/.main } from pid 2264
HistoryRecord{40b54838 b4a.example/.main} failed creating starting window
java.lang.RuntimeException: Binary XML file line #25: You must supply a layout_height attribute.
at android.content.res.TypedArray.getLayoutDimension(TypedArray.java:491)
at android.view.ViewGroup$LayoutParams.setBaseAttributes(ViewGroup.java:3599)
at android.view.ViewGroup$MarginLayoutParams.<init>(ViewGroup.java:3678)
at android.widget.LinearLayout$LayoutParams.<init>(LinearLayout.java:1400)
at android.widget.LinearLayout.generateLayoutParams(LinearLayout.java:1326)
at android.widget.LinearLayout.generateLayoutParams(LinearLayout.java:47)
at android.view.LayoutInflater.rInflate(LayoutInflater.java:625)
at android.view.LayoutInflater.inflate(LayoutInflater.java:408)
at android.view.LayoutInflater.inflate(LayoutInflater.java:320)
at android.view.LayoutInflater.inflate(LayoutInflater.java:276)
at com.android.internal.policy.impl.PhoneWindow.generateLayout(PhoneWindow.java:2400)
at com.android.internal.policy.impl.PhoneWindow.installDecor(PhoneWindow.java:2455)
at com.android.internal.policy.impl.PhoneWindow.getDecorView(PhoneWindow.java:1621)
at com.android.internal.policy.impl.PhoneWindowManager.addStartingWindow(PhoneWindowManager.java:1092)
at com.android.server.WindowManagerService$H.handleMessage(WindowManagerService.java:8182)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:130)
at com.android.server.WindowManagerService$WMThread.run(WindowManagerService.java:576)
** Activity (main) Pause, UserClosed = false **
Start proc b4a.example for activity b4a.example/.main: pid=2796 uid=10097 gids={1015, 3003}
setHidden false
Could not find method android.view.ViewGroup.addChildrenForAccessibility, referenced from method anywheresoftware.b4a.BALayout.addChildrenForAccessibility
VFY: unable to resolve virtual method 319: Landroid/view/ViewGroup;.addChildrenForAccessibility (Ljava/util/ArrayList;)V
VFY: replacing opcode 0x6f at 0x0009
VFY: dead code 0x000c-000c in Lanywheresoftware/b4a/BALayout;.addChildrenForAccessibility (Ljava/util/ArrayList;)V
setHidden false
Could not find method android.view.View.setPivotX, referenced from method anywheresoftware.b4a.objects.ViewWrapper.AnimateFrom
VFY: unable to resolve virtual method 313: Landroid/view/View;.setPivotX (F)V
VFY: replacing opcode 0x6e at 0x0025
VFY: dead code 0x0028-018e in Lanywheresoftware/b4a/objects/ViewWrapper;.AnimateFrom (Landroid/view/View;IIIII)V
VFY: dead code 0x0190-0192 in Lanywheresoftware/b4a/objects/ViewWrapper;.AnimateFrom (Landroid/view/View;IIIII)V
Could not find method android.animation.ValueAnimator.ofFloat, referenced from method anywheresoftware.b4a.objects.ViewWrapper.SetColorAnimated
VFY: unable to resolve static method 12: Landroid/animation/ValueAnimator;.ofFloat ([F)Landroid/animation/ValueAnimator;
VFY: replacing opcode 0x71 at 0x004c
VFY: dead code 0x004f-0089 in Lanywheresoftware/b4a/objects/ViewWrapper;.SetColorAnimated (III)V
Could not find method android.animation.ObjectAnimator.ofFloat, referenced from method anywheresoftware.b4a.objects.ViewWrapper.SetVisibleAnimated
VFY: unable to resolve static method 7: Landroid/animation/ObjectAnimator;.ofFloat (Ljava/lang/Object;Ljava/lang/String;[F)Landroid/animation/ObjectAnimator;
VFY: replacing opcode 0x71 at 0x0029
Could not find method android.animation.ObjectAnimator.ofFloat, referenced from method anywheresoftware.b4a.objects.ViewWrapper.SetVisibleAnimated
VFY: unable to resolve static method 7: Landroid/animation/ObjectAnimator;.ofFloat (Ljava/lang/Object;Ljava/lang/String;[F)Landroid/animation/ObjectAnimator;
VFY: replacing opcode 0x71 at 0x0059
VFY: dead code 0x002c-004e in Lanywheresoftware/b4a/objects/ViewWrapper;.SetVisibleAnimated (IZ)V
VFY: dead code 0x005c-005e in Lanywheresoftware/b4a/objects/ViewWrapper;.SetVisibleAnimated (IZ)V
GC_EXTERNAL_ALLOC freed 82K, 47% free 2963K/5575K, external 2462K/2652K, paused 20ms
setHidden false
Displayed b4a.example/.main: +449ms
setHidden false
** Activity (main) Create, isFirst = true **
setHidden false
Here - getText()
setHidden false
No JNI_OnLoad found in /system/lib/libc.so 0x40513ed0, skipping init
No JNI_OnLoad found in /system/lib/libm.so 0x40513ed0, skipping init
No JNI_OnLoad found in /system/lib/libz.so 0x40513ed0, skipping init
setHidden false
No JNI_OnLoad found in /system/lib/libdl.so 0x40513ed0, skipping init
No JNI_OnLoad found in /system/lib/liblog.so 0x40513ed0, skipping init
Trying to load lib /data/data/b4a.example/lib/liblept.so 0x40513ed0
Added shared lib /data/data/b4a.example/lib/liblept.so 0x40513ed0
No JNI_OnLoad found in /data/data/b4a.example/lib/liblept.so 0x40513ed0, skipping init
Trying to load lib /data/data/b4a.example/lib/libjnilept.so 0x40513ed0
setHidden false
Added shared lib /data/data/b4a.example/lib/libjnilept.so 0x40513ed0
setHidden false
Trying to load lib /data/data/b4a.example/lib/libtesseract.so 0x40513ed0
setHidden false
Added shared lib /data/data/b4a.example/lib/libtesseract.so 0x40513ed0
No JNI_OnLoad found in /data/data/b4a.example/lib/libtesseract.so 0x40513ed0, skipping init
setHidden false
Trying to load lib /data/data/b4a.example/lib/libjnitesseract.so 0x40513ed0
Added shared lib /data/data/b4a.example/lib/libjnitesseract.so 0x40513ed0
setHidden false
Before Init
setHidden false
RETCODE =-1
Could not initialize tesseract.
setHidden false
...
 

joilts

RETCODE = -1 seems to be caused by a missing .traineddata file, which holds the Tesseract language data. These files must be copied to your phone manually, as noted in the first post: "I needed to download tessdata files to my phone. I tried to add them to my app, but they were too big and I got an error deploying my app to the phone (I need to look into this more carefully). The files for many languages can be found here or on the Google project page. I downloaded this one for my test example."
The app code must then be changed to use these files from wherever you decide to put them.
Unfortunately, I'm away from my programming laptop and can't add the code to copy the file from the app folder to the destination folder.

UPDATE: Here is the code I used in another project to copy the trained data from my app's assets (File.DirAssets) to File.DirRootExternal. The file must be unzipped first.

B4X:
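    'tessDataPath is assumed to be declared elsewhere (e.g. in Process_Globals)
    'Set it outside the If, so it is also valid when the dir already exists
    tessDataPath = File.Combine(File.DirRootExternal, "tessdata")
    'Create the dir for the trained data
    If File.IsDirectory(File.DirRootExternal, "tessdata") = False Then
        File.MakeDir(File.DirRootExternal, "tessdata")
    End If
    'Copy the trained data files to tessdata (files already there are skipped)
    Dim fList As List = File.ListFiles(File.DirAssets)
    Dim fileName As String
    For i = 0 To fList.Size - 1
        fileName = fList.Get(i)
        'Compare in lower case: an upper-cased name can never contain "pla.traineddata"
        'Change the name to your own language file, e.g. "eng.traineddata"
        If fileName.ToLowerCase.Contains("pla.traineddata") Then
            If File.Exists(tessDataPath, fileName) = False Then
                File.Copy(File.DirAssets, fileName, tessDataPath, fileName)
            End If
        End If
    Next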
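The same copy step can be written in plain Java, e.g. for an inline #If JAVA block or to test the logic on a desktop JVM. This is a sketch under assumptions: a normal directory stands in for the APK assets (on Android, assets are read through AssetManager rather than java.io), it copies every .traineddata file instead of only pla.traineddata, and TrainedDataCopier/copyTrainedData are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper mirroring the B4X copy loop above.
class TrainedDataCopier {

    /**
     * Copies every *.traineddata file from srcDir into destDir/tessdata,
     * skipping files that already exist. Returns the names it copied.
     */
    static List<String> copyTrainedData(Path srcDir, Path destDir) throws IOException {
        Path tessData = destDir.resolve("tessdata");
        Files.createDirectories(tessData); // no-op if the folder already exists
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                // Case-insensitive match on the extension
                if (!name.toLowerCase(Locale.ROOT).endsWith(".traineddata")) {
                    continue;
                }
                Path target = tessData.resolve(name);
                if (Files.notExists(target)) {
                    Files.copy(f, target);
                    copied.add(name);
                }
            }
        }
        return copied;
    }
}
```

Returning the copied names makes it easy to log what was installed on first run; a second call returns an empty list.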
 

MarcoRome

Hi all. joilts, thank you for this example, it is very useful.
I tried it and it works well in English. But when I add Italian, I always get this message:
"Could not initialize tesseract."

This is the code I modified:
B4X:
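// Imports at the top of the #If JAVA block (as in the bytedeco basic example):
// import org.bytedeco.javacpp.*;
// import static org.bytedeco.javacpp.lept.*;
// import static org.bytedeco.javacpp.tesseract.*;
public static String getTextIta(String path, String filename, String extension, String TrainFileDir) {
    BA.Log("Here - getText()");
    TessBaseAPI api = new TessBaseAPI();
    BA.Log("Before Init");
    // Needs ita.traineddata in the tessdata folder under TrainFileDir
    int retCode = api.Init(TrainFileDir, "ita");
    BA.Log("RETCODE =" + retCode);
    if (retCode != 0) {
        return "Could not initialize tesseract.";
    }

    PIX image = pixRead(path + filename + extension);
    if (image == null || image.isNull()) {
        api.End();
        return "Could not read image.";
    }
    BA.Log("File Open");
    api.SetImage(image);
    BA.Log("Before get Text");
    BytePointer outText = api.GetUTF8Text();
    // Copy the text into a Java String BEFORE freeing the native buffer
    // (calling getString() after deallocate() reads freed memory)
    String result = outText.getString();
    api.End();
    outText.deallocate();
    pixDestroy(image);
    return result;
}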

The change is in this line (it was "eng" before):
int retCode = api.Init(TrainFileDir, "ita");

I also added the file ita.traineddata.
The strange thing is that if I download this file from GitHub, I get a 13.6 MB file (ita.traineddata), while if I download it from Google, I get a 2.3 MB file.
I tried both and I still get the message "Could not initialize tesseract."

Of course, if I switch back to eng.traineddata and change the line to
int retCode = api.Init(TrainFileDir, "eng");
everything works.
Any ideas?
Thank you
Marco
 

joilts

Hi Marco,

It is probably a bad (corrupted) file, or a file missing from the tessdata directory. I just downloaded ita.traineddata from Google (it was a .gz file of about 917 KB), unzipped it (the final size was around 2 MB), and it seems to work fine. I took a PNG with Italian text and made a test app. Here it is, with the data file attached, plus a screenshot of the result. I have no idea what is written in Italian, so forgive me if it's no good.
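A language file that was downloaded as .gz but never unzipped is one easy way to end up with a "corrupted" traineddata file. Below is a minimal sketch that detects a still-compressed file by the gzip magic bytes (0x1f 0x8b) and decompresses it with java.util.zip.GZIPInputStream. The class name TrainedDataGunzip and the file names mentioned after the code are hypothetical.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for checking and unpacking downloaded language files.
class TrainedDataGunzip {

    /** True if the file starts with the gzip magic bytes 0x1f 0x8b (still compressed). */
    static boolean isGzip(File f) throws IOException {
        try (InputStream in = new FileInputStream(f)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    /** Decompresses a .gz file into dest, overwriting dest if it exists. */
    static void gunzip(File gz, File dest) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(gz));
             OutputStream out = new FileOutputStream(dest)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

For example, if isGzip reports true for tessdata/ita.traineddata, unzip a copy (say ita.traineddata.gz) back into tessdata/ita.traineddata before calling Init.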

As the example project got large with the tessdata file, I uploaded it to Google Drive. You can find it here.
The screenshot is attached to this post.

Hope it helps.

See you.
 

Attachments

  • Screenshot_2016-03-18-12-23-39.png (145.8 KB)

MarcoRome

Yes, you're right: it was a corrupted file.
Thank you very much for your support.
 

roberto64

Dear joilts, I'm trying to create an ANPR app for license plate recognition, but I have searched in vain for a plate recognition library for B4A. I read that you are working on one; could you give me a hand?
Greetings
 

joilts

Hi Roberto. I've done some work on an ANPR solution. It works on a picture of the vehicle. To make it work, I had to train the OCR to recognize the font used on plates in my country; you should do the same for yours. I used jTessBoxEditor and the Tesseract tools for training; there are many other tools for that (some online).
Before running the picture through the OCR functions, I had to crop the image so that only the plate remains (there are many examples on the internet). Then I applied some image transformations to sharpen the image and convert it to black and white. This depends on the colors of the plates in your country: in my country we have 5 types of plates, and I had to design 4 different methods to transform the image. Once the image is prepared, you can run the OCR. Hope it helps. Some of the image transformations I've made can be found in this lib for B4i.
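As a rough illustration of the "black and white" step, here is a generic global-threshold binarization. It is not joilts's method (his transformations were done with OpenCV and tuned per plate type). It assumes pixels in ARGB int format, as returned by Android's Bitmap.getPixels, and the class name PlateBinarizer and the threshold value are illustrative only.

```java
// Generic sketch, not the method used in this thread.
class PlateBinarizer {

    /**
     * Maps each ARGB pixel to opaque white or black by comparing its
     * luminance with a fixed threshold (0..255).
     */
    static int[] toBlackAndWhite(int[] argb, int threshold) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Integer approximation of the ITU-R BT.601 luma weights
            int luma = (299 * r + 587 * g + 114 * b) / 1000;
            out[i] = luma >= threshold ? 0xFFFFFFFF : 0xFF000000;
        }
        return out;
    }
}
```

For plates with light characters on a dark background you would swap the two output colors, since Tesseract generally does best with dark text on a light background. An adaptive threshold (for example OpenCV's adaptiveThreshold) usually copes better with uneven lighting than a single fixed value.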
 

roberto64

Hello joilts, and thank you for your time. I have never used B4i; can the lib also be used in B4A?
Thank you
 

joilts

No. This B4i lib cannot be used in B4A because it uses inline Objective-C code. I showed it only to give you an idea of which image transformations are needed (in my case). You must rewrite them for B4A; I did it using inline Java code and the OpenCV lib. There are also some B4A libs that provide some of the transformations you may need.
 

roberto64

Hello joilts, I used Picasso to transform the image and Tesseract OCR to recognize letters and numbers, without success. Could you help me with some examples you already have? Unfortunately, I'm not familiar with Java, only VB.NET.
Greetings, Roberto
 