B4J Question [SOLVED] Tesseract API - a $120 opportunity

Daestrum

Try this: it can read from an image file or a byte buffer.

jTesseract.bas is the source for the library.

pic1.png is what I used for testing it.
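Some background on the "byte buffer" route: Tess4J's `Tesseract` class offers `doOCR(File)` and `doOCR(BufferedImage)`, so encoded image bytes (for example a PNG screenshot) first have to be decoded into a `BufferedImage`. Below is a minimal Java sketch of just that decoding step, using the standard `javax.imageio` API. It is an assumption about how such a wrapper could work, not the contents of jTesseract.bas, and `ImageBytes`/`decode` are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: turns an encoded image (PNG, JPEG, BMP, ...) held in a
// byte array into a BufferedImage, the form Tess4J's doOCR(BufferedImage) accepts.
final class ImageBytes {
    private ImageBytes() {}

    static BufferedImage decode(byte[] data) throws IOException {
        if (data == null || data.length == 0) {
            throw new IllegalArgumentException("empty image buffer");
        }
        // ImageIO.read returns null (instead of throwing) when no installed
        // reader recognises the format, so turn that into an explicit error.
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(data));
        if (img == null) {
            throw new IOException("unsupported or corrupt image data");
        }
        return img;
    }
}
```

The decoded image could then be handed to an OCR call such as `doOCR(image)` on a configured `Tesseract` instance; that part is left out here.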
 

Attachments

  • jTesseract.zip
    3.1 KB
  • jTesseractTester.zip
    988 bytes
  • jTesseract.bas
    2.2 KB
  • pic1.png
    8.3 KB

xulihang

Since this is a desktop app, we can also simply run Tesseract's command-line program. I made a PDF-to-text tool this way.
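The command-line route can be sketched in Java with `ProcessBuilder`. This assumes the `tesseract` executable is installed and on the PATH, and uses the usage form `tesseract <image> <outputbase> -l <lang>`; passing `stdout` as the output base makes Tesseract print the text instead of writing `<outputbase>.txt`. The `--tessdata-dir` option is only understood by newer Tesseract releases, so treat it as an assumption to check against your installed version. `TesseractCli` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: OCR an image by running the Tesseract command-line program.
// Assumes "tesseract" is installed and on the PATH.
final class TesseractCli {
    private TesseractCli() {}

    // Builds: tesseract <image> stdout -l <lang> [--tessdata-dir <dir>]
    static List<String> buildCommand(String imagePath, String lang, String tessdataDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add("stdout");      // print the text instead of writing a .txt file
        cmd.add("-l");
        cmd.add(lang);          // e.g. "eng" needs eng.traineddata
        if (tessdataDir != null) {
            cmd.add("--tessdata-dir");  // optional, newer Tesseract versions only
            cmd.add(tessdataDir);
        }
        return cmd;
    }

    static String run(String imagePath, String lang, String tessdataDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(imagePath, lang, tessdataDir));
        // Warnings and progress messages go to stderr; show them in our console.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        // Read all of stdout before waiting, so a full pipe cannot block the child.
        String text = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return text;
    }
}
```

Usage would be along the lines of `String text = TesseractCli.run("pic1.png", "eng", null);`. Keeping command construction separate from execution makes the arguments easy to inspect when Tesseract rejects them.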
 

jroriz

Error when I try to compile:

 

Daestrum

Sorry, these are the extra jar files Tesseract requires (I just placed them in c:/temp/tess4j to stop my extralibs folder getting too large):
B4X:
commons-beanutils-1.9.2.jar     
commons-collections-3.2.1.jar
commons-io-2.6.jar             
commons-logging-1.2.jar
fontbox-2.0.12.jar             
ghost4j-1.0.1.jar
itext-2.1.7.jar                 
jai-imageio-core-1.4.0.jar
jbig2-imageio-3.0.2.jar         
jboss-logging-3.1.4.GA.jar
jboss-vfs-3.2.14.Final.jar     
jcl-over-slf4j-1.7.25.jar
jna-5.1.0.jar                   
jul-to-slf4j-1.7.25.jar
lept4j-1.10.0.jar               
log4j-1.2.17.jar
log4j-over-slf4j-1.7.25.jar     
logback-classic-1.2.3.jar
logback-core-1.2.3.jar         
pdfbox-2.0.12.jar               
pdfbox-debugger-2.0.12.jar
pdfbox-tools-2.0.12.jar         
slf4j-api-1.7.25.jar
tess4j-4.3.1.jar               
xmlgraphics-commons-1.4.jar
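Since one missing jar is enough to break compilation or startup, it can help to check the folder against the list above before compiling. A small Java sketch, assuming all jars sit directly in one folder (c:/temp/tess4j in this setup); `JarCheck` and `missing` are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: report which of the jars listed above are absent from a folder.
final class JarCheck {
    private JarCheck() {}

    // The 25 jars from the list above (Tess4J 4.3.1 and its dependencies).
    static final List<String> REQUIRED = List.of(
        "commons-beanutils-1.9.2.jar", "commons-collections-3.2.1.jar",
        "commons-io-2.6.jar", "commons-logging-1.2.jar",
        "fontbox-2.0.12.jar", "ghost4j-1.0.1.jar",
        "itext-2.1.7.jar", "jai-imageio-core-1.4.0.jar",
        "jbig2-imageio-3.0.2.jar", "jboss-logging-3.1.4.GA.jar",
        "jboss-vfs-3.2.14.Final.jar", "jcl-over-slf4j-1.7.25.jar",
        "jna-5.1.0.jar", "jul-to-slf4j-1.7.25.jar",
        "lept4j-1.10.0.jar", "log4j-1.2.17.jar",
        "log4j-over-slf4j-1.7.25.jar", "logback-classic-1.2.3.jar",
        "logback-core-1.2.3.jar", "pdfbox-2.0.12.jar",
        "pdfbox-debugger-2.0.12.jar", "pdfbox-tools-2.0.12.jar",
        "slf4j-api-1.7.25.jar", "tess4j-4.3.1.jar",
        "xmlgraphics-commons-1.4.jar");

    // Returns the required jar names that are not regular files in 'folder'.
    static List<String> missing(Path folder, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            if (!Files.isRegularFile(folder.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }
}
```

An empty result from `JarCheck.missing(Path.of("c:/temp/tess4j"), JarCheck.REQUIRED)` means every listed jar is in place.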
 

jroriz

Active Member
Licensed User
Longtime User
Could you please attach them all?
 

jroriz

Daestrum wrote: "Get them from here (then we don't use up Erel's storage): https://jar-download.com/artifact-search/tess4j"
Now the error is gone.
But there is another one:

B4X:
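    Dim t As jTesseract
    ' Path to the tessdata folder that holds the .traineddata files
    t.Initialize("C:\tess\tessdata")
    Dim c3po As AWTRobot
    ' Capture the screen as a byte array and log the recognised text
    Log(t.OcrFromBuffer(c3po.ScreenCaptureAsByteArray))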

Raises the error:
 

Daestrum

Did you download the traineddata file for the language you want to use?

In the tess4j folder there should be 25 jar files and one directory (tessdata).
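The traineddata question can be turned into a quick check: Tesseract loads `<lang>.traineddata` from the tessdata folder (for English, `eng.traineddata`). The Java sketch below, with hypothetical names, lists the language codes found in a folder and reports whether a given one is present. It inspects exactly the path you pass, so pass the same path that is given to `Initialize`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: see which language files a tessdata folder actually contains.
final class TessdataCheck {
    private TessdataCheck() {}

    private static final String SUFFIX = ".traineddata";

    // Language codes ("eng", "por", ...) that have a .traineddata file, sorted.
    static List<String> languages(Path tessdataDir) throws IOException {
        try (Stream<Path> files = Files.list(tessdataDir)) {
            return files
                .filter(Files::isRegularFile)
                .map(p -> p.getFileName().toString())
                .filter(n -> n.endsWith(SUFFIX))
                .map(n -> n.substring(0, n.length() - SUFFIX.length()))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // False if the folder does not exist or lacks <lang>.traineddata.
    static boolean hasLanguage(Path tessdataDir, String lang) throws IOException {
        return Files.isDirectory(tessdataDir) && languages(tessdataDir).contains(lang);
    }
}
```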
 

jroriz

That's what I have done so far:
- downloaded Tess4J and extracted it to c:\tess4j
- edited jTesseract.xml and changed c:\temp\tess4j to c:\tess4j, where I put the other jars.

I'm using English as the language.

This is my C:\Tess4J\tessdata folder:
 

Daestrum

OK, try this: it uses jTesseract as a class module.

You will need to change the location of the jars in the main module.
 

Attachments

  • tess4jClassAsModule.zip
    2.2 KB

Daestrum

I loaded your B4J app, changed the path to the jars in the main module, and it worked fine.
 

Daestrum

Note: my tessdata folder has only eng.traineddata in it, no other files at all.
 

jroriz

Is there something special about the file tess4j-3.4.8.jar, which is in the C:\Tess4J\dist folder? Should it be copied to a particular place?
Did you "install" anything? I simply downloaded and unzipped Tess4J.
 

Daestrum

From the link I posted in post #8, I just downloaded the file (it was called jar_files.zip) and unpacked it into my c:/temp/tess4j folder.
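The unpacking step can also be scripted. A minimal Java sketch using `java.util.zip`, assuming the download is jar_files.zip as described and that only the `.jar` entries are wanted, flattened into one target folder; `JarUnpacker` and `extractJars` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: copy every *.jar entry of a zip into targetDir (sub-folders flattened).
final class JarUnpacker {
    private JarUnpacker() {}

    // Returns the file names written, in archive order.
    static List<String> extractJars(Path zipFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<String> written = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Keep only the last path component, so no entry can escape targetDir.
                int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
                String base = name.substring(cut + 1);
                if (!entry.isDirectory() && base.endsWith(".jar")) {
                    // Files.copy reads the current entry up to its end; it does not close 'zip'.
                    Files.copy(zip, targetDir.resolve(base), StandardCopyOption.REPLACE_EXISTING);
                    written.add(base);
                }
            }
        }
        return written;
    }
}
```

Flattening keeps every jar in a single folder, which matches the layout described in this thread (jars side by side, plus the separate tessdata directory).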
 

jroriz

I've done it all again and now it works.
However, it issues a warning, which I think slows down the OCR process:
"Warning: Parameter not found: enable_new_segsearch"
 

Daestrum

You really don't owe me anything.
I enjoy writing code, not for monetary reward.
What you could do with it:
A. Buy a book on Java; it will help you.
B. Have a meal with it.
C. Use it to extend your B4X licence.
D. Keep it.

The plus for me is that I had never heard of Tesseract before today, so I learned something new too.
 