B4J Question [SOLVED] Tesseract API - a $120 opportunity

Discussion in 'B4J Questions' started by jroriz, Jan 14, 2019.

  1. jroriz

    jroriz Active Member Licensed User

  2. Daestrum

    Daestrum Well-Known Member Licensed User

    Try this; it can read from an image file or a byte buffer.

    jTesseract.bas is the source for the library.

    pic1.png is what I used for testing it.
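    jTesseract.bas itself isn't reproduced in this thread. As a rough illustration of the byte-buffer path, here is a Java sketch of turning an encoded image held in a byte array into a `BufferedImage`, which is the form Tess4J's `doOCR(BufferedImage)` accepts. The class `ImageBytes` and its methods are hypothetical, and the sketch assumes the buffer holds a complete PNG/JPEG/BMP file rather than raw pixels.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, NOT part of jTesseract.bas: converts between encoded
// image bytes and BufferedImage using only the JDK.
final class ImageBytes {
    private ImageBytes() {}

    // Decodes an encoded image (PNG, JPEG, BMP, ...) held in a byte array.
    static BufferedImage decode(byte[] encoded) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(encoded));
            if (img == null) {
                // ImageIO.read returns null when no installed reader recognises the bytes
                throw new IllegalArgumentException("byte array is not a supported image format");
            }
            return img;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes an image as PNG bytes (useful for producing test buffers).
    static byte[] encodePng(BufferedImage img) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (!ImageIO.write(img, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(40, 20, BufferedImage.TYPE_INT_RGB);
        BufferedImage back = decode(encodePng(src));
        System.out.println(back.getWidth() + "x" + back.getHeight()); // 40x20
        // With Tess4J on the classpath, the OCR step would then be roughly:
        //   String text = tesseract.doOCR(back);
    }
}
```

    Decoding first keeps the OCR call itself independent of where the bytes came from (file, screen capture, network).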
     

    Attached Files:

    Last edited: Jan 15, 2019
    DonManfred, jroriz and inakigarm like this.
  3. xulihang

    xulihang Active Member Licensed User

    As this is a desktop app, we can also simply run Tesseract's command-line program. I made a PDF-to-text tool this way.
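    A minimal Java sketch of the command-line route, assuming the `tesseract` executable is installed and on the PATH (or passed by full path). Passing `stdout` as the output base makes Tesseract print the recognised text instead of writing a `.txt` file. `TesseractCli` and its methods are illustrative names only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: launch the tesseract executable and read its stdout.
final class TesseractCli {
    private TesseractCli() {}

    // Builds: <exe> <image> stdout -l <lang>
    static List<String> buildCommand(String exe, Path image, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add(image.toString());
        cmd.add("stdout"); // print text to standard output instead of <outputbase>.txt
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    static String run(String exe, Path image, String lang) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(exe, image, lang));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tesseract's warnings in our console
        Process p = pb.start();
        String text;
        try (InputStream in = p.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TesseractCli.java pic1.png
        System.out.println(run("tesseract", Paths.get(args[0]), "eng"));
    }
}
```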
     
    Daestrum likes this.
  4. jroriz

    jroriz Active Member Licensed User

    I get an error when I try to compile:

     
  5. jroriz

    jroriz Active Member Licensed User

    But I need to read from a byte array, which is exactly what @Daestrum's code does.
    I think the command-line program reads only image files.
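    If the command-line program is used with a byte array anyway, one workaround is to write the bytes to a temporary image file and pass that path to `tesseract`. A Java sketch, assuming the bytes are already an encoded image; `TempImage` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: dump an encoded image buffer to a temp file so a
// file-based tool such as the tesseract CLI can read it.
final class TempImage {
    private TempImage() {}

    static Path write(byte[] encodedImage, String suffix) {
        try {
            Path tmp = Files.createTempFile("ocr-", suffix); // suffix such as ".png"
            Files.write(tmp, encodedImage);
            tmp.toFile().deleteOnExit(); // safety net if the caller forgets to delete it
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes (just the start of a PNG signature, not a real image)
        byte[] captured = {(byte) 0x89, 'P', 'N', 'G'};
        Path img = write(captured, ".png");
        try {
            System.out.println("would run: tesseract " + img + " stdout");
        } finally {
            Files.deleteIfExists(img);
        }
    }
}
```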
     
  6. Daestrum

    Daestrum Well-Known Member Licensed User

    Sorry, these are the extra jar files Tesseract requires (I just placed them in c:/temp/tess4j to stop my extralibs folder getting too large):
    Code:
    commons-beanutils-1.9.2.jar
    commons-collections-3.2.1.jar
    commons-io-2.6.jar
    commons-logging-1.2.jar
    fontbox-2.0.12.jar
    ghost4j-1.0.1.jar
    itext-2.1.7.jar
    jai-imageio-core-1.4.0.jar
    jbig2-imageio-3.0.2.jar
    jboss-logging-3.1.4.GA.jar
    jboss-vfs-3.2.14.Final.jar
    jcl-over-slf4j-1.7.25.jar
    jna-5.1.0.jar
    jul-to-slf4j-1.7.25.jar
    lept4j-1.10.0.jar
    log4j-1.2.17.jar
    log4j-over-slf4j-1.7.25.jar
    logback-classic-1.2.3.jar
    logback-core-1.2.3.jar
    pdfbox-2.0.12.jar
    pdfbox-debugger-2.0.12.jar
    pdfbox-tools-2.0.12.jar
    slf4j-api-1.7.25.jar
    tess4j-4.3.1.jar
    xmlgraphics-commons-1.4.jar
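    A small Java check (hypothetical, not part of the attached project) that reports which of the 25 jars listed above are missing from a folder. The default path is the c:/temp/tess4j folder mentioned in this post; pass another folder as the first argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sanity check: which of the listed jars are absent from a folder?
final class JarCheck {
    private JarCheck() {}

    // Names copied from the list above.
    static final List<String> REQUIRED = Arrays.asList(
        "commons-beanutils-1.9.2.jar", "commons-collections-3.2.1.jar",
        "commons-io-2.6.jar", "commons-logging-1.2.jar", "fontbox-2.0.12.jar",
        "ghost4j-1.0.1.jar", "itext-2.1.7.jar", "jai-imageio-core-1.4.0.jar",
        "jbig2-imageio-3.0.2.jar", "jboss-logging-3.1.4.GA.jar",
        "jboss-vfs-3.2.14.Final.jar", "jcl-over-slf4j-1.7.25.jar", "jna-5.1.0.jar",
        "jul-to-slf4j-1.7.25.jar", "lept4j-1.10.0.jar", "log4j-1.2.17.jar",
        "log4j-over-slf4j-1.7.25.jar", "logback-classic-1.2.3.jar",
        "logback-core-1.2.3.jar", "pdfbox-2.0.12.jar", "pdfbox-debugger-2.0.12.jar",
        "pdfbox-tools-2.0.12.jar", "slf4j-api-1.7.25.jar", "tess4j-4.3.1.jar",
        "xmlgraphics-commons-1.4.jar");

    // Returns the required jar names that are not regular files in dir.
    static List<String> missing(File dir) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(dir, name).isFile()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "c:/temp/tess4j");
        List<String> gaps = missing(dir);
        System.out.println(gaps.isEmpty()
            ? "All " + REQUIRED.size() + " jars found"
            : "Missing: " + gaps);
    }
}
```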
     
  7. jroriz

    jroriz Active Member Licensed User

    Could you please attach them all?
     
  8. Daestrum

    Daestrum Well-Known Member Licensed User

    xulihang likes this.
  9. jroriz

    jroriz Active Member Licensed User

    Now the error is gone.
    But there is another one:

    Code:
    Dim t As jTesseract
    ' Path to the tessdata folder
    t.Initialize("C:\tess\tessdata")

    Dim c3po As AWTRobot
    ' Capture the screen as a byte array and log the OCR result
    Log(t.OcrFromBuffer(c3po.ScreenCaptureAsByteArray))
    Raises the error:
     
  10. Daestrum

    Daestrum Well-Known Member Licensed User

    Did you download the traineddata file for the language you want to use ?

    In the tess4j folder there should be 25 jar files and one directory (tessdata).
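    To check the traineddata point, here is a hypothetical Java helper that lists the language codes whose `<lang>.traineddata` file sits in a tessdata folder; for English, `eng` must be among them. The default path `C:/Tess4J/tessdata` is only an example taken from this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of jTesseract: inspects a tessdata folder.
final class TessdataCheck {
    private TessdataCheck() {}

    private static final String SUFFIX = ".traineddata";

    // Sorted language codes found in the folder, e.g. [eng] when the folder
    // holds only eng.traineddata. Empty if the folder does not exist.
    static List<String> languages(File tessdataDir) {
        List<String> langs = new ArrayList<>();
        File[] files = tessdataDir.listFiles(); // null if not a readable directory
        if (files != null) {
            for (File f : files) {
                String name = f.getName();
                if (f.isFile() && name.endsWith(SUFFIX)) {
                    langs.add(name.substring(0, name.length() - SUFFIX.length()));
                }
            }
        }
        Collections.sort(langs);
        return langs;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "C:/Tess4J/tessdata");
        List<String> langs = languages(dir);
        System.out.println(langs.contains("eng")
            ? "eng.traineddata found"
            : "eng.traineddata missing; found: " + langs);
    }
}
```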
     
    Last edited: Jan 15, 2019
  11. jroriz

    jroriz Active Member Licensed User

    This is what I have done so far:
    - downloaded Tess4J and extracted it to c:\tess4j
    - edited jTesseract.xml and changed c:\temp\tess4j to c:\tess4j, where I put the other jars.

    I'm using English as the language.

    This is my C:\Tess4J\tessdata folder:
    Capturar.PNG
     
  12. Daestrum

    Daestrum Well-Known Member Licensed User

    OK, try this. It uses jTesseract as a class module.

    You will need to change the location of the jars in the main module.
     

    Attached Files:

  13. jroriz

    jroriz Active Member Licensed User

    Nope, same error.

    Please try downloading my project (your last project, updated) and my tess4j folder: https://www.dropbox.com/sh/niwxbn1hjf813xr/AABXzShZ5_iAoJk4nPZOwXLMa?dl=0
    (use the "Download" option at the top right)
     
  14. Daestrum

    Daestrum Well-Known Member Licensed User

    I loaded your B4J app, changed the path to the jars in the main module, and it worked fine.
     
  15. Daestrum

    Daestrum Well-Known Member Licensed User

    Note: my tessdata folder has only eng.traineddata in it, no other files at all.
     
  16. jroriz

    jroriz Active Member Licensed User

    Is there something special about the file tess4j-3.4.8.jar, which is in the C:\Tess4J\dist folder? Should it be copied to a particular place?
    Did you "install" anything? I simply downloaded and unzipped Tess4J.
     
  17. Daestrum

    Daestrum Well-Known Member Licensed User

    From the link I posted in post #8, I just downloaded the file (it was called jar_files.zip) and unpacked it into my c:/temp/tess4j folder.
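    The unpacking step can be scripted too. Here is a Java sketch using `java.util.zip`, assuming a plain zip archive like jar_files.zip. `Unpack.unzip` is a hypothetical helper; it also refuses entries whose paths would escape the target folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extract every entry of a zip into targetDir.
final class Unpack {
    private Unpack() {}

    // Returns the number of files (not directories) extracted.
    static int unzip(Path zip, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int count = 0;
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                Path out = root.resolve(e.getName()).normalize();
                if (!out.startsWith(root)) { // guard against "../" entries
                    throw new IOException("entry escapes target folder: " + e.getName());
                }
                if (e.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream hits EOF at its end
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Unpack.java jar_files.zip c:/temp/tess4j
        int n = unzip(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + n + " files");
    }
}
```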
     
  18. jroriz

    jroriz Active Member Licensed User

    I've done it all again and now it works.
    But it's issuing a warning, which I think slows down the OCR process:
    "Warning: Parameter not found: enable_new_segsearch"
     
  19. jroriz

    jroriz Active Member Licensed User

    amaxco, moster67 and DonManfred like this.
  20. Daestrum

    Daestrum Well-Known Member Licensed User

    You really don't owe me anything.
    I enjoy writing code for its own sake, not for monetary reward.
    What you could do with it:
    A. Buy a book on Java; it will help you.
    B. Have a meal with it.
    C. Use it to extend your B4X licence.
    D. Keep it.

    The plus for me is I had never heard of Tesseract before today, so I learned something new too.
     
    Last edited: Jan 15, 2019
    Krammig, avalle, andyr00d and 10 others like this.