Share My Creation Screen OCR in 5 minutes and 50 lines!

Discussion in 'B4J Share Your Creations' started by jroriz, Mar 14, 2018.

  1. jroriz

    jroriz Member Licensed User


    Exactly like the thread title! 5 minutes. 50 lines...

    1 - Download and install Tesseract (<15 MB).
    2 - Install Tesseract in "C:\TO", or change the DoOCR sub to match the location where you installed it.
    3 - Start the program. Move the frame over the part of the screen that contains the text to be read.
    4 - Click OCR!
    5 - That's it!

    jmon, DonManfred, Mashiane and 7 others like this.
  2. roberto64

    roberto64 Member Licensed User

    Hi, with the jAWTRobot library ver. 1.55 these commands are not recognized: c3po.runCommand, c3po.rectangleAsArbitrary, c3po.CreateScreenCaptureToFile.
    Sub DoOCR
        c3po.runCommand("C:\TO\tesseract frame.png text")
    End Sub
    Sub ScreenCapture
        c3po.rectangleAsArbitrary(OCRFrame.f.WindowLeft, OCRFrame.f.WindowTop, OCRFrame.f.Width, OCRFrame.f.Height)
    End Sub
  3. jroriz

    jroriz Member Licensed User

    Roberto must be Brazilian... There are plenty of ways to solve this. One of them is to use version 1.0 (attached).

    You can also use jShell:

    Sub DoOCR
        Dim shl As Shell
        shl.Initialize("", "C:\TO\tesseract", Array As String("frame.png", "text"))
        shl.WorkingDirectory = 
        'c3po.runCommand("C:\TO\tesseract frame.png text")    ' change tesseract folder if needed
    End Sub

    Last edited: Apr 4, 2018
  4. joulongleu

    joulongleu Member

    Hi jroriz, can it be used with Chinese? I copied chi_sim.traineddata into tesseract-OCR\tessdata, but I can't get it to work.
  5. jroriz

    jroriz Member Licensed User

    You will need to change
    c3po.runCommand("C:\TO\tesseract frame.png text")
    to
    c3po.runCommand("C:\TO\tesseract frame.png text -l chi_sim")

    There are newer versions of tessdata with better-trained files.
    Search for them and run some tests.

    Note that there are other parameters you can try:

    Usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

    pagesegmode values are:
    0 = Orientation and script detection (OSD) only.
    1 = Automatic page segmentation with OSD.
    2 = Automatic page segmentation, but no OSD, or OCR
    3 = Fully automatic page segmentation, but no OSD. (Default)
    4 = Assume a single column of text of variable sizes.
    5 = Assume a single uniform block of vertically aligned text.
    6 = Assume a single uniform block of text.
    7 = Treat the image as a single text line.
    8 = Treat the image as a single word.
    9 = Treat the image as a single word in a circle.
    10 = Treat the image as a single character.
    -l lang and/or -psm pagesegmode must occur before any configfile.

    Single options:
    -v --version: version info
    --list-langs: list available languages for tesseract engine
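
    The ordering rule in the usage text above can be sketched as a small helper. This is only an illustration in Python, not the B4J code from the attached project; the function name build_tesseract_args and its parameters are made up for the example. It just assembles the arguments that follow the tesseract executable, using only the -l and -psm options listed above, and puts any config files last.

```python
from typing import List, Optional


def build_tesseract_args(image: str,
                         output_base: str,
                         lang: Optional[str] = None,
                         psm: Optional[int] = None,
                         config_files: Optional[List[str]] = None) -> List[str]:
    """Build the arguments that follow the tesseract executable (hypothetical helper)."""
    # The usage text lists pagesegmode values 0 to 10.
    if psm is not None and not 0 <= psm <= 10:
        raise ValueError("pagesegmode must be between 0 and 10")

    args = [image, output_base]
    if lang is not None:
        args += ["-l", lang]
    if psm is not None:
        args += ["-psm", str(psm)]
    # Config files go last: -l and -psm must occur before any configfile.
    args += list(config_files or [])
    return args


if __name__ == "__main__":
    # Chinese text, treated as a single text line.
    print(" ".join(build_tesseract_args("frame.png", "text", lang="chi_sim", psm=7)))
```

    The resulting list could be joined into the string passed to c3po.runCommand, or handed to jShell as the argument array.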

    Last edited: Apr 4, 2018
    joulongleu likes this.
  6. supriono

    supriono Member Licensed User

    I found this error:
    Cannot run program "C:\TO\tesseract": CreateProcess error=2, The system cannot find the file specified
  7. DonManfred

    DonManfred Expert Licensed User

    Did you adapt the code to match the folder where you installed it?
    jroriz likes this.
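
    "CreateProcess error=2" means Windows could not find the executable at the path it was given, so it can help to check the install folder before launching anything. A minimal sketch of such a check, again in Python rather than B4J; find_tesseract is a hypothetical name, and the assumption is that the executable is called tesseract.exe (or tesseract) and sits directly in the install folder.

```python
from pathlib import Path


def find_tesseract(install_dir: str) -> Path:
    """Return the tesseract executable inside install_dir, or raise (hypothetical helper)."""
    folder = Path(install_dir)
    # Assumption: the executable sits directly in the install folder.
    for name in ("tesseract.exe", "tesseract"):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No tesseract executable found in {folder}")


if __name__ == "__main__":
    try:
        print(find_tesseract(r"C:\TO"))
    except FileNotFoundError as err:
        # Same situation as the error above: the folder in the code
        # does not match the folder Tesseract was installed in.
        print(err)
```

    If the check fails, either reinstall Tesseract into the folder the code expects or change the path in the DoOCR sub.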