Screen OCR in 5 minutes and 50 lines!

Discussion started by jroriz, Mar 14, 2018.

  jroriz

    Exactly like the thread title! 5 minutes. 50 lines...

    1 - Download Tesseract (<15 MB).
    2 - Install it in "C:\TO", or change the DoOcr sub to match the folder where you installed it.
    3 - Start the program. Move the frame over the part of the screen that contains the text you want to read.
    4 - Click OCR!
    5 - That's it!

  roberto64

    Hi, with lib jAWTRobot ver. 1.55 the commands "c3po.runCommand", "c3po.rectangleAsArbitrary" and "c3po.CreateScreenCaptureToFile" are not recognized:

    Sub DoOCR
        c3po.runCommand("C:\TO\tesseract frame.png text")
    End Sub
    Sub ScreenCapture
        c3po.rectangleAsArbitrary(OCRFrame.f.WindowLeft, OCRFrame.f.WindowTop, OCRFrame.f.Width, OCRFrame.f.Height)
    End Sub

  jroriz

    Roberto must be Brazilian... There are plenty of ways to solve this. One of them is to use version 1.0 (attached).

    You can also use jShell:

    Sub DoOCR
        Dim shl As Shell
        shl.Initialize("shl", "C:\TO\tesseract", Array As String("frame.png", "text"))    ' event name "shl" is assumed
        shl.WorkingDirectory = 
    'c3po.runCommand("C:\TO\tesseract frame.png text")    ' change tesseract folder if needed
    End Sub
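
    For reference, a slightly fuller jShell sketch of the same idea. It is not the code from the attachment: the event name "shl", the use of File.DirApp as the folder holding frame.png, and the 10-second timeout are all assumptions chosen for illustration.

    ```b4x
    Sub DoOCR
        Dim shl As Shell
        ' Executable and arguments are passed separately, so no manual quoting is needed.
        shl.Initialize("shl", "C:\TO\tesseract", Array As String("frame.png", "text"))
        shl.WorkingDirectory = File.DirApp    ' assumption: frame.png was saved here
        shl.Run(10000)                        ' runs asynchronously; timeout in milliseconds
    End Sub

    ' Raised by jShell when the process ends.
    Sub shl_ProcessCompleted (Success As Boolean, ExitCode As Int, StdOut As String, StdErr As String)
        If Success And ExitCode = 0 Then
            ' Tesseract appends ".txt" to the output base name "text".
            Log(File.ReadString(File.DirApp, "text.txt"))
        Else
            Log("OCR failed: " & StdErr)
        End If
    End Sub
    ```

    Reading text.txt inside ProcessCompleted, rather than right after Run, avoids opening the file before Tesseract has finished writing it.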

  joulongleu

    Hi jroriz, can it be used with Chinese? I copied chi_sim.traineddata into tesseract-OCR\tessdata, but it doesn't work.

  jroriz

    You will need to change
    c3po.runCommand("C:\TO\tesseract frame.png text")
    to
    c3po.runCommand("C:\TO\tesseract frame.png text -l chi_sim")

    There are newer versions of tessdata, with better-trained files.
    Search for them and run some tests.

    Note that there are other parameters you can try:

    Usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

    pagesegmode values are:
    0 = Orientation and script detection (OSD) only.
    1 = Automatic page segmentation with OSD.
    2 = Automatic page segmentation, but no OSD, or OCR
    3 = Fully automatic page segmentation, but no OSD. (Default)
    4 = Assume a single column of text of variable sizes.
    5 = Assume a single uniform block of vertically aligned text.
    6 = Assume a single uniform block of text.
    7 = Treat the image as a single text line.
    8 = Treat the image as a single word.
    9 = Treat the image as a single word in a circle.
    10 = Treat the image as a single character.
    -l lang and/or -psm pagesegmode must occur before any configfile.

    Single options:
    -v --version: version info
    --list-langs: list available languages for tesseract engine
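
    To try these parameters without editing the command string by hand each time, something like the helper below may be convenient. BuildOcrCommand is a hypothetical name, not part of the original program, and it uses the single-dash -psm syntax from the usage text above; Tesseract 4.x and later spell that option --psm.

    ```b4x
    ' Hypothetical helper: builds the Tesseract command line from optional settings.
    ' Pass "" for Lang or -1 for Psm to leave that option out.
    Sub BuildOcrCommand (Lang As String, Psm As Int) As String
        Dim cmd As String = "C:\TO\tesseract frame.png text"    ' change tesseract folder if needed
        If Lang <> "" Then cmd = cmd & " -l " & Lang
        If Psm >= 0 Then cmd = cmd & " -psm " & Psm
        Return cmd
    End Sub
    ```

    For example, c3po.runCommand(BuildOcrCommand("chi_sim", 7)) would run Tesseract on a single line of Simplified Chinese text.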

  supriono

    I found this error:
    Cannot run program "C:\TO\tesseract": CreateProcess error=2, The system cannot find the file specified

  DonManfred

    Did you adapt the code to match the folder where you installed it?
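
    One way to rule this out is to check for the executable before launching it. A minimal sketch, assuming a Windows install where the file is named tesseract.exe and the same c3po object used in the first post:

    ```b4x
    Sub DoOCR
        Dim folder As String = "C:\TO"    ' change to the folder where you installed Tesseract
        If File.Exists(folder, "tesseract.exe") Then
            c3po.runCommand(folder & "\tesseract frame.png text")
        Else
            ' Error=2 from CreateProcess means exactly this: the file is not at that path.
            Log("tesseract.exe not found in " & folder)
        End If
    End Sub
    ```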
