B4J Library [B4X] xOCR Class

This class (for B4J, B4A and B4i) uses the ocr.space service to convert scans or smartphone photos of text documents into editable text using Optical Character Recognition (OCR). The service uses state-of-the-art OCR software, and its recognition quality is comparable to commercial OCR SDKs (e.g. Abbyy).

You can extract the raw text and, optionally, get the bounding-box coordinates and the line of each word.

You need an API key. The free key allows 25,000 requests per month (500 calls per day).
Register here for your free OCR API key.
The free key may also be used in commercial apps!

This class depends on XUI and OKHttpUtils2.

The uploaded image should be 1 MB or smaller.
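Because larger images are rejected, it can help to shrink big photos before passing them to OCR. The following is a minimal sketch, assuming an XUI variable named xui is declared in Process_Globals and treating 1 MB as 1024 * 1024 bytes. LoadImageForOCR and the 2000-pixel target are hypothetical choices, not part of xOCR. xOCR may re-encode the bitmap before uploading, so the file size on disk is only a rough guide.

```b4x
'Hypothetical helper (not part of xOCR): loads an image and downscales it
'when the file on disk is larger than about 1 MB.
'Assumption: the 1 MB limit is 1024 * 1024 bytes; xOCR may re-encode the bitmap,
'so the file size is only an approximation of the uploaded size.
Sub LoadImageForOCR(Dir As String, FileName As String) As B4XBitmap
    Dim MaxBytes As Long = 1024 * 1024
    If File.Size(Dir, FileName) <= MaxBytes Then
        'Small enough: load the image unchanged
        Return xui.LoadBitmap(Dir, FileName)
    End If
    'Too large: load a downscaled copy that keeps the aspect ratio
    '(2000 px on the longest side is an arbitrary choice)
    Return xui.LoadBitmapResize(Dir, FileName, 2000, 2000, True)
End Sub
```

Usage: `OCR.OCR("ger", LoadImageForOCR(File.DirApp, "image.jpg"), False, True)`. File.Size may not report the real size for files in File.DirAssets on every platform, so a regular folder is the safer choice here.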



B4X:
Sub Process_Globals
    Private fx As JFX
    Private MainForm As Form
    Dim xui As XUI
    Dim OCR As xOCR
End Sub

Sub AppStart (Form1 As Form, Args() As String)
    MainForm = Form1
    MainForm.RootPane.LoadLayout("Main") 'Load the layout file.
    MainForm.Show
    xui.SetDataFolder("OCR")

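    'Event prefix "ocr": results arrive in OCR_finished and OCR_overlay
    OCR.Initialize(Me,"ocr","your_api_key")
    'Parameters: language, image, autorotate, overlay
    OCR.OCR("ger",xui.LoadBitmap(File.DirAssets,"image.jpg"),False,True)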

End Sub

'Return true to allow the default exceptions handler to handle the uncaught exception.
Sub Application_Error (Error As Exception, StackTrace As String) As Boolean
    Return True
End Sub


Sub OCR_finished (Text As String,ProcessingTime As Int)
    Log(Text)
    Log(ProcessingTime & " ms")
End Sub

Sub OCR_overlay (Overlay As Map)
    Log(Overlay)
End Sub


The OCR method takes the following parameters:

Language:
Language used for OCR. If you pass "", English ("eng") is used as the default.
IMPORTANT: The language code always has 3 letters (not 2), so it is "eng", not "en".

Arabic = ara
Bulgarian = bul
Chinese(Simplified) = chs
Chinese(Traditional) = cht
Croatian = hrv
Czech = cze
Danish = dan
Dutch = dut
English = eng
Finnish = fin
French = fre
German = ger
Greek = gre
Hungarian = hun
Italian = ita
Japanese = jpn
Korean = kor
Norwegian = nor
Polish = pol
Portuguese = por
Russian = rus
Slovenian = slv
Spanish = spa
Swedish = swe
Turkish = tur
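Since the API expects 3-letter codes, a small mapping helps if your app works with 2-letter codes (for example from the device locale). This is a minimal sketch: ToOcrLanguage is a hypothetical helper, not part of xOCR, and it only covers a few languages from the list above. Unknown codes are passed through unchanged.

```b4x
'Hypothetical helper (not part of xOCR): converts a 2-letter language code
'to the 3-letter code expected by ocr.space. Only a few languages are mapped.
Sub ToOcrLanguage(Code As String) As String
    Dim m As Map = CreateMap("en": "eng", "de": "ger", "fr": "fre", "es": "spa", "it": "ita", "nl": "dut")
    Dim c As String = Code.ToLowerCase.Trim
    'An empty code falls back to English, matching the default described above
    If c = "" Then Return "eng"
    'Unknown codes (including codes that are already 3 letters) are returned as-is
    Return m.GetDefault(c, c)
End Sub
```

Usage: `OCR.OCR(ToOcrLanguage("de"), xui.LoadBitmap(File.DirAssets, "image.jpg"), False, True)`.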

Image:
The image (bitmap) to be processed.

Autorotate:
If True, the API detects the orientation of the image and rotates it correctly before recognition.

Overlay:
If True, the bounding-box coordinates of each word are also returned (in the OCR_overlay event of the example above).
 

Attachments

  • xOCR.bas (3 KB)

Erel

B4X founder
Quote: "Should I post this class in the B4A forum too?"
No need. The search engine knows that [B4X] threads are cross-platform.

Note that your code will work in B4i as well. You just need to change the number in xui.SubExists to the correct number of parameters (it will not affect the behavior in B4A and B4J).
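For illustration, this is roughly what such an event-raising sub inside a cross-platform class can look like. It is a sketch, not the actual xOCR code: mCallBack and mEventName are assumed names for the fields stored in Initialize, and xui is assumed to be declared in Class_Globals. The third argument of xui.SubExists is the number of parameters of the event sub; B4i needs it to be correct, while B4A and B4J ignore it.

```b4x
'Sketch only: mCallBack and mEventName are assumed field names set in Initialize.
Private Sub RaiseFinished(Text As String, ProcessingTime As Int)
    'The event sub <prefix>_finished takes 2 parameters (Text, ProcessingTime).
    'B4i requires this count to be correct; B4A and B4J ignore it.
    If xui.SubExists(mCallBack, mEventName & "_finished", 2) Then
        CallSubDelayed3(mCallBack, mEventName & "_finished", Text, ProcessingTime)
    End If
End Sub
```

The same pattern applies to the overlay event, with the count set to 1 for its single Overlay parameter.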
 