B4J Question: Problem using resumable subs to run concurrent instances

xulihang

Hi there,

I am using jShell to run Tesseract to OCR images.

I chain several resumable subs with Wait For, like this:

B4X:
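Sub OCRAllBoxes
    ' Processes the boxes one after another: each OCROne must complete before the next starts
    For Each box As Map In boxes
        Wait For (OCROne) Complete (result As String)
    Next
End Sub

Sub OCROne As ResumableSub
    Wait For (tesseract) Complete (result As String)
    Return result
End Sub

Sub tesseract As ResumableSub
    ' sh is the jShell Shell object that runs tesseract (its setup is omitted here)
    Wait For sh_ProcessCompleted (Success As Boolean, ExitCode As Int, StdOut As String, StdErr As String)
    ' "result" was never declared in this sub; StdOut is assumed to be the intended value
    Return StdOut
End Sub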

Now I want to run multiple tesseract instances at the same time. How can I do this?

I changed OCRAllBoxes to the version below, but it does not wait for Tesseract to return the results.

B4X:
Sub OCRAllBoxesConcurrent(img As B4XBitmap,boxesList As List) As ResumableSub
    Dim count As Map
    count.Initialize
    count.Put("completed",0)
    For Each box As Map In boxesList
        OCROne(img,box,count)
    Next
    Do While count.Get("completed")<>boxesList.Size
        Log(count.Get("completed"))
        Sleep(1000)
    Loop
    Return ""
End Sub
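
One possible cause, offered as a guess rather than a confirmed diagnosis: the Wait For in the tesseract sub has no sender filter, so several instances waiting on the same sh_ProcessCompleted event at once can interfere with each other. Below is a minimal sketch of running each process with its own local Shell and a sender filter. RunTesseract and OCROneConcurrent are hypothetical names, the file-name arguments are assumptions, and the tesseract executable is assumed to be on the PATH and to write its text to imgName & ".txt", as in the later post.

```b4x
' Hypothetical helper: runs one Tesseract process and returns the recognized text.
Sub RunTesseract(imgName As String, lang As String) As ResumableSub
    Dim sh As Shell ' local Shell, so every call has its own sender object
    ' Assumed command line: tesseract <image> <output base name> -l <language>
    sh.Initialize("sh", "tesseract", Array As String(imgName, imgName, "-l", lang))
    sh.WorkingDirectory = File.DirApp
    sh.Run(-1) ' -1 disables the timeout
    ' The (sh) sender filter makes this Wait For react only to this process
    Wait For (sh) sh_ProcessCompleted (Success As Boolean, ExitCode As Int, StdOut As String, StdErr As String)
    If Success And ExitCode = 0 And File.Exists(File.DirApp, imgName & ".txt") Then
        Return File.ReadString(File.DirApp, imgName & ".txt")
    End If
    Return ""
End Sub

' Hypothetical replacement for OCROne: stores the text and bumps the shared counter.
Sub OCROneConcurrent(imgName As String, lang As String, box As Map, count As Map)
    Wait For (RunTesseract(imgName, lang)) Complete (text As String)
    box.Put("text", text)
    Dim done As Int = count.Get("completed")
    count.Put("completed", done + 1)
End Sub
```

Calling OCROneConcurrent inside the For Each loop without Wait For starts every process and returns immediately, so the Do While loop in OCRAllBoxesConcurrent can keep checking the counter until it reaches boxesList.Size.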
 

xulihang

Since I could not get the results back from the Tesseract processes started through jShell, I decided to start all of them as a batch and collect the results from the output files:

B4X:
Sub BatchOCR(img As B4XBitmap, boxesList As List) As ResumableSub
    Dim names As List
    names.Initialize
    Dim boxMap As Map
    boxMap.Initialize
    ' Start one Tesseract process per box without waiting for it to finish
    For Each box As Map In boxesList
        Dim cropped As B4XBitmap = croppedAndPreprocessedImage(img, box)
        Dim name As String = Utils.saveImgToDiskWithUniqueName(cropped)
        names.Add(name)
        boxMap.Put(name, box)
        OCR.tesseract(name, OcrLang, False)
    Next
    Dim interval As Int = 200
    ' Poll for the output file of the first pending image until all results are collected
    Do While names.Size > 0
        Sleep(interval)
        Log(names)
        Dim name As String = names.Get(0)
        If File.Exists(File.DirApp, name & ".txt") Then
            Dim box As Map
            box = boxMap.Get(name)
            box.Put("text", PostProcessedOCRResult(File.ReadString(File.DirApp, name & ".txt")))
            If OCR.DeleteTesseractTempFiles(name) = True Then
                names.RemoveAt(0)
                interval = 0 ' after the first result, check the remaining files without delay
                Continue
            End If
        End If
    Loop
    Return ""
End Sub
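
The polling loop above only looks at the first pending name, keeps polling with Sleep(0) once interval is set to 0, and never ends if one Tesseract run fails to produce its file. A minimal sketch of a variant that checks every pending file and gives up after a timeout is shown here. WaitForOutputs and timeoutMs are hypothetical names, and the sketch assumes each output is written to File.DirApp as name & ".txt", as in the code above. A file that exists may still be in the middle of being written, so this is not a complete solution.

```b4x
' Hypothetical helper: resolves to True once every expected .txt file exists,
' or to False if timeoutMs passes first.
Sub WaitForOutputs(names As List, timeoutMs As Long) As ResumableSub
    Dim pending As List
    pending.Initialize
    pending.AddAll(names)
    Dim deadline As Long = DateTime.Now + timeoutMs
    Do While pending.Size > 0 And DateTime.Now < deadline
        ' Walk backwards so RemoveAt does not shift the items still to be visited
        For i = pending.Size - 1 To 0 Step -1
            Dim name As String = pending.Get(i)
            If File.Exists(File.DirApp, name & ".txt") Then pending.RemoveAt(i)
        Next
        ' Yield to the message loop between polls instead of spinning
        If pending.Size > 0 Then Sleep(200)
    Loop
    If pending.Size = 0 Then Return True
    Return False
End Sub
```

It could be called as Wait For (WaitForOutputs(names, 60000)) Complete (allDone As Boolean), after which the result files can be read in any order.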

Time taken to OCR 11 images (values as logged, presumably in milliseconds):
before: 15650
after: 11237 (about 28% less)
 