Android Question MLKIT and Numbers, Digits Detection/Recognition

Magma

Expert
Licensed User
Longtime User
Hi there...

Well, I have a small project for a friend: I want to read water meters (counters) and, specifically, extract only the part I need.

I've tried Erel's example (MLKit text recognition), here. It works, but the result is not the best, and I'm wondering whether I need to somehow train a model for my app to do what I need.

I am attaching some screenshots to show the problem.

Here is my analysis of the best result I got:

1740482790468.png


The OCR gives the following:
ATLaa-U-004/0
o10216m

So somehow I must work out that the value "o10216m" is the one I need. The actual reading is 01026.

1. Is there a way to show some rectangles on the live camera image, select a rectangle (pane), perhaps with a zoom control, and get the result from it? (Something like Google Lens.)
2. Is there a better way to do this kind of thing?
3. Is there a way to select a different model for MLKit?

Thanks in advance...
 

Attachments

  • Screenshot_20250225_104355.jpg
  • Screenshot_20250225_104518.jpg
  • Screenshot_20250225_104549.jpg
  • Screenshot_20250225_104618.jpg

drgottjr

Expert
Licensed User
Longtime User
your small project is not as simple as you think.
you often need opencv for cases like this, and it has to be trained.
part of the issue can be the font. part involves numbers which are caught in the middle of ascending. part involves the camera and the user.

with mlkit (or similar), if you create a little frame into which the user has to fit the meter display, you can easily crop out the confusing surrounding output yourself, but you will still have problems reading the numbers reliably. below find an example of such a frame. it's not designed for meter reading, but it's easy to customize. i use it for reading barcodes that are hard to isolate. (i have several sizes and shapes ready-made. of course, the image can also be cropped after the fact, but the pre-cropping frames are quick.)

my water meter uses a nice font, so i can read the numbers which are not captured in the middle of motion. numbers in various stages of completion would be very difficult to train with.

the trick with google lens involves so-called "quiet zones" used by bar code generators and readers. they have to be detected, and once the camera software knows where they are, it can zoom in for decoding. a simple bar code reader starts in the middle of the image and works its way out looking for the quiet zone. once there, everything within the zone is to be decoded. ocr looks for horizontal "lines" of text, so it tries to create a similar quiet zone around the line(s) of text. bar codes are required to have quiet zones. water meters are not designed in this way.
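
The frame idea above comes down to mapping the on-screen frame to a rectangle in the captured bitmap and cropping to it before running OCR. A rough sketch of that mapping follows. CropToFrame, PreviewView and FramePnl are hypothetical names, and it assumes that both views share the same parent and that the bitmap fills the preview without letterboxing; neither is guaranteed on every device.

B4X:
'Hypothetical helper: crops a captured bitmap to the area covered by an on-screen frame.
'PreviewView shows the camera preview, FramePnl is the frame drawn over it.
'Assumes both views share the same parent and the bitmap fills the preview exactly.
Private Sub CropToFrame (bmp As Bitmap, PreviewView As View, FramePnl As View) As Bitmap
    'Scale factors from screen pixels to bitmap pixels
    Dim sx As Double = bmp.Width / PreviewView.Width
    Dim sy As Double = bmp.Height / PreviewView.Height
    'Top-left corner of the frame, in bitmap pixels (never negative)
    Dim l As Int = Max(0, (FramePnl.Left - PreviewView.Left) * sx)
    Dim t As Int = Max(0, (FramePnl.Top - PreviewView.Top) * sy)
    'Clamp the size so the rectangle never extends past the bitmap edges
    Dim w As Int = Min(bmp.Width - l, FramePnl.Width * sx)
    Dim h As Int = Min(bmp.Height - t, FramePnl.Height * sy)
    Return bmp.Crop(l, t, w, h)
End Sub

The clamping is there because Crop fails if the rectangle falls outside the bitmap, which can happen when the frame sits close to an edge.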
 

Attachments

  • meter.png

Magma

Expert
Licensed User
Longtime User
Hmmm the frame is a nice solution!

Maybe I need a combined method of MLKit and AI (not OCR but optical recognition, which, according to ChatGPT, seems to be something different and uses various methods)...

What do you think? Of course the cost goes up, but maybe the recognition is far better.

Or

As another solution, I would need to train Tesseract/TensorFlow on a custom server with an API, send it the photo, and have the result returned to the B4A client.
 

drgottjr

Expert
Licensed User
Longtime User
i don't see how you're going to read a number that is caught half way in between in its little window. there must be millions of possibilities relating to what that would look like. (how far can, eg, the number 8 be up in the window before you can recognize that a 9 is coming? and which do you report? the 8 or the 9?) of course, that might be avoided if the display is digital, but that brings in its own nightmares. (try reading an lcd font.)

and what do you do when you can't read the numbers? or you get numbers and little garbage bits? edit by hand? key in the numbers manually? and how does the engine know when the numbers are "wrong"?

tesseract is known to have issues with numeric input. and some digits are easily confused with their alpha look-alike. you will have a lot of training to do. why do you think automobile license plates have a special font?

the ai may be able to help, but you have to ask it the right questions.

i appreciate a challenge as much as the next guy, but - more and more - meters are being read with wifi or bluetooth.
 

emexes

Expert
Licensed User
Longtime User
how far can, eg, the number 8 be up in the window before you can recognize that a 9 is coming?

Assuming the meter is decimal - and with digits 8 and 9, it can't be octal or binary - then if the departing digit is an 8, we can be reasonably certain a 9 is coming, no recognition required.
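
As a small sketch of that point: on a decimal wheel the arriving digit follows directly from the departing one, so only the departing digit and how far the wheel has scrolled need to be estimated. NextDigit and ResolveRollingDigit are hypothetical names, and the 0.5 threshold is an assumed reporting policy, not something the meter or MLKit defines.

B4X:
'Returns the digit that follows d on a decimal wheel (9 wraps to 0).
Private Sub NextDigit (d As Int) As Int
    Return (d + 1) Mod 10
End Sub

'Hypothetical policy: report the arriving digit once the wheel is at least half way.
'ScrolledFraction is 0 when the departing digit is centred and 1 when the arriving digit is centred.
Private Sub ResolveRollingDigit (Departing As Int, ScrolledFraction As Double) As Int
    If ScrolledFraction >= 0.5 Then Return NextDigit(Departing)
    Return Departing
End Sub

For example, ResolveRollingDigit(8, 0.7) returns 9 and ResolveRollingDigit(9, 0.7) returns 0. Whether to round like this or always report the departing digit is a choice for the application.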
 

emexes

Expert
Licensed User
Longtime User
there must be millions of possibilities relating to what that would look like

Or you just train it on the 10 combined double-digits that are actually possible, with say 5-10 offsets of each combination.

Coincidentally, I've been taking weekly photos of my parents' water meter, after they received a $600 water bill for usage 100 kilolitres above normal, which we think was due to a carry of 2 rather than 1 when it rolled across x99999 to x00000. The problem I have with their meter is that the viewport has some kind of magnifying effect over the last digit that makes it difficult to read even directly, let alone with a camera in between. But it doesn't matter too much, because the water company doesn't read the last three digits anyway, i.e. the bill would show a meter reading of 2132 kilolitres, not 2132104 litres.

1740534525535.png
 

emexes

Expert
Licensed User
Longtime User
how does the engine know when the numbers are "wrong"?

If the previous reading is available, then you can rule out all the implausible usages. E.g. the meter reading shouldn't go backwards, or forwards by more than would be possible at full flow: a household water connection might be capable of half a litre/second = 43 kL/day = 1314 kL/month = 3942 kL/bill (every three months), so if the meter advanced more than that, then clearly something is wrong. And if the meter moved more than, say, 3x the typical billing-period usage, then it should be verified manually. Also, for many applications it's not life-or-death anyway: if there is an over-read leading to an over-charge on one billing cycle, then there will be a corresponding under-charge on the next billing cycle (complicated by stepped pricing, admittedly).
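
The checks above could be sketched roughly as follows. CheckReading and its parameters are hypothetical names; the 0.5 L/s full-flow figure and the 3x factor are the example values from this post, not universal constants, and a rollover of the counter (x99999 to x00000) is not handled.

B4X:
'Hypothetical plausibility check. Readings are in kilolitres, Days is the time since the previous reading.
'Returns "ok", "verify" or "reject".
Private Sub CheckReading (PrevKl As Double, NewKl As Double, Days As Double, TypicalKlPerDay As Double) As String
    Dim MaxKlPerDay As Double = 0.5 * 86400 / 1000   '43.2 kL/day at half a litre per second (assumed limit)
    Dim used As Double = NewKl - PrevKl
    If used < 0 Then Return "reject"                            'the meter cannot run backwards
    If used > MaxKlPerDay * Days Then Return "reject"           'more than full flow allows
    If used > 3 * TypicalKlPerDay * Days Then Return "verify"   'unusually high: check manually
    Return "ok"
End Sub

For example, CheckReading(2132, 2150, 90, 0.5) returns "ok": 18 kL in 90 days is well below both the full-flow limit (3888 kL) and three times the typical usage (135 kL).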
 

emexes

Expert
Licensed User
Longtime User
a small project for a friend and I want to detect "water meters / counters"

Hey, is there only a small range of meter models to be read?

That'd make it easier because then you'd know the location of the digits relative to the outline of the meter and common markings on the meter faceplate. Plus also the precise style (font) and colour of the digits and background.

Has your friend been building up a collection of sample photos of various meters and readings?

I have a vague memory of somebody doing machine learning on a collection of thousands of digit images, arranged in a grid in larger images.
 

MrKim

Well-Known Member
Licensed User
Longtime User
This is what I did for barcodes and QR codes. The rectangle is just a panel with a border:
1746002483878.png

By allowing the full view and only using what is in the window it is easier for the user to locate the target.

B4X:
Private Sub Camera1_Preview (data() As Byte)
    Try
        'Throttle: skip frames that arrive too soon after the last processed one
        If DateTime.Now > LastPreview + IntervalBetweenPreviewsMs Then
            Dim frameBuilder As JavaObject
            frameBuilder.InitializeNewInstance("com/google/android/gms/vision/Frame.Builder".Replace("/", "."), Null)
            'Convert the raw preview frame to a JPEG, then to a bitmap
            Dim jpeg() As Byte = camEx.PreviewImageToJpeg(data, 100)
            Dim In As InputStream
            In.InitializeFromBytesArray(jpeg, 0, jpeg.Length)  'length of the JPEG, not of the raw frame
            Dim bmp As Bitmap
            bmp.Initialize2(In)
            bmp = bmp.Rotate(90)  'I don't know why we need this - shouldn't (probably because preview frames arrive in the sensor's landscape orientation)
            bmp = bmp.Crop(bmp.Width * NumberFormat(((TargetPnl.Left - QRReader.Left) / (QRReader.Width)), 2, 2), bmp.Height * NumberFormat((TargetPnl.Top / QRReader.Height), 2, 2), bmp.Width * (TargetPnl.Width / QRReader.Width), bmp.Height * (TargetPnl.Height / QRReader.Height))  'width IS .X so  (W-.X) / 2 = Left
            #If Debug
                'Show the cropped image so the crop rectangle can be checked visually
                TestImgView.Width = TargetPnl.Width
                TestImgView.Height = TargetPnl.Height
                TestImgView.Gravity = Gravity.FILL
                TestImgView.SetBackgroundImage(bmp)
            #End If
            frameBuilder.RunMethod("setBitmap", Array(bmp)) 'instead of the setImageData line
            Dim frame As JavaObject = frameBuilder.RunMethod("build", Null)
            Dim SparseArray As JavaObject = detector.RunMethod("detect", Array(frame))
            LastPreview = DateTime.Now
            Dim Matches As Int = SparseArray.RunMethod("size", Null)
            If Matches > 0 Then
                'Only the first detected barcode is used
                Dim barcode As JavaObject = SparseArray.RunMethod("valueAt", Array(0))
                Dim raw As String = barcode.GetField("rawValue")
                FoundBarcode(raw)
            End If
        End If
    Catch
        'On any error, log it and restart the camera
        Log(LastException)
        StopCamera
        StartCamera
    End Try
End Sub
The key line in Camera1_Preview is:
B4X:
bmp = bmp.Crop(bmp.Width * NumberFormat(((TargetPnl.Left - QRReader.Left) / (QRReader.Width)), 2, 2), bmp.Height * NumberFormat((TargetPnl.Top / QRReader.Height), 2, 2), bmp.Width * (TargetPnl.Width / QRReader.Width), bmp.Height * (TargetPnl.Height / QRReader.Height))  'width IS .X so  (W-.X) / 2 = Left
This crops the image to only what is in the rectangle before it is passed to frameBuilder and the rest of the code that determines whether it is a valid barcode. NOTE IntervalBetweenPreviewsMs: this keeps the camera from doing too many damn scans too quickly. I needed this because of the next step:

B4X:
Private Sub FoundBarcode (msg As String)
    Try
        If WereHere Then Return  'Prevents duplicate scans while getting data
        #If Debug
            Log("ReadText: " & ReadTxt & " Message: " & msg)
            Toast.Show($"Found [Color=Blue][b][plain]${msg}[/plain][/b][/Color]  ${ReadTxt}"$)
        #End If
        If ReadTxt = "" Then
            'First read: remember it
            ReadTxt = msg
            ReadTxt2 = ""
            If LessSensitive Then Return  'wait for a second, confirming read
        Else If ReadTxt2 = "" Then
            ReadTxt2 = msg
            If ReadTxt.EqualsIgnoreCase(ReadTxt2) = False Then
                'The two reads differ: try again
                ReadTxt = ""
                ReadTxt2 = ""
                Return
            End If
            'The two reads match: we are done reading
        End If
        StopCamera
        WereHere = True
        Toast.Show("Scan Complete, getting data.")
        '(the rest of the sub was not shown in the original post; the closing lines below are assumed)
    Catch
        Log(LastException)
    End Try
End Sub

I READ THE BARCODE MORE THAN ONCE, and if I have a match then I know I have a good scan. You can set the sensitivity so that it has to read 3 times before you have a match. For me the reason for this was that it was too easy for the user to get a match before he settled on the barcode he actually wanted, but in your case you can scan multiple times, and when you get two or three matches you probably have a good one. I would probably look for the obvious mistakes, like o's and O's that should be 0s, and convert them before testing for a match. After doing that, just throw out any scans that aren't numbers and scan again.
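
A minimal sketch of that clean-up step, to be run on each OCR result before it is compared with the previous one. NormalizeReading is a hypothetical name, and the l/I to 1 mapping is an added assumption about likely look-alikes; only the o/O to 0 case comes from the post above.

B4X:
'Hypothetical helper: maps common OCR look-alikes to digits, then keeps the text
'only if what remains is purely numeric. Returns "" when the scan should be discarded.
Private Sub NormalizeReading (raw As String) As String
    Dim s As String = raw.Trim
    s = s.Replace("o", "0").Replace("O", "0")
    s = s.Replace("l", "1").Replace("I", "1")   'assumed look-alikes, extend as needed
    'Regex.IsMatch tests the whole string, so any leftover letter rejects the scan
    If Regex.IsMatch("\d+", s) Then Return s
    Return ""
End Sub

With this version, "o1026" becomes "01026", while the "o10216m" result from the first post is still rejected because of the trailing "m"; whether to strip such trailing characters instead is a separate decision.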

Another thought: If you are getting multiple numbers that are similar you might present the user with a list and let him select the right one.
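
One possible way to build such a list is to count how often each cleaned-up result has been seen and offer the most frequent ones first. This is only a sketch; AddCandidate and RankedCandidates are hypothetical names, and Counts is assumed to be a Map that was initialized when scanning started.

B4X:
'Hypothetical tally: increments the number of times a scan result has been seen.
Private Sub AddCandidate (Counts As Map, value As String)
    Dim n As Int = Counts.GetDefault(value, 0)
    Counts.Put(value, n + 1)
End Sub

'Returns the distinct candidates, most frequent first, ready to show in a list.
Private Sub RankedCandidates (Counts As Map) As List
    Dim result As List
    result.Initialize
    'Work on a copy so the caller's tally is left untouched
    Dim remaining As Map
    remaining.Initialize
    For Each key As String In Counts.Keys
        remaining.Put(key, Counts.Get(key))
    Next
    'Repeatedly pick the most frequent remaining candidate
    Do While remaining.Size > 0
        Dim best As String = ""
        Dim bestCount As Int = -1
        For Each cand As String In remaining.Keys
            Dim c As Int = remaining.Get(cand)
            If c > bestCount Then
                best = cand
                bestCount = c
            End If
        Next
        result.Add(best)
        remaining.Remove(best)
    Loop
    Return result
End Sub

The selection loop is quadratic, which should not matter for the handful of candidates a few seconds of scanning produces.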

GOOD LUCK!
 