Android Question MQTT Memory consumption

Blueforcer

Well-Known Member
Licensed User
Longtime User
In my current project I have to send files via MQTT. To do this, I split them into 5 MB chunks and send them one after the other in a Do While loop, Base64-encoded inside a JSON payload.
Everything works wonderfully. The only problem is that with each mqtt.Publish call the free memory decreases by the size of the current chunk. Only when the file has been transferred completely does the memory slowly become free again. If the file is too big, I get an OutOfMemory exception at some point. QoS makes no difference here. I tested this by commenting out the mqtt.Publish line, and then the free memory stays stable.

Is there a problem with garbage collection in the MQTT library, or is there a trick I can use to avoid the problem?
 

Jeffrey Cameron

Well-Known Member
Licensed User
Longtime User
IMHO garbage collection in any OS is unreliable and should not be relied upon for program functionality. It's fine for the overall health of the OS, but it doesn't care about your particular program.

For files that large, uploading via FTP would be a much easier option. Are you creating a new "chunk" each time, or re-using the same object?
 

Blueforcer

It's very nice of you all to suggest how and why it could be done differently. However, that was not my question. There are very good reasons why it is done this way. But so that it is clear to everyone:
@Erel The file size varies widely and can range from a few MB to several hundred MB. We are not talking about run-of-the-mill app-store apps here, but an app used in a measurement and control system for swimming pools, chlorine electrolysis etc., developed by my company.
These devices have remote access for the end customer as well as for the specialist dealer and our service department. This mainly involves 1:1 control of the devices from a distance. But for service purposes we also (rarely) need file transfer, which is why our web portal also has a file manager which operates entirely over MQTT:

1740040308783.png


The entire communication takes place via MQTT, for several reasons. It is extremely fast and lightweight for normal remote control, and the customer does not have to deal with port forwarding etc. In addition, MQTT can easily be clustered for high workloads. Furthermore, the complete communication is handled by the browser client, not by a backend. This massively reduces the load on the web server, as all the work is outsourced to the respective browser client, which is particularly helpful when thousands of devices are connected. The last option would of course be an HTTP upload/download, but we wanted to limit ourselves to one protocol for the time being.

Why do we send the data as Base64 together in a JSON?
Since we send the data in chunks, the server must know how many chunks the request contains and also which chunk the current message is. It is also important to include a session ID so that the server can tell simultaneous downloads apart. And for security reasons, an MD5 hash is also sent to ensure file integrity. So the JSON also contains metadata. Incidentally, this also works in the other direction if we want to load data onto the client via the remote file manager.

The problem itself is not the JSON or the Base64 string, because even if I send the chunks directly as bytes, the available memory decreases with each chunk. For example, if I only have 1 GB of RAM and send a 1.5 GB file, it doesn't work: after about the 100th chunk (depending on the chunk size) I get an out-of-memory error.

So the main question is: Is it possible to free up the mqtt client after each chunk?


@amorosik :
B4X:
Sub PrepareDownload(data As Map)
    Dim filename As String = data.GetDefault("filename", "")
    Dim session As String = data.GetDefault("session", "")
    Dim ackRequired As Boolean = data.GetDefault("ackRequired", True)
 
    ' Define a command string for acknowledgment or logging purposes.
    Dim cmd As String = "downloadFile"

    ' Check if the file exists before proceeding.
    If filename <> "" And File.Exists(baseDir, filename) Then
        Dim su As StringUtils
  
        ' Initialize RandomAccessFile to read the specified file.
        Dim raf As RandomAccessFile
        raf.Initialize(baseDir, filename, False)
  
        ' Obtain the file size for chunking and progress calculations.
        Dim fileSize As Long = raf.Size
        Log("File size: " & fileSize)
  
        ' Set the chunk size in bytes (for instance, 10 MB here).
        Dim chunkSize As Int = 10 * 1024 * 1024
  
        ' This offset tracks our position in the file as we read in chunks.
        Dim offset As Long = 0
  
        ' First pass: calculate the MD5 hash of the entire file (also in chunks, to prevent out-of-memory exceptions).
        Dim jo As JavaObject
        jo.InitializeStatic("java.security.MessageDigest")
        jo = jo.RunMethod("getInstance", Array("MD5"))
  
        ' Loop through the file in chunks to update the MD5 digest.
        Do While offset < fileSize
            Dim remaining As Long = fileSize - offset
            Dim actualSize As Int = chunkSize
            If remaining < chunkSize Then actualSize = remaining
      
            Dim buffer(actualSize) As Byte
            raf.ReadBytes(buffer, 0, actualSize, offset)
      
            ' Update the MD5 digest with the current chunk's bytes.
            jo.RunMethod("update", Array(buffer))
      
            ' Advance the offset by the number of bytes read.
            offset = offset + actualSize
        Loop
  
        ' Retrieve the final MD5 hash as a byte array, then encode it in Base64.
        Dim finalHash() As Byte = jo.RunMethod("digest", Null)
        Dim fileHash As String = su.EncodeBase64(finalHash)
  
        ' Reset offset for the second pass where we actually send chunks.
        offset = 0
  
        ' Calculate the total number of chunks for reference.
        Dim totalChunks As Int = Ceil(fileSize / chunkSize)
  
        ' Second pass: read and publish file data chunk by chunk.
        Do While offset < fileSize
            Dim remaining As Long = fileSize - offset
            Dim actualSize As Int = chunkSize
            If remaining < chunkSize Then actualSize = remaining
      
            Dim buffer(actualSize) As Byte
            raf.ReadBytes(buffer, 0, actualSize, offset)
      
            ' Increment offset before we build the MQTT message.
            offset = offset + actualSize
      
            ' Encode the current chunk's data in Base64 for transmission.
            Dim chunkBase64 As String = su.EncodeBase64(buffer)
      
            ' Build a JSON map containing chunk data and metadata.
            Dim mp As Map
            mp.Initialize
            mp.Put("filename", filename)
            mp.Put("data", chunkBase64)
            mp.Put("chunkIndex", (offset / chunkSize) - IIf(remaining < chunkSize, 0, 0))
            mp.Put("totalChunks", totalChunks)
            mp.Put("hash", fileHash)
            mp.Put("session", session)
      
            ' Convert the map to a JSON string, then to a byte array for publishing.
            Dim Payload() As Byte = mp.As(JSON).ToCompactString.GetBytes("UTF8")
            mqtt.Publish(Tools.deviceID & "/download", Payload)
            Log("publish chunk #" & (offset / chunkSize))
        Loop
  
        ' Close the file reader once all chunks are sent.
        raf.Close
  
        ' Optionally send an acknowledgment if required.
        If ackRequired Then
            SendAcknowledgment(session, cmd, CreateMap("filename": filename), True, "")
        End If
    Else
        ' If the file does not exist, optionally send a negative acknowledgment.
        If ackRequired Then
            SendAcknowledgment(session, cmd, CreateMap("filename": filename), False, "File not found")
        End If
    End If
End Sub
 

emexes

Expert
Licensed User
very interesting

My first five observations of interest (before delving into the memory usage, if nobody else does) are:

1/ what is the purpose of this? Looks like it is always zero. Or am I misunderstanding IIf?
B4X:
... - IIf(remaining < chunkSize, 0, 0))

2/ this buffer is going to always be the same size, except for the last chunk, so consider:
B4X:
before loop:
    Dim buffer(1) As Byte

inside both loops (MD5 loop and sending loop):
    If buffer.Length <> actualSize Then    're-use if already correct size
        Dim buffer(actualSize) As Byte     'otherwise re-size
    End If

3/ 4 of these 6 fields are the same for every chunk. I get that it might need the session field for every chunk, but I'm pretty sure you could just send the filename, totalChunks and (file)Hash fields once, say with the first chunk (or maybe twice, ie again with the last chunk, to be sure, to be sure)
B4X:
Dim mp As Map
mp.Initialize
mp.Put("filename", filename)
mp.Put("data", chunkBase64)
mp.Put("chunkIndex", (offset / chunkSize) - IIf(remaining < chunkSize, 0, 0))
mp.Put("totalChunks", totalChunks)
mp.Put("hash", fileHash)
mp.Put("session", session)

4/ also, a filesize in bytes might be useful, otherwise I suspect the MD5 at the other end will be different, if the filesize is not an integral multiple of chunkSize

5/ consider doing the repeated stuff just once (should be no problem, since is soon converted into a new JSON String anyway)
B4X:
before loop:
    Dim mp As Map
    mp.Initialize
    mp.Put("filename", filename)
    mp.Put("totalChunks", totalChunks)
    mp.Put("hash", fileHash)
    mp.Put("session", session)

inside loop:
    mp.Put("data", chunkBase64)
    mp.Put("chunkIndex", (offset / chunkSize) - IIf(remaining < chunkSize, 0, 0))
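
A minimal sketch combining points 1/ and 5/, assuming a plain integer counter is what the receiver wants as chunkIndex. In B4X the / operator always yields a Double, so offset / chunkSize is fractional for a final partial chunk (for example 2.5 for a 25 MB file sent in 10 MB chunks). LogChunkPlan is a hypothetical helper that only logs the plan; it does not read or send anything. The counter starts at 0 here, which is an assumption; start it at 1 if the receiver expects the numbering of the original code:
B4X:
' Hypothetical helper: logs the index and size of every chunk of a file.
Sub LogChunkPlan(fileSize As Long, chunkSize As Int)
    Dim totalChunks As Int = Ceil(fileSize / chunkSize)
    Dim offset As Long = 0
    Dim chunkIndex As Int = 0    'integer counter instead of offset / chunkSize
    Do While offset < fileSize
        ' The last chunk may be shorter than chunkSize.
        Dim actualSize As Int = Min(chunkSize, fileSize - offset)
        Log("chunk " & chunkIndex & " of " & totalChunks & ", bytes: " & actualSize)
        offset = offset + actualSize
        chunkIndex = chunkIndex + 1
    Loop
End Sub
Calling LogChunkPlan(25 * 1024 * 1024, 10 * 1024 * 1024) logs three chunks with indexes 0, 1 and 2, the last one 5242880 bytes long.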

Righto, now that I've got you totally offside by picking at points that are more style mis-steps in the heat of programming battle than of actual bugs - although, to be fair, I have slowly discovered with programming that the less work the program does, the less chance there is to go wrong - let's chew on the memory problem 🍻

(assuming nobody else has stepped up to the batting plate whilst I've been writing this post 🙃 )
 

emexes

Lol I should warn you that I've never used MQTT, so I'm just making blind(ish) guesses. Brainstorming, but without the brain. :oops:

But with luck, one of my dumb guesses might accidentally point you in the right direction.

First thing is: where is the out-of-memory situation occurring? My understanding is that MQTT requires a central hub to collect and distribute messages to clients. Is that central hub running on the same computer as your send-file-over-MQTT routine?

Second thing is: how many clients are receiving each file? Are the individual files sent separately to individual clients? Or is one sent file received by many clients?

Third thing is that the MQTT software that you are calling from your file sending routine, that is effectively part of your program and using the same memory as your program - is it possible that all the file chunks that you hand to it, are building up in a queue that the MQTT doesn't have a chance to process because you're sending the file in a tight loop that doesn't relinquish the CPU back to your program's (secret hidden background) message handler loop until it has constructed and sent all the file chunks?

An easy check might be to put a Sleep(1) after sending each chunk. Yikes, I just spotted that your chunk size is 10 MB. Ok, so what might well be happening is that, let's say MQTT is sending over internet at 10 Mbps = about 1 MB per second (on a good day, if the wind is blowing in the right direction). If you are generating chunks faster than that, then they are going to queue up at your MQTT end of the cable, and it is possible that you might have queued most or all of the file before MQTT has even sent and freed up memory of the first chunk.

Or is there some data flow regulation going on? Maybe you've got a handshake going that doesn't queue up the second packet until the first packet has been sent. Give me five minutes to have another read at your file sending code. 🍻
 

Blueforcer

I also tried putting a Sleep of 5 seconds after each chunk. No difference.
The payload is not the problem, but the Publish function itself: there is no memory decrease when I just remove the mqtt.Publish line.
Regarding the handshake, I set QoS to 0. That also makes no difference.
 

emexes

On the bright side: that 10 MB chunk size (more like 13 MB after Base64 encoding) means that the overhead of those 3 or 4 repeated items is negligible. 🍻
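
For reference, the arithmetic behind that figure as a minimal sketch. Base64 emits 4 characters for every 3 input bytes, rounded up to a full group. Base64Length is a hypothetical helper name, and it assumes the encoder adds no line breaks:
B4X:
' Hypothetical helper: length of the Base64 text for a given number of raw bytes.
Sub Base64Length(byteCount As Long) As Long
    ' 4 output characters per group of 3 input bytes; a partial last group is padded to 4.
    Return Ceil(byteCount / 3) * 4
End Sub
For a 10 MB chunk (10 * 1024 * 1024 = 10485760 bytes) this gives 13981016 characters, roughly 13.3 MB before the JSON wrapper is added.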
 

emexes

How long does a 10 (13) MB chunk take to move itself from your program's device, over the network connection to the MQTT hub/server ?

Because that memory can't be freed until the chunk has been sent to and received by the server.

If it takes 10 seconds and you add a chunk every 5 seconds, ie chunks are being added to the MQTT send queue twice as fast as they're being cleared out, then by the time you've added the entire file to the MQTT send queue, only half of it will have been cleared out, and half of it will still be in the queue.
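
A rough model of that backlog as a minimal sketch. It assumes a simple first-in, first-out send queue that transmits one chunk at a time at a constant rate, which is a guess about how the MQTT client behaves. BacklogBytes and its parameters are hypothetical names:
B4X:
' Hypothetical helper: bytes still queued at the moment the last chunk interval has passed.
Sub BacklogBytes(chunkCount As Int, chunkBytes As Long, secondsBetweenAdds As Double, secondsPerSend As Double) As Long
    ' Time that has passed while chunkCount chunks were added.
    Dim elapsed As Double = chunkCount * secondsBetweenAdds
    ' Chunks fully sent in that time, never more than were added.
    Dim sentChunks As Int = Min(chunkCount, Floor(elapsed / secondsPerSend))
    Return (chunkCount - sentChunks) * chunkBytes
End Sub
With 100 chunks added every 5 seconds and 10 seconds needed per send, 50 chunks are still queued at the end, which is the "half of it" case described above.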

I was initially wondering how the heck the MQTT hub/server was going to handle all those GB of messages, but then I realised it is probably storing them on disk, and thus the total size of all outstanding yet-to-be-distributed messages is no problem.
 

emexes

Does the MQTT library have a property that you can use to tell how full the send queue is?

Like, maybe you can pause adding chunks to the queue if there is more than 100 MB still waiting to be sent, eg something like:

B4X:
For Each Chunk In FileToSend
    Do While MQTT.BufferUsed > 100000000    '100 M
        Sleep(5)    'turns this sub into a ResumableSub which operates in background
    Loop
   
    MQTT.Send(Chunk)
Next
 

Blueforcer

The MQTT broker can handle this pretty well; our broker accepts payloads up to 265MB.
QoS 0 in MQTT means fire and forget, and even if there is some buffer, it should be freed after some time. But it's not: even after 100 chunks the free memory keeps decreasing, each time by about the chunk size.
 

emexes

Do you get a "message received" confirmation from the MQTT hub/server?

Could you use that to trigger sending of the next chunk?

(lol reminds me of an interrupt-driven UART)

Can you subscribe to your own MQTT channel (or whatever it's called) ?

In which case, when the MQTT publishes it back to you, then you know that it has definitely left your MQTT send queue and (presumably) the memory released.

Bonus - you can compare the reflected chunk to the bytes that you send, to know that the chunk wasn't corrupted during transmission.
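
A minimal sketch of that echo idea, not a fix for whatever is holding the memory. It assumes the MqttClient variable is named mqtt and was initialized with the event name "mqtt", that it is connected and already subscribed to the topic it publishes on, and that the broker reflects the message back to the sender. PublishAndWaitForEcho is a hypothetical helper name:
B4X:
' Publishes one payload, then waits until a message comes back on the same topic.
' While waiting, this Wait For also receives MessageArrived events for other topics;
' they are ignored here, so a real app would have to dispatch them.
Sub PublishAndWaitForEcho(topic As String, payload() As Byte) As ResumableSub
    mqtt.Publish(topic, payload)
    Do While True
        Wait For mqtt_MessageArrived (ArrivedTopic As String, ArrivedPayload() As Byte)
        If ArrivedTopic = topic Then
            ' Only the length is compared; a byte-by-byte check would be stricter.
            If ArrivedPayload.Length = payload.Length Then Return True
            Return False
        End If
    Loop
    Return False
End Sub
Inside the sending loop it would be called as Wait For (PublishAndWaitForEcho(Tools.deviceID & "/download", Payload)) Complete (Ok As Boolean). The cost is that every chunk travels over the network twice.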
 

emexes

even after 100 chunks

Work out - ideally, measure - how long it takes a chunk to move across the network from your sending device to the central MQTT hub/server.

Then add a Sleep for that long plus say 20% or even 100%. All is fair in love and war debugging.

Programming stages:
- get program to compile
- get program to run
- get program to run correctly
- get program to run reliably
- get program to run fast

Also... the slower the program runs, the more time you have to think.

nevergiveup320.jpg
 

Blueforcer

The sending interval is not the problem. The chunk is already received by the requesting client before the next one is sent
 

emexes

The chunk is already received by the requesting client before the next one is sent

Ok. I don't remember seeing any wait-until-confirmed loops in the code.

Does the MQTT.Publish call block until the chunk is confirmed received by the requesting client?

I didn't think that was a done thing in B4A/I/J, and that the usual way to do long background operations was to use Wait For.

If it's not the MQTT send queue filling up quicker than it is being emptied, then... we are heading into Erel territory.

But before we bother him, we need to have one more crack at checking the chunk is out of the queue. Like, can you see that the chunks are actually being received, with the expected offsets, on another device? How long do they take to go from your app to the server to the receiving device? Does that align with the network speed?

In fact, another easy test would be to disconnect the network cable/connection to the MQTT hub/server. If the chunks still look like they're getting through when they obviously cannot be, then we are one step closer to working out what the heck is going on.

When you do find out, I'm keen to know. It's an interesting problem that shouldn't be happening. And when we find out what it is, we'll be kicking ourselves for missing the obvious. But wiser, too. It'll be great!
 

Blueforcer

I never said that I'm waiting for an ACK from the broker. The MQTT lib doesn't support the mqtt_MessageDelivered event.
I could wait 10-20 s between the chunks, even though the other client received it after 2 s. But that shouldn't be the goal. 1 GB would take forever...

10MB chunks in 10s interval:

B4X:
Waiting 10s
Mem: 374.222 MB
publish chunk #21
Waiting 10s
Mem: 360.644 MB
publish chunk #22
Waiting 10s
Mem: 347.076 MB
publish chunk #23
Waiting 10s
Mem: 347.08 MB
publish chunk #24
Waiting 10s
Mem: 317.041 MB
publish chunk #25
Waiting 10s
Mem: 302.8 MB
publish chunk #26
Waiting 10s
Mem: 302.467 MB
publish chunk #27
Waiting 10s
Mem: 289.032 MB
publish chunk #28
Waiting 10s
Mem: 275.456 MB
publish chunk #29
Waiting 10s
Mem: 275.447 MB
publish chunk #30
Waiting 10s
...
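
One way such a figure can be read on Android is through java.lang.Runtime, shown here as a minimal sketch. This is not necessarily how the values above were produced, and AvailableHeapMB is a hypothetical helper name. It needs the JavaObject library:
B4X:
' Returns the heap still available to the app, in MB:
' the maximum heap minus what is currently in use (total minus free).
Sub AvailableHeapMB As Double
    Dim rt As JavaObject
    rt = rt.InitializeStatic("java.lang.Runtime").RunMethod("getRuntime", Null)
    Dim maxBytes As Long = rt.RunMethod("maxMemory", Null)
    Dim totalBytes As Long = rt.RunMethod("totalMemory", Null)
    Dim freeBytes As Long = rt.RunMethod("freeMemory", Null)
    Return (maxBytes - (totalBytes - freeBytes)) / 1024 / 1024
End Sub
Logging it right before and right after each mqtt.Publish call, e.g. Log("Mem: " & NumberFormat(AvailableHeapMB, 1, 3) & " MB"), would show whether the drop happens at the call itself or later.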
 