B4J Question Processing of data skipped when incoming load is high

kostefar

Active Member
Licensed User
Longtime User
Dear All,

Ok, how do I explain this... I have set up an app which logs all incoming events on all symbols from an exchange, received via websockets.
The data arrives smoothly and is also written correctly to the log file.
However, the app additionally keeps the orderbooks (bids and asks) up to date so that I can view what's happening live, and here something sometimes goes wrong.
It can be hours till there's a "glitch", and sometimes just minutes.
I became aware of the issue by capturing a video of one of the exchange's symbols alongside what's going on in my app. When I go through the video afterwards, I follow the orderbooks in sync with thousands of events on this one symbol, until all of a sudden my own app stops updating as it should while the website still does. The type of event is not one that hadn't occurred before, and after the glitch things go on as before until the next one, messing up the orderbook data more and more.
All the orderbooks and trade data are kept in separate lists, about 3000 in total. Each list holds a couple of hundred entries, and they're all part of one map, with names like xyz_asks, xyz_bids, xyz_history.
I know it could probably all be a lot more efficient in SQLite, but since I so rarely touch SQL and really just wanted to get on and see this stuff working, I gave up after two days and decided to build it all as lists.
What I can say for sure is that it's not a bug in the code: the logging which shows the entries coming in via websockets sits at the exact point where they also amend the live orderbooks in my app, so my idea is that sometimes there's so much data flowing in that the app does not have enough time to process it all. In the log file there are sometimes only 1-3 milliseconds between the times that events are written, so there's a lot going on - all the time.
I'm not so keen on sharing the code, which is why I've explained it all in detail instead, and my question is:
How can I make sure that all events are processed when there are so many of them? I feel that DoEvents won't do the job. I could of course build some sort of queue system (sketched below) that does not process the next event until the previous one is finished, but I'm hoping there's a simpler way of doing this.
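Something like this is what I have in mind with the queue - a rough, untested sketch, with all sub and variable names made up:

B4X:
    'Untested sketch: buffer incoming messages in a list and handle them
    'strictly one at a time. queue.Initialize would run once in AppStart.
    Private queue As List
    Private busy As Boolean

    Sub ws_TextMessage (msg As String) 'made-up websocket event name
        queue.Add(msg)
        ProcessQueue
    End Sub

    Sub ProcessQueue
        If busy Then Return
        busy = True
        Do While queue.Size > 0
            Dim msg As String = queue.Get(0)
            queue.RemoveAt(0)
            HandleMessage(msg) 'the existing parsing / orderbook code
        Loop
        busy = False
    End Sub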

Thanks in advance!
 

kostefar

Active Member
Licensed User
Longtime User
No, I'm the client here, retrieving data from a server.

So, I'll focus on two identical events in terms of what they do. They're both events indicating that somebody bought an amount of a symbol at the price of 0.05496000. This means that this amount has to be removed from the list of ask orders in the orderbook at this particular price:

Here's the first one, which went well:

B4X:
{"Action":1,"Amount":0.08690037,"Rate":0.05496000,"Total":0.0047760443352000,"TradePairId":5203,"Type":0,"UserId":null,"DataType":0}

And here's the event which did not update the orderbook:

B4X:
{"Action":1,"Amount":0.26300559,"Rate":0.05496000,"Total":0.0144547872264000,"TradePairId":5203,"Type":0,"UserId":null,"DataType":0}

This is what happens when such a combination arrives:

B4X:
    If Main.Findinjson(entry, "Action") = 1 Then
        If Main.Findinjson(entry, "Type") = 0 Then
            action = "buy"
            event = "buy"
            Dim price As Double = Main.Findinjson(entry, "Rate")
            Dim vol As Double = Main.Findinjson(entry, "Total")
        End If
    End If

At the end of this, I have:

B4X:
    Dim data As Map
    data.Initialize
    data.Put(coin, action & " " & event & " " & price & " " & vol)
    AddFromSockets(data)
    WritetoFile(data)

WritetoFile writes to the log file that the two inputs shown above come from. There are 3 seconds between them.

So in AddFromSockets (input As Map), this is what happens:

B4X:
    Dim split() As String = Regex.Split(" ", input.GetValueAt(0))
    Dim coin As String = input.GetKeyAt(0)
    Dim action As String = split(0)
    Dim event As String = split(1)
    Dim price As Double = split(2) 'Double, so the price comparison below is numeric
    Dim vol As Double = split(3)
    If event = "buy" Then tablename = coin & "_ask"
    list = coinmap.Get(tablename)
    For i = 0 To list.Size - 1
        Dim listsplit() As String = Regex.Split(" ", list.Get(i))
        Dim listprice As Double = listsplit(0)
        Dim listvol As Double = listsplit(1)
        If price = listprice Then
            Dim newvol As Double = listvol - vol
            list.Set(i, price & " " & newvol)
            coinmap.Put(tablename, list)
        End If
    Next

All very simplified. So it loads the _ask list for this symbol and checks whether this price is already present in it. If it is, it deducts the purchased amount from the current volume, replaces the list item with the price and the new volume, and finally puts the list back where it belongs in coinmap. This happened only for the first of the two events - not for the 2nd.

Am I making sense here? I know that a lot of things could be much more elegant, but I'm not such an elegant person.. :)
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
This happens in the TextMessage event, right?

Seems like you are parsing the JSON string multiple times, though for a small string it will probably be fast enough.

Measure the time that it takes to handle each message. See whether it is close to, or more than, the time between messages under heavy load.
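For example, something like this around your message handling (DateTime.Now is in milliseconds):

B4X:
    Dim start As Long = DateTime.Now
    'handle the message here
    Log("Handled in " & (DateTime.Now - start) & " ms")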

While it seems that there are several things in your code that can be optimized, my guess is that writing to the file is the slowest step. Can you post that code?
 

kostefar

Active Member
Licensed User
Longtime User
Yeah, you're right. It's calling a module which parses the JSON string separately for each of the values that it needs.

The file writing shortens things a bit so that they don't take up too much space:

B4X:
Sub WritetoFile (data As Map)
    Dim txwr As TextWriter
    Dim coin As String = data.GetKeyAt(0)
    Dim split() As String = Regex.Split(" ", data.GetValueAt(0))
    Dim action As String = split(0)
    Dim event As String = split(1)
    Dim price As Double = split(2)
    Dim vol As Double = split(3)
    If action = "buy" Then action = "b"
    If action = "sell" Then action = "s"
    If event = "buy" Then event = "b"
    If event = "sell" Then event = "s"
    If event = "increase"  Then event = "i"
    If event = "decrease" Then event = "d"
    Dim datastring As String = DateTime.Time(DateTime.Now) & " " & coin & " " & action & " " & event & " " & price & " " & vol
    txwr.Initialize(File.OpenOutput("K:\exchangedata",Main.Inittime & "-alldata.txt",True))
    txwr.WriteLine(datastring)
    txwr.Close
End Sub

.. and for the handling time: in some cases I see in the log that some messages have the exact same timestamp - even down to the millisecond. But what you're after is the time it takes from when the message arrives from the websocket until it has been processed and written to the log file?

EDIT: Ah well, I checked how long the whole thing takes, from when data arrives until it's written: anything from 0 to 3 ms.
 

OliverA

Expert
Licensed User
Longtime User
txwr.Initialize(File.OpenOutput("K:\exchangedata",Main.Inittime & "-alldata.txt",True))
txwr.WriteLine(datastring)
txwr.Close
Maybe only open txwr once per program start (I'm guessing that's what Main.Inittime refers to) and then you would not have to Initialize/close the writer constantly just for writing each line of the file. Currently, for every line, the system opens the file, has to seek to the end of the file, writes to the file and then closes it. Try opening the file once and continuously writing to it and only close it on program exit (or whenever Main.Inittime changes).
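Roughly like this (untested; txwr becomes a process-global TextWriter):

B4X:
    'once, at startup:
    txwr.Initialize(File.OpenOutput("K:\exchangedata", Main.Inittime & "-alldata.txt", True))

    'in WritetoFile, build datastring as before, then just:
    txwr.WriteLine(datastring)

    'once, on exit:
    txwr.Close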
 

OliverA

Expert
Licensed User
Longtime User
Question (out of curiosity): Why are you not using JSONParser? Then your input would already be a map and the conversion is done once and this may save you from concatenating and splitting the information repeatedly.
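For example (with the JSON library, msg being the raw message string):

B4X:
    Dim parser As JSONParser
    parser.Initialize(msg)
    Dim m As Map = parser.NextObject 'parse once per message
    Dim rate As Double = m.Get("Rate")
    Dim total As Double = m.Get("Total")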
 

kostefar

Active Member
Licensed User
Longtime User
Question (out of curiosity): Why are you not using JSONParser? Then your input would already be a map and the conversion is done once and this may save you from concatenating and splitting the information repeatedly.

Good question OliverA,

The call to parse JSON is actually also using JSONParser underneath. It's been so long since I first needed to parse JSON and found this class somewhere on b4x - it's called "clsJsonTree" - that I don't remember anymore why I ended up using it, but I think it turned out to be very flexible in finding any value no matter how deeply it's buried in a hierarchy of JSON data, as well as not being very picky about formatting.
 

kostefar

Active Member
Licensed User
Longtime User
Maybe only open txwr once per program start (I'm guessing that's what Main.Inittime refers to) and then you would not have to Initialize/close the writer constantly just for writing each line of the file. Currently, for every line, the system opens the file, has to seek to the end of the file, writes to the file and then closes it. Try opening the file once and continuously writing to it and only close it on program exit (or whenever Main.Inittime changes).

The reason for the constant closing is that it makes the file easier to open while the program is running, and also, if the program crashes, I'm sure that the file is closed and can be opened for reading.

And yeah, Inittime is just a Unix timestamp set once in AppStart.
 

kostefar

Active Member
Licensed User
Longtime User
Ok, I dug a bit more to figure out how bottlenecked things are.. in the second surrounding the first of the two events - the one which made it to the orderbook list and not only to the log file - there are 68 events.
For the 2nd one, which only made it to the log file, there are 83 events.

What's more interesting is that the closest event to the successful one is 36 ms away from it, whereas the one which disappears has another event only 3 ms after it. So do these observations indicate that some instructions indeed don't get executed before the next event arrives on the websocket for handling?

This is probably happening frequently - I just decided to focus on only one of the 1000 symbols to have a chance of capturing the problem.
 

OliverA

Expert
Licensed User
Longtime User
So are you saying every event, processed or not, is logged? Are you also logging the raw JSON? It would be interesting to feed the program the "skipped" data and see if it would skip it even under controlled circumstances (then you are looking at some sort of coding issue).
 

OliverA

Expert
Licensed User
Longtime User
The reason for the constant closing is that it makes the file easier to open while the program is running, and also, if the program crashes, I'm sure that the file is closed and can be opened for reading
Then instead of continuously opening/closing it, call the Flush method of the TextWriter class (https://www.b4x.com/b4j/help/files.html#textwriter_flush). Even if the program crashes, all data should be logged and the file should be fine (unless the crash happens during writing/before flushing).
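I.e. (assuming txwr stays open, as suggested above):

B4X:
    txwr.WriteLine(datastring)
    txwr.Flush 'pushes the line out without closing the stream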
 

kostefar

Active Member
Licensed User
Longtime User
Then instead of continuously opening/closing it, call the Flush method of the TextWriter class (https://www.b4x.com/b4j/help/files.html#textwriter_flush). Even if the program crashes, all data should be logged and the file should be fine (unless the crash happens during writing/before flushing).

Thanks Oliver, I did not know about that one, so I'll look into it. I'm also considering changing the JSON parsing a bit. I can see how using the separate module mentioned above is overkill here.
 

kostefar

Active Member
Licensed User
Longtime User
Ok, now I've changed the JSON parsing to simply turn each event into a map once, instead of parsing the string again for each value, and I've changed the file writing as we discussed.
I'll let my computer record overnight and see what happens. What I can see already is that the processing time now never gets above 2 ms, so there seems to have been a bit of improvement.
I'll get back with the result!
 

kostefar

Active Member
Licensed User
Longtime User
Alright, I had a look at how things turned out during the night. None of the data that arrived failed to do what it's supposed to, so that's a good thing and an indication that the optimizations have helped.
What I do see, though, is that now and then websocket events which should arrive don't: they're not logged, they're not processed.. they're just not there.
How do I know that? As mentioned, I'm simultaneously capturing the orderbook on the website of the exchange, which also receives its data via websockets.
Then again, I know that over time the website also shows "polluted" data, so I guess there are timeouts on the websockets and such, which makes it nearly impossible to guarantee that everything is received.
There's one thing that could be the reason though: maybe there's a rare bug in how I split up the events in the socket messages - they often arrive many at a time. So I'll start logging the full raw incoming message as well, to rule that out.
If not, I'll set it up to retrieve the full order books for all symbols every 30 mins or so, which takes about 3 minutes. Then the amount of polluted data should be very small.
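Roughly like this, with a Timer - the sub name is made up:

B4X:
    'in Process_Globals:
    Private resyncTimer As Timer

    'in AppStart:
    resyncTimer.Initialize("resyncTimer", 30 * DateTime.TicksPerMinute)
    resyncTimer.Enabled = True

    Sub resyncTimer_Tick
        DownloadFullOrderbooks 'refetch all books and replace the lists in coinmap
    End Sub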
 

swChef

Active Member
Licensed User
Longtime User
Can you periodically download transaction logs for the symbols?
Does your app have to track symbols in realtime? You could still use the logs to reconcile or 'sync up'.
 

kostefar

Active Member
Licensed User
Longtime User
Can you periodically download transaction logs for the symbols?
Does your app have to track symbols in realtime? You could still use the logs to reconcile or 'sync up'.

What it does is log all data for later tests. The live view is there to check that it works as it should. But there's a third piece of functionality that needs the tables to have correct data in real time, so for that I need things to work flawlessly. Otherwise yes, if I could be 100% sure that all data is written to the log file, it wouldn't matter so much that the tables are missing data.. it wouldn't be a problem for the testing based on the log file.
And transaction logs wouldn't help me. I also need the orderbooks, and there's no place to get historical orderbooks.
 