Android Question Max items of a List - list alternatives?

kostefar

Active Member
Licensed User
Longtime User
Dear All,

I´m using b4x to do some calculations involving setting up a list with around 6 mio. items, which seems to be more than a normal list can hold.
The data is loaded line by line from a txt file, where 5 elements are put into a map, which is then added to the list.
This is prolly something I´d be better off at doing in vb6, as it´s not app related at all, and vb6 won´t suffer from the kind of memory limitations that we have in android, plus it´d prolly be faster too.
However, it´s been years since I touched vb6, so I first thought I´d see if there´s a way for me to do this in b4a.
After 1 mio lines, I get java.lang.OutOfMemoryError: OutOfMemoryError thrown while trying to throw OutOfMemoryError; no stack trace available.
I already have
B4X:
SetApplicationAttribute(android:largeHeap,"true")
set in the manifest.

Alternatively I could perform the calculations while loading the lines from file, not adding them to a list, but I assume it´ll be superslow then as they´re not loaded into memory first.
 

kostefar

Active Member
Licensed User
Longtime User
I can understand what you mean but the same device can do magic things if you use a SQLite DB.

What I would do:

1. Use B4J (I've heard this is a good tool) to read the file and store it into a SQlite DB)
2. Use the Batch method to insert the rows
3. After that it is just a file copy to your device (e.g. via Google Drive or so). Do NOT include such a big file in your assets.

For test reasons I've put 10 mio. rows in a Sqlite DB which took just 30-60 secs or so (B4J, WIN 10 64 Bit, 12 GB RAM, SSD). The size (depending on the data) was about 400 MB which isn't that big.

Just wondering: What is "the batch method"?
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Ok, got the sql stuff working in both b4a and b4j, right now converting the data to db which will take an estimate of 66 hours with b4a. It only works for me in b4j if I use debug mode, but then it´s even more slow. In release mode it does not do anything.

Any idea what to do with this?

I have

B4X:
#Region Project Attributes
    #MainFormWidth: 600
    #MainFormHeight: 600
#End Region
#AdditionalJar: sqlite-jdbc-3.7.2

And set it up like this:

B4X:
sql.InitializeSQLite("K:\ticks","b4jticks.db",True)
    sql.ExecNonQuery("CREATE TABLE table1 (Date TEXT , Time TEXT, Price TEXT, Bid TEXT, Ask Text)")

Adding the values with:

B4X:
sql.ExecNonQuery2("INSERT INTO table1 VALUES (?,?,?,?,?)", Array As String(m.Get("Date"),m.Get("Time"), m.Get("Price"),m.Get("Bid"), m.Get("Ask")))
 
Upvote 0

OliverA

Expert
Licensed User
Longtime User
Where you are adding that many records (6 mil?) to the db, the batch methods may not work (it uses a list internally. If you're curious look at the library source https://github.com/AnywhereSoftware/B4J_SQL/blob/master/src/anywheresoftware/b4j/objects/SQL.java), but at minimum you need to wrap your ExecNonQuery2 with a BeginTransaction and TransactionSuccessful methods. Pseudo code:

B4X:
sql.BeginTransaction
Try
  ' Here is the loop that adds you data via sql.ExecNonQuery2
  ' End of loop
  sql.TransactionSuccessful
Catch
  ' Something bad happened, lets rollback our db changes
  sql.Rollback
End Try

That change alone may cut your import time from hours to minutes.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Where you are adding that many records (6 mil?) to the db, the batch methods may not work (it uses a list internally. If you're curious look at the library source https://github.com/AnywhereSoftware/B4J_SQL/blob/master/src/anywheresoftware/b4j/objects/SQL.java), but at minimum you need to wrap your ExecNonQuery2 with a BeginTransaction and TransactionSuccessful methods. Pseudo code:

B4X:
sql.BeginTransaction
Try
  ' Here is the loop that adds you data via sql.ExecNonQuery2
  ' End of loop
  sql.TransactionSuccessful
Catch
  ' Something bad happened, lets rollback our db changes
  sql.Rollback
End Try

That change alone may cut your import time from hours to minutes.


Thanks OliverA,

I added this but it does not change the fact that nothing gets written to the outputfile, unless I run in debug mode.
It also does not show a line I have before all the sql stuff is initialized and started in the log:

B4X:
Log (DateTime.Time(DateTime.Now))

This one shows when I use debug mode.

EDIT: Oops, this one actually runs really fast in debug mode! About a min to compile the whole list :) Awesome!!!

Any idea why it won´t work in release mode?
 
Last edited:
Upvote 0

rboeck

Well-Known Member
Licensed User
Longtime User
@kostefar: I have two questions: Do you use a 64 bit Java or a 32 bit - if 32bit than i think that the biggest array size is maybe 2 gb - but i am not sure about this; but i am interested to try it by myself: You could eventually distort your data in such a way, that more of us could download and try to handle arrays of this size
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
@kostefar: I have two questions: Do you use a 64 bit Java or a 32 bit - if 32bit than i think that the biggest array size is maybe 2 gb - but i am not sure about this; but i am interested to try it by myself: You could eventually distort your data in such a way, that more of us could download and try to handle arrays of this size

It looks like it´s 32-bit. Moved it a long time ago, so had to inspect the exe.

For now it´s all good and I´m gonna step forward to reading back the data into sqlite and do the simulation.
If you want a copy of the data, just let me know and I´ll pm you with a link to where you can download it zipped.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Would you be willing to share your B4A solution with us? Ofcourse without your database. I am curious how sql db's could handle such large databases.

Hi Syd,

Sure. This is the code I´ve used to convert the data:

B4X:
Sub converttodb
    Dim textrd As TextReader
    textrd.Initialize(File.OpenInput("K:\ticks","ticks.csv"))
    Dim line As String
  
    Log (DateTime.Time(DateTime.Now))
    line = textrd.ReadLine
    If File.Exists("K:\ticks","b4jticks.db") Then File.Delete ("K:\ticks","b4jticks.db")
    Dim counter As Int = 0
    sql.InitializeSQLite("K:\ticks","b4jticks.db",True)
    sql.ExecNonQuery("CREATE TABLE table1 (Date TEXT , Time TEXT, Price TEXT, Bid TEXT, Ask Text)")
    sql.BeginTransaction
  
    Try
        Do While line <> Null

            Dim m As Map = (parser(line))
            counter = counter + 1
            sql.ExecNonQuery2("INSERT INTO table1 VALUES (?,?,?,?,?)", Array As String(m.Get("Date"),m.Get("Time"), m.Get("Price"),m.Get("Bid"), m.Get("Ask")))
            line = textrd.ReadLine
        Loop
        sql.TransactionSuccessful
    Catch
        sql.Rollback
    End Try
    Log (DateTime.Time(DateTime.Now))
    Log (counter)
End Sub

B4X:
Sub parser(line As String) As Map
   
    Dim counter As Int = 0
    Dim m As Map
    m.Initialize
    Do While line.Contains(";")
        Dim columnval As String = line.SubString2(0,line.IndexOf(";"))
        If counter  = 0 Then
            columnval = columnval.SubString(4) & columnval.SubString2(0,4)
            m.Put ("Date",columnval)
        End If
        If counter  = 1 Then m.put ("Time",columnval)
        If counter  = 2 Then m.Put ("Price",columnval)
        If counter  = 3 Then m.Put ("Bid",columnval)
        line = line.SubString(line.IndexOf(";")+1)
        counter = counter + 1
    Loop
   
    m.Put ("Ask",(columnval) )
    Return m
       
       

End Sub

The parser is not very elegant (neither is the rest I guess); I originally wanted to use regex.split for this, but for some reason it was throwing an error when iterating over the array containing the data that´s been split. This did not happen if I inserted a msgbox before the assignment of the value to a map key, so I tried with a bunch of DoEvents to see if that would help, but still the same. That´s why it looks the way it does.
 
Upvote 0

rboeck

Well-Known Member
Licensed User
Longtime User
Your main problem is your csv file - here a demo:
01042011;09:39:24;2944;3409;3923
[CR][LF]
[CR]
01042011;09:39:24;4915;3909;3923
[CR][LF]
[CR]
....
Only each third line was used for data, so you have to check for line.Length!

Here my results:

B4X:
Sub InsertInList
 
    Dim textrd As TextReader
    textrd.Initialize(File.OpenInput("K:\ticks","ticks.csv)
    Dim line As String
    Dim List1 As List
 
    List1.Initialize

    Log (DateTime.Time(DateTime.Now))
 
    line = textrd.ReadLine
 
    Dim counter As Int = 0

    Do While line <> Null
     
        If line.Length>5 Then    'only each third line has really data - thats why regex didnt work..
         
            Dim m1() As String
            m1=Regex.Split(";",line)
         
            m1(0) = m1(0).SubString(4) & m1(0).SubString2(0,4)

            List1.Add(m1)
         
            counter = counter + 1
        End If
     
        line = textrd.ReadLine
    Loop
 
    Log (DateTime.Time(DateTime.Now))
    Log (counter)
 
End Sub

Time needed for filling 4.178.888 records to list: 13 seconds, i could not see any memory problems, next step was to repeat it two times,
there was no memory problem with now more then 12 million records in the list; time needed: 21 seconds.

my new variant of converttodb
B4X:
Sub converttodb
 
    Dim textrd As TextReader
    textrd.Initialize(File.OpenInput("Z:\","ticks.csv"))
    Dim line As String

    Log (DateTime.Time(DateTime.Now))
    line = textrd.ReadLine
    If File.Exists("Z:\ticks","b4jticks.db") Then File.Delete ("Z:\ticks","b4jticks.db")
 
    Dim counter As Int = 0
    Dim C2 As Int =0
    SQL.ExecNonQuery("CREATE TABLE table1 (Date TEXT , Time TEXT, Price TEXT, Bid TEXT, Ask Text)")

    Do While line <> Null
     
        If line.Length>5 Then
                           
            Dim m1() As String
            m1=Regex.Split(";",line)
         
            m1(0) = m1(0).SubString(4) & m1(0).SubString2(0,4)
         
            counter = counter + 1
            C2=C2+1
         
            SQL.AddNonQueryToBatch("INSERT INTO table1 VALUES (?,?,?,?,?)", m1)
        End If
     
        If C2=1000000 Then
            Dim SenderFilter As Object = SQL.ExecNonQueryBatch("SQL")
            Wait For (SenderFilter) SQL_NonQueryComplete (Success As Boolean)
            C2=0
            Log(counter)
        End If
     
        line = textrd.ReadLine
    Loop
 
    Dim SenderFilter As Object = SQL.ExecNonQueryBatch("SQL")
    Wait For (SenderFilter) SQL_NonQueryComplete (Success As Boolean)
 
 
    Log (DateTime.Time(DateTime.Now))
    Log (counter)
 
    SQL.Close
 
End Sub

I used for the first time addnonQuerytoBatch with Wait for...
In release mode it was ready in 47 seconds, the file size is 186.981 kb
Every 1 million records i used SQL.ExecNonQueryBatch, and there was no real time difference if i used it every 100.000 records.
Only when i tried to insert all records at once java had problems, used my cpu to 99% for minutes, but didnt crash! But the time was four times higher than usual.
I tried to simplify as much as possible and integrated the parsing and removed the use of maps.
So i hope i could clear the problem.
P.S: My son used c++ to read the file in array: 5 sec against 13 sec. in java.
 
Last edited:
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Thanks rboeck, that´s an impressive amount of work that you put into that!
I did notice the blank lines later, but did not think about that they´d be the ones causing the regex error - doh!
Still, if there are 4 million records, why would I get the memoryerror after 1 mio records? What´s your MaxRamForDex value? I can´t get mine up to even 850 - 810 helped me to read through the file.
I´m happy to have gotten into sql under b4x and b4j - and seeing what b4j can do with speed. So, my luck that I had this memory error (and the help from you guys), or I´d never have digged into this :)

Btw, which device are you using for b4a? Emulator or hardware?
 
Last edited:
Upvote 0

OliverA

Expert
Licensed User
Longtime User
I used for the first time addnonQuerytoBatch with Wait for...
AddNonQueryToBatch is a great tool, but I wonder if it is necessary here (unless some sort of UI responsiveness is needed). The method adds the statement to an internal list and then ExecNonQueryBatch iterates over this list by calling ExecNonQuery2 for each statement in the list. This procedure is surrounded in a Try/Catch block and uses BeginTransaction/TransactionSuccessful/Rollback as necessary. Therefore by using the Try/Catch and ExecNonQuery2 directly (with the BeginTransaction/TransactionSuccessful/Rollback methods - see post #46 in this thread), you would take out one more loop (used in ExecNonQueryBatch) and one more list eating up memory (in AddNonQueryToBatch) that may help speed up/simplify the code.

Link to source for AddNonQueryToBatch: https://github.com/AnywhereSoftware...rc/anywheresoftware/b4j/objects/SQL.java#L223
Link to source for ExecNonQueryBatch: https://github.com/AnywhereSoftware...rc/anywheresoftware/b4j/objects/SQL.java#L240
 
Upvote 0

rboeck

Well-Known Member
Licensed User
Longtime User
@kostefar I have only a 64 bit java installed, currently i have no extra parameters set - i removed your map construction, because it was my feeling, that your more complex use of maps could be the source of the memory problem - try my construction and look, if your memory error is away. The more complex data types make some joy to work with, but in case of millions of them the internal footprint in memory is also more complex; maybe i am wrong here.
Currently i didnt test it with android, because i think the converting is a single step task and you can use the readymade database in your android device without problems. From older test i calculate, that my desktop is three times faster in database operations then my android devices.
The amount of work was not as big as it looks, my motivation was to clear the memory problematic - i dont want to come in this situation in my real world problems, so with this test program i wanted to prepare me against this problem. The sub InsertInList was made as copy and paste of the database code, i only inserted some lines for the list and removed all database specific code and was ready in five minutes.
Today i will try to set some xms.. parameters and see, if i can make it with only one use of ExecNonQueryBatch

BTW: I just found a message, that i had in memoy, when reading your posts:https://www.b4x.com/android/forum/threads/parsing-huge-text-files.34923/#content
Look in the last line: for 32 bit systems the usable memory is limited to 2 Gb.

Now i could handle it in one step with this line:
#VirtualMachineArgs: -Xms3072m -Xmx3072m
But the time used overall is now 52 sec. against 47 sec. with immediated use of ExecNonQueryBatch.
 
Last edited:
Upvote 0

rboeck

Well-Known Member
Licensed User
Longtime User
@OliverA: You hit the point exactly - i had in mind, that the batch operations could be faster, because of the use of another thread for the database operation. But every operation has its price - i worked my sample back to ExecNonQuery2, included the transaction by myself, and got the result of 40 sec. and a more stable code without need for any memory settings. So batch operations are sometimes a good alternative, but not for this sample.

Next "tuning" step: I changed to a newer sqlite engine:
#AdditionalJar: sqlite-jdbc-3.20.0
Instead of 40 sec i get 34 sec!
 
Last edited:
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Thanks OliverA and rboeck, you guys really know your stuff. I guess I could benefit from installing a 64-bit java - prolly been years since I set it up with this version, following an old b4a guide.
 
Upvote 0

rboeck

Well-Known Member
Licensed User
Longtime User
Last question: you wrote '.. been years since' ; your profile shows 13 years?! What is true; did you start with programming with 8 years?
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Last question: you wrote '.. been years since' ; your profile shows 13 years?! What is true; did you start with programming with 8 years?

Good one!
I just put a random age there when I made my profile.
It´s prolly just 3 years since I installed b4a, so years yeah - but not alot of them.
 
Upvote 0
Top