Android Question Max items of a List - list alternatives?

kostefar

Active Member
Licensed User
Longtime User
Dear All,

I'm using B4X to do some calculations that involve building a list with around 6 mio. items, which seems to be more than a normal list can hold.
The data is loaded line by line from a txt file; 5 elements from each line are put into a map, which is then added to the list.
This is probably something I'd be better off doing in VB6, as it's not app related at all, and VB6 doesn't suffer from the kind of memory limitations we have in Android; it would probably be faster too.
However, it's been years since I touched VB6, so I first thought I'd see if there's a way to do this in B4A.
After 1 mio. lines, I get: java.lang.OutOfMemoryError: OutOfMemoryError thrown while trying to throw OutOfMemoryError; no stack trace available.
I already have
B4X:
SetApplicationAttribute(android:largeHeap,"true")
set in the manifest.

Alternatively, I could perform the calculations while loading the lines from the file, without adding them to a list, but I assume that would be very slow, as the data isn't loaded into memory first.
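What I mean is something like this (a rough sketch only; it assumes the RandomAccessFile library's TextReader, and the file name and column index are placeholders):
B4X:
'Rough sketch: calculate while reading, so no list is kept in memory.
'Assumes the RandomAccessFile library; "ticks.txt" and the column index
'are placeholders.
Dim tr As TextReader
tr.Initialize(File.OpenInput(File.DirRootExternal, "ticks.txt"))
Dim line As String = tr.ReadLine
Do While line <> Null
    Dim fields() As String = Regex.Split(",", line)
    Dim price As Double = fields(1) 'placeholder column
    'update the running calculation here instead of List.Add
    line = tr.ReadLine
Loop
tr.Close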
 

kostefar

Active Member
Licensed User
Longtime User
You could sort all the items alphabetically and then insert them into 26 separate lists (A to Z), plus maybe a 27th list for numerical items. A search is then done by selecting the list that corresponds to the first letter of the search phrase (A to Z), which also makes searching much quicker. I have used this trick a number of times.


Good answer, except that the data we're talking about here is historical ticks for a stock quote, so alphabetical sorting isn't really relevant.
 
Upvote 0

stevel05

Expert
Licensed User
Longtime User
If you don't need to access the lists, do the calculations per line as it's read from the file and only store what you actually need. It shouldn't be any slower.

If you are thinking of using something different, try B4J first.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
If you don't need to access the lists, do the calculations per line as it's read from the file and only store what you actually need. It shouldn't be any slower.

If you are thinking of using something different, try B4J first.

Thanks. The thing is that it'll run over the data multiple times, so I'd assume that reading from disk rather than from memory would be slower, right?
B4J is indeed a good idea, since it won't need to run on an emulator - that's your point, isn't it?
 
Upvote 0

RandomCoder

Well-Known Member
Licensed User
Longtime User
When you're looking at this many records, you really should be considering how you might store the data in a database. Databases are designed to hold vast amounts of data and have functions that allow extremely fast extraction of the values you require.
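As a rough sketch of what that could look like with the SQL library (the table and column names here are just examples, not from your data), bulk inserts wrapped in a transaction keep loading millions of rows practical:
B4X:
'Sketch only: load the ticks into SQLite once, then query as needed.
'Table/column names are examples; the transaction makes bulk inserts fast.
Dim sql1 As SQL
sql1.Initialize(File.DirInternal, "ticks.db", True)
sql1.ExecNonQuery("CREATE TABLE IF NOT EXISTS ticks (ts INTEGER, price REAL)")
Dim tr As TextReader
tr.Initialize(File.OpenInput(File.DirRootExternal, "ticks.txt"))
sql1.BeginTransaction
Try
    Dim line As String = tr.ReadLine
    Do While line <> Null
        Dim f() As String = Regex.Split(",", line)
        sql1.ExecNonQuery2("INSERT INTO ticks VALUES (?, ?)", Array As Object(f(0), f(1)))
        line = tr.ReadLine
    Loop
    sql1.TransactionSuccessful
Catch
    Log(LastException)
End Try
sql1.EndTransaction
tr.Close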
 
Upvote 0

stevel05

Expert
Licensed User
Longtime User
If you are running it on an emulator, how much RAM have you set for the device?

6 mio (million?) items with 5 attributes each is a fairly large chunk of memory to be assigning: in Java, a map holding five boxed values can easily take a few hundred bytes, so six million of them adds up to gigabytes. You need to make sure the device has enough memory; if it's an emulator, make sure you have assigned enough. And yes, if it is running over the data multiple times it will be slower; the emulator is slow anyway.

that's your point, isn't it?
It wasn't, as you didn't say you were running on an emulator, but I'm sure it would help.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
If you are running it on an emulator, how much RAM have you set for the device?

6 mio (million?) items with 5 attributes each is a fairly large chunk of memory to be assigning. You need to make sure the device has enough memory; if it's an emulator, make sure you have assigned enough. And yes, if it is running over the data multiple times it will be slower; the emulator is slow anyway.

It wasn't, as you didn't say you were running on an emulator, but I'm sure it would help.


But isn't there a ceiling on the number of items that a List object can hold, regardless of the memory set for the device?
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
When you're looking at this many records, you really should be considering how you might store the data in a database. Databases are designed to hold vast amounts of data and have functions that allow extremely fast extraction of the values you require.

I just need to run a simulation starting with the first item and going to the last, so it's very one-dimensional. I'm not sure a database would really help?
 
Upvote 0

RandomCoder

Well-Known Member
Licensed User
Longtime User
I just need to run a simulation starting with the first item and going to the last, so it's very one-dimensional. I'm not sure a database would really help?
It really depends on what you intend to do with the values. If you are plotting each data point on a graph, then a database probably won't be any faster; but if you're summing the values, finding the average, counting the number of data points above a certain threshold, etc., then a database would be miles FASTER!
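For example (a hedged sketch, assuming a ticks table like the one sketched earlier; ExecQuerySingleResult returns the single value as a string):
B4X:
'Examples of aggregates SQLite computes in one pass (table name is an example).
Dim sql1 As SQL
sql1.Initialize(File.DirInternal, "ticks.db", True)
Dim avgPrice As String = sql1.ExecQuerySingleResult("SELECT AVG(price) FROM ticks")
Dim aboveCount As String = sql1.ExecQuerySingleResult("SELECT COUNT(*) FROM ticks WHERE price > 100")
Log("Average: " & avgPrice & ", above threshold: " & aboveCount)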
 
Upvote 0

OliverA

Expert
Licensed User
Longtime User
Alternatively, I could perform the calculations while loading the lines from the file, without adding them to a list, but I assume that would be very slow, as the data isn't loaded into memory first.
How about "chunking" the read? Read x amount of data, do your calculations, read x amount more, do some more calculations. That's how it was done in the good old days of memory-constrained systems. Depending on the calculations performed, you could use the SQLite DB (ninja'd by @kostefar as I'm typing this) to let you handle the calculations.
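A rough sketch of the chunked read (ChunkSize and ProcessChunk are made-up names; tune the size to the device):
B4X:
'Sketch of chunked reading: only one block of lines is in memory at a time.
'ChunkSize and ProcessChunk are placeholders for your own sizing and logic.
Dim tr As TextReader
tr.Initialize(File.OpenInput(File.DirRootExternal, "ticks.txt"))
Dim ChunkSize As Int = 100000
Dim chunk As List
chunk.Initialize
Dim line As String = tr.ReadLine
Do While line <> Null
    chunk.Add(line)
    If chunk.Size = ChunkSize Then
        ProcessChunk(chunk) 'your calculations on this block
        chunk.Clear
    End If
    line = tr.ReadLine
Loop
If chunk.Size > 0 Then ProcessChunk(chunk)
tr.Close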
 
Upvote 0

stevel05

Expert
Licensed User
Longtime User
But isn't there a ceiling on the number of items that a List object can hold, regardless of the memory set for the device?

Apparently not one that would cause a problem like this; the theoretical max is the maximum value of an integer (about 2.1 billion), so you'll run out of memory long before the List itself does.
 
Upvote 0

OliverA

Expert
Licensed User
Longtime User
The data is loaded line by line from a txt file; 5 elements from each line are put into a map, which is then added to the list.
Alternatively, I could perform the calculations while loading the lines from the file, without adding them to a list, but I assume that would be very slow, as the data isn't loaded into memory first.
After re-reading your initial post, I think you need to test whether calculating while loading really has an impact on the time it takes to finish, especially if you can skip loading your data into a list. You're already looping over the data, since you (as you said) are loading it line by line. So instead of creating a map (which takes time) and then loading that into a list (which takes time), why not just perform your calculations? The only speedup beyond that would be to read the file in chunks, since file reads (even from flash memory) may be far slower than looping over data in the device's "regular" memory. This "speedup" may not be noticeable in an emulator, though.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
Apparently not one that would cause a problem like this; the theoretical max is the maximum value of an integer (about 2.1 billion), so you'll run out of memory long before the List itself does.

So I set the base memory of Genymotion to 8 GB (!), up from 3 GB. The results are no different though. The dataset, which runs from 2011 to 2017, runs out of memory when using a list at some point in 2012, after about a million entries.
If I define an array of maps with space for 8,000,000 elements, I get an out-of-memory error even in 2011.
If I read the data without adding it to anything, I get "read: unexpected EOF!" in November 2016. I got rid of that error before by truncating datasets, so it's indeed memory related.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
It really depends on what you intend to do with the values. If you are plotting each data point on a graph, then a database probably won't be any faster; but if you're summing the values, finding the average, counting the number of data points above a certain threshold, etc., then a database would be miles FASTER!

Something like: remember this price until a lower price shows up to replace it, and close the deal whenever the price goes x% above the lowest price. Aka a trailing stop.
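In rough B4X it would be something like this (a sketch only; the stop percentage, file name and column index are placeholders I've made up):
B4X:
'Rough sketch of a trailing stop over a streamed price series.
'StopPercent, the file name and the column index are placeholders.
Dim StopPercent As Double = 2 'close when price rises 2% above the lowest seen
Dim lowest As Double = 999999999
Dim tr As TextReader
tr.Initialize(File.OpenInput(File.DirRootExternal, "ticks.txt"))
Dim line As String = tr.ReadLine
Do While line <> Null
    Dim f() As String = Regex.Split(",", line)
    Dim price As Double = f(1)
    If price < lowest Then lowest = price
    If price >= lowest * (1 + StopPercent / 100) Then
        Log("Close deal at " & price & " (low was " & lowest & ")")
        lowest = price 'or stop here, depending on the strategy
    End If
    line = tr.ReadLine
Loop
tr.Close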
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
How about "chunking" the read? Read x amount of data, do your calculations, read x amount more, do some more calculations. That's how it was done in the good old days of memory-constrained systems. Depending on the calculations performed, you could use the SQLite DB (ninja'd by @kostefar as I'm typing this) to let you handle the calculations.

At this point I can't even read to the end of the file, with nothing being added to any list, without getting an error after about 85% of the file has been read.
 
Upvote 0

kostefar

Active Member
Licensed User
Longtime User
After re-reading your initial post, I think you need to test whether calculating while loading really has an impact on the time it takes to finish, especially if you can skip loading your data into a list. You're already looping over the data, since you (as you said) are loading it line by line. So instead of creating a map (which takes time) and then loading that into a list (which takes time), why not just perform your calculations? The only speedup beyond that would be to read the file in chunks, since file reads (even from flash memory) may be far slower than looping over data in the device's "regular" memory. This "speedup" may not be noticeable in an emulator, though.

I see what you're saying, but what if I need to loop over the data 100 times afterwards? Wouldn't it be more efficient then to have the data in memory?
 
Upvote 0

OliverA

Expert
Licensed User
Longtime User
I see what you're saying, but what if I need to loop over the data 100 times afterwards? Wouldn't it be more efficient then to have the data in memory?
You're changing the goalposts on me! :eek::D Yes, then it would be more efficient. But that means squat if the memory capacity is exceeded. You then either find a method of dealing with the data in chunks (maybe write a prepped version of the data back to a file, as sketched below) or you find a library that can deal with stuff like that.
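For the "prepped version" idea, a rough sketch with TextWriter (file names and fields are made up): write only what the simulation needs back to a compact file once, then every later pass reads that smaller file.
B4X:
'Sketch: write a reduced version of the data back to a file once, so the
'many later passes only read what they need. Names/fields are placeholders.
Dim tr As TextReader
tr.Initialize(File.OpenInput(File.DirRootExternal, "ticks.txt"))
Dim tw As TextWriter
tw.Initialize(File.OpenOutput(File.DirRootExternal, "ticks_prepped.txt", False))
Dim line As String = tr.ReadLine
Do While line <> Null
    Dim f() As String = Regex.Split(",", line)
    tw.WriteLine(f(0) & "," & f(1)) 'keep only timestamp and price
    line = tr.ReadLine
Loop
tw.Close
tr.Close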
 
Upvote 0
