Data file optimizing challenge

sorex

Expert
Licensed User
Longtime User
Howdy,

Due to some weirdness, a routine didn't work as expected on iOS (sloooooow), so I have to go for a data file.

A math game with certain rules needed some tinkering to get all possible combinations with valid solutions.

This code was ready last Friday and ran for a few hours to generate all the data I wanted (10.808.967 lines).

Problem is... it's a 195 MB text file (24 MB zipped) :)

A bit big to add to my app. ;)

Now comes the challenge...

How far can we optimize this so that the file gets smaller while every line stays easy and fast to access (select a random data line), using only B4J/B4A/B4i code (core libs only)?

Any data nerds here who are good at optimizing/reorganizing data and like a challenge? :)
PM me and I'll send a link to the file.

I have a few ideas that I will try to work out today and if they work I'll post my current file size.
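One way to make "select a random data line" cheap is to store every puzzle as a fixed-size record, so line N starts at byte N × record size and no scanning is needed. Below is a minimal Java sketch of that idea, assuming a hypothetical 8-byte record per puzzle (the real record size depends on the puzzle format, which isn't described here). The class name `FixedRecordFile` and its methods are illustrative only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class FixedRecordFile {
    // Hypothetical record size: every puzzle is padded or packed to exactly this many bytes.
    static final int RECORD_SIZE = 8;

    // Seeks straight to record `index`; no scanning and no line index needed.
    static byte[] readRecord(RandomAccessFile file, long index) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        file.seek(index * RECORD_SIZE);
        file.readFully(record);
        return record;
    }

    // Picks a uniformly random record; the record count follows from the file length.
    // Assumes fewer than 2^31 records (10.8 million fits easily).
    static byte[] randomRecord(RandomAccessFile file, Random rng) throws IOException {
        int count = (int) (file.length() / RECORD_SIZE);
        return readRecord(file, rng.nextInt(count));
    }

    public static void main(String[] args) throws IOException {
        // Three 8-byte demo records written to a temporary file.
        Path path = Files.createTempFile("puzzles", ".bin");
        Files.write(path, "PUZZLE01PUZZLE02PUZZLE03".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            System.out.println(new String(readRecord(file, 1), StandardCharsets.US_ASCII));
            System.out.println(new String(randomRecord(file, new Random()), StandardCharsets.US_ASCII));
        } finally {
            Files.delete(path);
        }
    }
}
```

The trade-off is padding: short puzzles waste bytes, so this pays off most when the records are already close to the same length.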
 

sorex

Expert
Licensed User
Longtime User
v1: 97.280.695 bytes (almost 100 MB smaller)
v1b: 45.938.106 bytes (23.54% left), still way too big
 

wonder

Expert
Licensed User
Longtime User
Is your main goal reducing the file size or optimizing the data access, regardless of size?
 

sorex

Expert
Licensed User
Longtime User
A combination of both. I can get it smaller with 7-Zip, for example, but it needs too much memory to unpack, and it doesn't support in-memory unpacking either because its lookback buffers are too big.

So it's a compromise between size and flexibility (it might be possible to keep the data in memory all the time and pick lines from there without reading chunks from the file system).

So no "unzip to disk, select line, delete unzipped file" (that would require non-core libs too).
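A common way around the "lookback buffer too big" problem is to compress the data in small independent blocks, so fetching one line only inflates the block that holds it. This is a swapped-in technique, not what was used in this thread: the sketch uses Java's built-in `java.util.zip.Deflater`/`Inflater` (Deflate's lookback window is 32 KB), and the block size of 4 lines is a hypothetical value chosen for the demo. The class `BlockCompressed` keeps the blocks in a `List` for brevity; a real data file would store them back to back with an offset table.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockCompressed {
    // Each block is deflated on its own, so reading one never touches the others.
    final List<byte[]> blocks = new ArrayList<>();
    final List<Integer> rawLengths = new ArrayList<>();
    final int linesPerBlock;

    BlockCompressed(List<String> lines, int linesPerBlock) {
        this.linesPerBlock = linesPerBlock;
        for (int start = 0; start < lines.size(); start += linesPerBlock) {
            int end = Math.min(start + linesPerBlock, lines.size());
            byte[] raw = String.join("\n", lines.subList(start, end)).getBytes(StandardCharsets.UTF_8);
            rawLengths.add(raw.length);
            blocks.add(deflate(raw));
        }
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates only the block holding line `index`, then picks the line out of it.
    String line(int index) throws DataFormatException {
        int block = index / linesPerBlock;
        byte[] raw = new byte[rawLengths.get(block)];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(blocks.get(block));
            int filled = 0;
            while (filled < raw.length) {
                int n = inflater.inflate(raw, filled, raw.length - filled);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    throw new DataFormatException("block ended early");
                }
                filled += n;
            }
        } finally {
            inflater.end();
        }
        String[] blockLines = new String(raw, StandardCharsets.UTF_8).split("\n", -1);
        return blockLines[index % linesPerBlock];
    }

    public static void main(String[] args) throws DataFormatException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10; i++) lines.add("puzzle " + i);
        BlockCompressed data = new BlockCompressed(lines, 4);
        System.out.println(data.line(7));
    }
}
```

Bigger blocks compress better but cost more memory and time per lookup, so the block size is the knob that trades file size against access speed.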
 

sorex

Expert
Licensed User
Longtime User
Right, but you don't want to unzip 195 MB of data into memory ;)
 

sorex

Expert
Licensed User
Longtime User
I'm not sure, but isn't there a 32 MB heap limit or something like that on Android?
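The per-app heap limit on Android differs from device to device, so it's safer to ask the runtime than to assume a fixed number. A minimal Java sketch using the standard `Runtime.maxMemory()` call, which reports the largest heap the VM will try to use (on Android, `ActivityManager.getMemoryClass()` gives a per-app figure in MB, but that needs a `Context` and isn't shown here). The class name `HeapCheck` is illustrative.

```java
public class HeapCheck {
    // Upper bound the VM will attempt to use for the heap, in bytes
    // (Long.MAX_VALUE if the VM reports no limit).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.1f MB%n", maxHeapBytes() / (1024.0 * 1024.0));
    }
}
```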
 

sorex

Expert
Licensed User
Longtime User
Sure, as long as you can randomly get one of these lines out of your data file, it's fine.
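If the lines stay variable-length and the data is kept in memory, random access needs an index of where each line starts. A minimal Java sketch of that, assuming the whole (already unpacked) file sits in a `byte[]`; the class `LineIndex` is hypothetical and also strips a trailing `\r` in case the file has Windows line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class LineIndex {
    final byte[] data;
    final int[] starts; // byte offset where each line begins

    LineIndex(byte[] data) {
        this.data = data;
        int count = 0;
        for (byte b : data) if (b == '\n') count++;
        if (data.length > 0 && data[data.length - 1] != '\n') count++; // last line has no newline
        starts = new int[count];
        int line = 0;
        if (count > 0) starts[line++] = 0;
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n') starts[line++] = i + 1;
        }
    }

    int count() {
        return starts.length;
    }

    // Returns line i without its line terminator.
    String line(int i) {
        int start = starts[i];
        int end = (i + 1 < starts.length) ? starts[i + 1] - 1 : data.length;
        if (i + 1 == starts.length && end > start && data[end - 1] == '\n') end--;
        if (end > start && data[end - 1] == '\r') end--;
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    String randomLine(Random rng) {
        return line(rng.nextInt(starts.length));
    }

    public static void main(String[] args) {
        LineIndex idx = new LineIndex("first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(idx.count() + " lines, random pick: " + idx.randomLine(new Random()));
    }
}
```

The catch is the index itself: for 10.808.967 lines an `int[]` of offsets takes 10.808.967 × 4 = 43.235.868 bytes on its own, which is one reason fixed-size records are attractive for this kind of file.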
 

sorex

Expert
Licensed User
Longtime User
Depends on the case. In this case only valid solutions/puzzles are presented to the player.

If you randomly generate a puzzle and then check whether it's valid by simulating all possible ways to play it, this can become really slow,
especially if you have the bad luck of pulling a few impossible puzzles in a row.

This method avoids that by using pregenerated valid puzzles.
 

sorex

Expert
Licensed User
Longtime User
v4: 1.405.985 bytes (0.7% of the original), and unpacking gives output identical to the source

1.391.695 bytes (a tiny optimization of the above)
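The thread doesn't say how v4 was built, so the following is not that method, just one generic technique that gets this kind of ratio on small-valued data: bit-packing. If each value in a puzzle fits in a few bits (for example a digit 0..9 needs 4 bits instead of the 8 an ASCII character uses), values can be stored back to back and still read at random. The class `BitPacker` and the 4-bit digit example are hypothetical.

```java
public class BitPacker {
    // Packs `values` (each 0 .. 2^bits - 1) back to back, most significant bit first.
    static byte[] pack(int[] values, int bits) {
        byte[] out = new byte[(int) (((long) values.length * bits + 7) / 8)];
        long bitPos = 0;
        for (int v : values) {
            if (v < 0 || v >= (1 << bits)) throw new IllegalArgumentException("value out of range: " + v);
            for (int b = bits - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[(int) (bitPos >>> 3)] |= (byte) (0x80 >>> (int) (bitPos & 7));
                }
            }
        }
        return out;
    }

    // Reads value number `index` directly, without unpacking anything else.
    static int get(byte[] packed, long index, int bits) {
        long bitPos = index * bits;
        int v = 0;
        for (int b = 0; b < bits; b++, bitPos++) {
            int bit = (packed[(int) (bitPos >>> 3)] >>> (7 - (int) (bitPos & 7))) & 1;
            v = (v << 1) | bit;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] digits = {4, 7, 1, 2, 0, 9};           // six digits: 6 bytes as ASCII text
        byte[] packed = pack(digits, 4);
        System.out.println(packed.length);           // 3
        System.out.println(get(packed, 2, 4));       // 1
    }
}
```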
 

udg

Expert
Licensed User
Longtime User
Hi,
I'm surely a bit late on this topic, but what kept you from storing the big data on a DB server and downloading to the device only the chunks needed at a given moment?
I'm sure you considered that option, so my question comes mostly from curiosity. Thanks.
 

sorex

Expert
Licensed User
Longtime User
The reason is simple...

People play in offline mode to avoid ads being displayed.

If you have to store something locally to get around this offline problem, you may as well store it all ;)
 

sorex

Expert
Licensed User
Longtime User
v5: 1.386.826 bytes (0.71%)

141.38 times smaller than the original.

That's probably the best I can do.

Surprisingly, the data file still compresses with zip to 638.663 bytes, so the overhead in the APK is really minimal.
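As a quick check, the "141.38 times smaller" and "0.71%" figures for v5 are the same statement, and multiplying back gives the approximate size of the original text file:

$$
\frac{1}{141.38} \approx 0.00707 \approx 0.71\,\%, \qquad 1\,386\,826 \times 141.38 \approx 1.96 \times 10^{8}\ \text{bytes}
$$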
 

ac9ts

Active Member
Licensed User
Longtime User
sorex said:
the reason is simple...

people play in offline mode to avoid ads being displayed.

if you store something to bypass this offline problem you can just store it all ;)

How are you going to get paid if they run your app offline? Put the file in the cloud and force them to play online ;)
 

sorex

Expert
Licensed User
Longtime User
Rely on those who don't play offline and don't use ad-blocking apps (or the ad blocking built into the OS these days)?

If you block them, they'll write bad reviews and/or move on to an alternative that keeps working offline.
 