B4J Question Best way to get 15K integers from spreadsheet to B4J

SeaBee

Member
Licensed User
I have computed with Excel about 15K integers which are polynomial arguments, and which need to be entered in an array, list, table, type or whatever. They must all be available for calculations as soon as the program starts. They will be broken down into multiple subsets to simplify their use, but I do not want them in an external file or database to be loaded on program start. They must be internal to the program itself.

It will be a trivial matter to export the data to a comma delimited or tab delimited file, but I have no idea of how to handle the data in the most efficient and fast way in B4J. I don't mind reformatting the raw data to get it loaded as I can do that with macros - but I really don't want to retype it all!
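
To give an idea of what I am picturing (the values below are placeholders, not my real data), something along these lines, only with 15,000 values split into logical groups - which is clearly not practical to type by hand:

B4X:
' Placeholder values only - not the real coefficients
Private GroupA() As Int
Private GroupB() As Int

Sub InitialiseCoefficients
    GroupA = Array As Int(12, -5, 33, 7, -21, 4)
    GroupB = Array As Int(9, 0, -14, 28, 3, -6)
End Sub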

I would be most grateful if someone more experienced than I with B4X could provide some suggestions or pointers.

Thanks very much!
 

Philip Chatzigeorgiadis

Active Member
Licensed User
Longtime User
The issue is not just data security - it is also how to get the data into the program, but you are right about the requirements for obfuscation. I enjoy writing obfuscation algorithms, and some of them are very mean indeed. What would you make of the entirety of Hamlet's soliloquy embedded in a string, which was then read into meaningless substrings at various places? :D
Salty and elegant!
 
Upvote 0

SeaBee

Member
Licensed User
With experience like that, this should be a relaxing stroll in the park.

The byte storage for arrays is efficient; it's initializing the byte arrays using Array As Byte(data) that takes more of the limited 64 kB-per-method code space than expected, e.g. these lines in B4A:

[Code removed for readability]

which I don't have a problem with as long as the BASIC code works like it should, and the Java Virtual Machine Just-In-Time compiler seems to be doing some mighty impressive optimisation - it regularly has my eyebrows shooting up in surprise. But the (double) casts make me think that it is using 8 bytes of code space just to store a single byte-sized number. Normally not a problem, because memory and disk space are bordering on free and unlimited nowadays. Plus the code gets compressed for distribution. And besides, what kind of a nutcase would write a method that gets even halfway to 64 kB anyway? ;-)
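
For anyone reading along, the kind of initialisation being discussed looks roughly like this (dummy values of my own, not the lines removed above):

B4X:
' Illustration only - placeholder coefficients, not the real data
Dim Poly1() As Byte = Array As Byte(3, -12, 45, 0, -7, 19)
Dim Poly2() As Byte = Array As Byte(-2, 8, -33, 14, 5, -1)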


If that worked before, then use that. Avoid the first 32 characters (the control characters) and perhaps character 127 (delete), but that still leaves you 223 character values to use for representing the 201 coefficient values you need. Heaps easy! Perhaps even skip character 32 (space) as well, so that you don't need to worry about spurious trailing spaces screwing up your data alignment, and you could then also use spaces to separate the polynomial data for visual confirmation.
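
As a sketch of that mapping (my naming, and my assumption that the 201 values are the coefficients -100 through +100 - adjust the offset to suit):

B4X:
' Sketch only: shift -100..+100 up by 133 so the smallest coefficient
' lands on character 33, clear of the control characters and the space.
Sub EncodeCoefficient(Coeff As Int) As Char
    Return Chr(Coeff + 133)
End Sub

Sub DecodeCoefficient(C As Char) As Int
    Return Asc(C) - 133
End Sub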

That, on the face of it, appears verbose in the extreme, but, as you say, it rather depends on what the JIT does to it.

It would also appear that all-positive numbers, and very long byte arrays, can be stored more efficiently. It makes me wonder if it would be worthwhile to add 100 to all the arguments, and then create several large arrays of byte arrays with the x and y reversed. It would probably only make a marginal saving, though, in return for more complex code, so probably not worth it. I guess it's the price of using a BASIC-to-Java translator/compiler. My problem is that I have written so much stuff in various flavours of BASIC that I can code much faster in it than in other, less verbose languages like C#.

Anyway, I hate squiggly brackets!

I shall now build a data formatting program which will use your method of extraction from Excel to build the array of byte arrays, then convert it to ASCII, and then obfuscate it into an external file using something less extreme than Hamlet - maybe the opening paragraph of Alan Turing's 'On Computable Numbers, with an Application to the Entscheidungsproblem' of 1936. :eek::D
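
One obvious way to use a cover text like that is a simple repeating XOR over the data bytes - just a sketch of the idea, not necessarily how I will end up doing it:

B4X:
' Sketch: XOR each data byte against a repeating key text.
' Running the same routine again reverses it, since XOR is its own inverse.
Sub Obfuscate(Data() As Byte, KeyText As String) As Byte()
    Dim Key() As Byte = KeyText.GetBytes("UTF8")
    Dim Result(Data.Length) As Byte
    For i = 0 To Data.Length - 1
        Result(i) = Bit.Xor(Data(i), Key(i Mod Key.Length))
    Next
    Return Result
End Sub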
 
Upvote 0

emexes

Expert
Licensed User
wonder if it would be worthwhile to add 100 to all the arguments
That ups the limit to 596 polynomials x 6 coefficients ~= 3500 coefficients total, per method. Tested working with five sets of 596 polynomials ~= 17000 coefficients total. So it can be done, but might need some shoehorning.

That might be enough of a reason to flip to the external file solution.

The byte-coefficients-in-one-big-string is still a plausible option. If the string did ever grow > 64 kB (ie fourfold) then it could be split amongst methods and/or code files.
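
For example, each chunk could live in its own Sub and be glued together once at startup - a sketch with shortened placeholder chunks:

B4X:
' Sketch: keep each chunk comfortably under the per-method limit,
' then concatenate them once at startup.
Sub CoefficientString As String
    Return ChunkA & ChunkB
End Sub

Sub ChunkA As String
    Return "!#;Bq7f(k"   ' first block of encoded coefficients (placeholder)
End Sub

Sub ChunkB As String
    Return "p$Mz41&Rc"   ' second block of encoded coefficients (placeholder)
End Sub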

It'd be useful to have a more detailed description than just "15k integers", like how large are the polynomials, does that size vary, how many are there, what are these subsets, etc. But presumably there must be limits to what you can reveal, so... no worries, we'll forge on in darkness :)
 
Upvote 0

SeaBee

Member
Licensed User
Haven't read all the posts; however, you should never put the data inside the code. Of all the possible ways to choose from, this is the worst option.

It is trivial to load a CSV file, a JSON file, a database file, a text file, a B4XSerialized file or any other way you like.
I came to that conclusion in my posts #15 and #22, purely for practical reasons - mostly size and performance related. These numbers are actually all constants, and I already have over a thousand other constants declared as such in the code. The difference is that the internal constants will be used thousands of times on each program run, whereas these will be used only a few.

As a matter of interest, as my degree is in mathematics, not computer science, why is it considered so bad to have data inside the code? The data cannot be changed under any circumstances unless the whole edifice is changed - which is why it has taken me so long to get this far.
 
Upvote 0

SeaBee

Member
Licensed User
That ups the limit to 596 polynomials x 6 coefficients ~= 3500 coefficients total, per method. Tested working with five sets of 596 polynomials ~= 17000 coefficients total. So it can be done, but might need some shoehorning.

That might be enough of a reason to flip to the external file solution.

The byte-coefficients-in-one-big-string is still a plausible option. If the string did ever grow > 64 kB (ie fourfold) then it could be split amongst methods and/or code files.

It'd be useful to have a more detailed description than just "15k integers", like how large are the polynomials, does that size vary, how many are there, what are these subsets, etc. But presumably there must be limits to what you can reveal, so... no worries, we'll forge on in darkness :)
Sorry - I didn't make myself clear. I meant adding 100 to each argument so all arguments are positive. This would of course mean that they would have to be Int values, as they would exceed the +127 limit.

Now I have decided that I will have to use an external file, it becomes irrelevant.
 
Upvote 0

emexes

Expert
Licensed User
Sorry - I didn't make myself clear. I meant adding 100 to each argument so all arguments are positive. This would of course mean that they would have to be Int values, as they would exceed the +127 limit.
Understood. B4A isn't doing type value range checking here, and it all gets masked to fit in 8 bits. Although I didn't actually check in the intermediate Java whether the >127 values were causing additional code to be generated, as with the Double casts.

Not that it matters. End result is that it did not much improve the initialization code-byte-per-data-byte ratio.
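
A quick way to see that masking in action (throwaway example values):

B4X:
' 200 does not fit in a signed byte, so it wraps around to -56,
' but masking with 0xFF recovers the original 0..255 value.
Dim b As Byte = 200
Log(b)                  ' -56
Log(Bit.And(b, 0xFF))   ' 200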
 
Last edited:
Upvote 0

SeaBee

Member
Licensed User
Understood. B4A isn't doing type value range checking here, and it all gets masked to fit in 8 bits. Although I didn't actually check in the intermediate Java whether the >127 values were causing additional code to be generated, as with the Double casts.

Not that it matters. End result is that it did not much improve the initialization code-byte-per-data-byte ratio.

Yesterday I had to do real work, i.e. for real money, so I didn't get much further forward. I formatted a small subset of the arguments in Excel and dumped them into a B4J program that ultimately created a series of character strings (UTF-8) in an external file, as I did for the old HP-75C. When doing this, I realized I had another problem.

HP did not use two's complement for bytes - they had a completely positive range from 0 to 255, with a printable character for each number. With Java, I just have the values 33 through 127.

My next step is, therefore, to revisit my original polynomials and remove some of my optimizations, which will give me more polynomials but with smaller arguments. As the polynomials are additive, it will be a comparatively straightforward but time-consuming operation. Maths is what I do anyway - or 'math' if you're an American! :D
 
Upvote 0

SeaBee

Member
Licensed User
Map signed coeff to 35..253 then multiply by 128/127 to skip over 127

No need - there were only about a hundred polynomials where I had to remove the earlier optimization - I had optimized to get any particular group down to fewer than 100 polynomials. By going back to the previous versions for the large-valued arguments I have reduced the range to -35 to +57, giving me a spread of 93 - just one to spare! I have a couple of hundred extra polynomials, and a lot more arguments, but hey - who's counting! :D
 
Upvote 0

emexes

Expert
Licensed User
No need - there were only about a hundred polynomials where I had to remove the earlier optimization - I had optimized to get any particular group down to fewer than 100 polynomials. By going back to the previous versions for the large-valued arguments I have reduced the range to -35 to +57, giving me a spread of 93 - just one to spare! I have a couple of hundred extra polynomials, and a lot more arguments, but hey - who's counting! :D
I don't know about you, but these numeric discussions make me dizzy (with confusion, not excitement ;-)

Post #30's brevity was due to it being written at a red traffic light on the way home.

If you use UTF-8 then you can have character values up to 21 bits wide. Characters 0..127 are encoded as one byte, i.e. 1:1, and characters 128..2047 are encoded using 2 bytes. There is already a String-to-Char-array function in B4X, and I'm pretty sure there is an Asc() function to convert those Chars to Ints.
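
In other words, getting from the file back to integer coefficients should be no more than this sketch (the file name is just a placeholder, and the offset is whatever you settle on):

B4X:
' Sketch: read the encoded file and turn each character back into an Int.
Dim Encoded As String = File.ReadString(File.DirAssets, "coeffs.txt")
For i = 0 To Encoded.Length - 1
    Dim Code As Int = Asc(Encoded.CharAt(i))
    ' subtract your chosen offset from Code to recover the signed coefficient
Next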

Depending on the risk in using new polynomials with smaller coefficients, it might be better to go back to using the original tried-and-trusted versions. Or maybe you're so close to the finish line that it's simpler to complete the track you're on.

Are all the polynomials of the same degree, like: is it an array of 3000 polynomials x 5 coefficients each, or are there significant variations to the polynomial degrees?
 
Upvote 0

SeaBee

Member
Licensed User
It's all done! I now have 8 separate files containing all the arguments encoded as single (8-bit) UTF-8 characters, totalling a little over 16 kB in content, due to the additional polynomials and arguments from the de-optimization. The size on disk is less than 20 kB. RESULT!

Incidentally, the reason I am doing this in B4J is that ultimately I want the app to run on an Android tablet as well as a laptop, so B4J is an obvious pathway. Had I used VB.NET, which I have been using since version 1, life would have been a lot easier. In fact, a lot of the maths involved was done in VB.NET, and I have already transferred some of the code to B4J and tested it.

Anyway, thanks for all your help and encouragement - between us we have got the job done!
 
Upvote 0

emexes

Expert
Licensed User
Just a thought - call me Nervous Neddie if you like - but if it was me, I'd perhaps write a quick Sub that dumps the final coefficient arrays to the log, copy all the lines, paste them into Excel, convert the text to columns, then compare them against the source just to make sure nothing got lost in the trip.
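
Something along these lines would do it (a sketch, assuming the coefficients end up in a List of Byte arrays - adjust to however you actually hold them):

B4X:
' Sketch: log each polynomial as one tab-separated line,
' ready to copy from the log and paste into Excel.
Sub DumpPolys(Polys As List)
    For Each Coeffs() As Byte In Polys
        Dim sb As StringBuilder
        sb.Initialize
        For i = 0 To Coeffs.Length - 1
            If i > 0 Then sb.Append(TAB)
            sb.Append(Coeffs(i))
        Next
        Log(sb.ToString)
    Next
End Sub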

I have found it to be a Law of Nature that stuff I check is Always Right, and stuff I don't check is Always Wrong. Go figure :-/

But... time lost on the former is nothing compared to time lost on the latter.

;-)
 
Upvote 0

SeaBee

Member
Licensed User
Just a thought - call me Nervous Neddie if you like - but if it was me, I'd perhaps write a quick Sub that dumps the final coefficient arrays to the log, copy all the lines, paste them into Excel, convert the text to columns, then compare them against the source just to make sure nothing got lost in the trip.

I have found it to be a Law of Nature that stuff I check is Always Right, and stuff I don't check is Always Wrong. Go figure :-/

But... time lost on the former is nothing compared to time lost on the latter.

;-)

Tomorrow I start to write the routine to suck the characters out of the files and put them into a binary array. I shall read the characters a row at a time, and each file can be logically split into two components to keep the Sub size down.

I have actually done validity checks at each stage, so I am fairly confident the data is correct, but I shall match the input array to the output array. I also have some specific results calculated on a mainframe to 16 significant figures (using different algorithms). If my numbers match to 12, I shall have reached the required accuracy.
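
The reading side will be roughly this sketch (assuming I map -35 to character 33, i.e. an offset of 68, and keep one encoded polynomial per row):

B4X:
' Sketch: read one file, one encoded polynomial per row,
' and rebuild each row as a Byte array of coefficients.
Sub LoadCoefficientFile(Dir As String, FileName As String) As List
    Dim Rows As List = File.ReadList(Dir, FileName)
    Dim Result As List
    Result.Initialize
    For Each Row As String In Rows
        Dim Coeffs(Row.Length) As Byte
        For i = 0 To Row.Length - 1
            Coeffs(i) = Asc(Row.CharAt(i)) - 68   ' assumed offset for the -35..+57 range
        Next
        Result.Add(Coeffs)
    Next
    Return Result
End Sub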
 
Upvote 0

emexes

Expert
Licensed User
I shall read the characters a row at a time
Or, if the polynomials are of fixed length, perhaps just read the entire file into a single string, use .Replace() to delete all the spaces, Chr(13)s and Chr(10)s, then step through the string, e.g. if all polynomials have six terms:
B4X:
BigString = BigString.Replace(" ", "").Replace(Chr(13), "").Replace(Chr(10), "")
Log(BigString.Length)

Dim ExpectedNumPolys As Int = BigString.Length / 6
Dim PolyWaffle(ExpectedNumPolys) As Object   ' each slot will hold one 6-byte coefficient array

Dim NumPolys As Int = 0
For I = 0 To BigString.Length - 6 Step 6
    ' note: coefficients are stored in reverse of their order within the string
    PolyWaffle(NumPolys) = Array As Byte( _
        CharToCoefficient(BigString.CharAt(I + 5)), _
        CharToCoefficient(BigString.CharAt(I + 4)), _
        CharToCoefficient(BigString.CharAt(I + 3)), _
        CharToCoefficient(BigString.CharAt(I + 2)), _
        CharToCoefficient(BigString.CharAt(I + 1)), _
        CharToCoefficient(BigString.CharAt(I + 0)) _
    )
    NumPolys = NumPolys + 1
Next
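
CharToCoefficient above is whatever decoding you end up with; assuming -35 maps to character 33, as per the range you mentioned, it could be as simple as:

B4X:
' Assumed mapping only: character 33 -> -35, character 125 -> +57
Sub CharToCoefficient(C As Char) As Int
    Return Asc(C) - 68
End Sub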
 
Upvote 0
Top