B4J Question (SOLVED) How do you address columns in an Excel worksheet with XLReader without knowing the exact structure of the worksheet?

BlueVision

Active Member
Licensed User
Longtime User
Hello friends,
it is primarily about reading large excel files with multiple worksheets in conjunction with the XL reader. I have tried a lot now, but I always come up against limits when several columns have to be addressed and I do not know exactly the arrangement of the information in the respective worksheet, or the programme is supposed to find the data itself without me knowing the correct column beforehand.
This is a bit abstract now, because I cannot serve with a program code.

Example:
There is a worksheet that stores the information of the 5 largest cities for several countries in the world. In column B is the capital of the first country, followed by 4 more cities in columns C, D, E and F. The capital of the next country can then be found in column G.
In the rows of the respective worksheets, hundreds of code numbers (number of inhabitants, area...) then follow one below the other.
Assuming that the cities are sorted alphabetically and that new cities are always added to the table, I have to implement dynamic addressing without knowing the detailed structure of the Excel worksheet. I have to search for the city within the columns and compare it with a search argument.
As I said, this is just an abstract example to illustrate my problem.

My biggest problem is that unfortunately I cannot now understand the necessary functions of the XLReader using the examples provided by Erel, how I could now specifically search for a city, these are unfortunately a bit too much oriented towards B4XTable examples. That's just as well. But I would like to export the data into an SQL table first, in order to be able to load it from this SQL table into a B4X table later.

Does anyone have some sample code or instructions for me on how to solve this problem with XLReader?

How does one get around the vexed problem of "letter addressing" the columns in an Excel spreadsheet? Is there a mode that can also address the columns numerically?

Many thanks in advance to all the people who are helping me to solve this problem.
 

walt61

Active Member
Licensed User
Longtime User
Hi @BlueVision , see attached demo project. It doesn't use column letters at all, and copies a specified worksheet to an SQLite table (I tweaked the code that I'm using in a project of mine). Have a look and shout if something's unclear! (I'll try to explain then - if I know the answer, that is)

Disclaimer: perhaps some things could be done more intelligently so if anyone spots something feel free to let me know :)
 

Attachments

  • XLtoSQLiteDemo.zip
    8.7 KB · Views: 74
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
Thank you very much!
Very complex for an example programme, it will take me a while to understand the code. But it's also not a simple topic. I have worked a lot with the jExcel library under Android, but with B4J you can also read XLSX or XLSM files thanks to jPOI and XLReader. But the differences are enormous.
How long did you work on this programme? Weeks?
 
Upvote 0

emexes

Expert
Licensed User
Is it always exactly five cities per country?

In which case, maybe something like:

B4X:
For Country = 1 to 210    'don't worry, we're going to quit before we get this high
    Dim FirstCityColumn As Int = 2 + (Country - 1) * 5    'start at column 2, groups of 5
    If Spreadsheet.GetText("R12C" & FirstCityColumn).Trim.Length = 0 then    'city names on eg row 12? to allow for sheet headings, summaries, corporate logos, whatever
        Exit 'for
    End If
   
    Dim NumCountries As Int = Country
    Log("There should be five cities in columns " & FirstCityColumn & " to " & (FirstCityColumn + 4))
Next

Log("Found " & NumCountries & " countries with (hopefully) 5 cities each")
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
@emexes:
Thank you for your post Emexes.
Yes, as I said, only one example. In principle, there are always several entries grouped next to each other in a defined form that have to be read and evaluated together. But I never know beforehand (to stay with the example) which city is in which position (column-wise).
If I understand your approach correctly, you simply count through the table in steps of 5. So far so good. Correct.

But the problem is the implementation. Excel spreadsheets use letters as column names. Your code is correct in terms of the solution. But how do you apply it to the addressing of an Excel spreadsheet with letters?

As far as I have understood the XLReader, it can only be used to address columns via the letter designation, which is very complex from a programming point of view.
I am quite honest, the use of "ROWBASED" and "COLUMNBASED" in XLReader is not really clear to me. You can define a range that you read out of the worksheet, but this range must also always be provided with a letter identifier as far as the column designations are concerned.

Another solution could be to use the start column (XL.AddressZero), read out the header rows and then the columns accordingly. That's exactly what I can't manage.

I implemented the principle you suggested with the JExcel library under Android. That also works. But I can't simply transfer this solution to B4J with jPOI and XLReader. Unfortunately, columns cannot be controlled so easily via numbers. That is the problem. Perhaps it is also due to my problem description. I would like to clarify my problem with an example code, if I already had it...

Maybe in this case you also have to use a combination of jPOI commands and the XLReader. Unfortunately, there is very little documentation in this direction.
 
Upvote 0

emexes

Expert
Licensed User
@emexes:
But the problem is the implementation. Excel spreadsheets use letters as column names. Your code is correct in terms of the solution. But how do you apply it to the addressing of an Excel spreadsheet with letters?

Yeah, I was taking a chance that your Excel API could handle RnCn-style addressing like Excel can (or could: it was a carryover from Multiplan, maybe it got dropped).

https://support.microsoft.com/en-us/office/use-a-formula-in-a-word-or-outlook-table-cbd0596e-ea8a-485e-a35d-b2cb2c4f3e27#:~:text=it is ignored.-,RnCn references,row and the second column.

Anyway, this (untested) function might be useful:

B4X:
'converts eg Row 5 Col 7 to G5
Sub SpreadsheetCellAddress(Row As Int, Col As Int) As String

    Dim Letters(5) As Int    'that oughta do it, A..ZZZZZ good for > 10 million columns
  
    Letters(0) = Col
  
    For I = 0 to Letters.Length - 2
        Do While Letters(I) > 26    'mildly inefficient but easy to see how it works
            Letters(I) = Letters(I) - 26
            Letters(I + 1) = Letters(I + 1) + 1
        Loop
    Next
  
    Dim AllLetters As String = ""
    For I = 0 to Letters.Length - 1
        If Letters(I) <> 0 Then
            AllLetters = Chr(64 + Letters(I)) & AllLetters
        End If
    Next

    Return AllLetters & Row
  
End Sub
 
Last edited:
Upvote 0

emexes

Expert
Licensed User
Yeah, I was taking a chance that your Excel API could handle RnCn-style addressing like Excel can (or could: it was a carryover from Multiplan, maybe it got dropped).

https://support.microsoft.com/en-us/office/use-a-formula-in-a-word-or-outlook-table-cbd0596e-ea8a-485e-a35d-b2cb2c4f3e27#:~:text=it is ignored.-,RnCn references,row and the second column.

Anyway, this (untested) function might be useful:

Even better, use XLUtils method .AddressZero(col, row)

Might help to think of "arse-about" order of (col, row) as being same order as cartesian coordinates (x, y) or as alphabetical (which, coincidentally, is my backup-argument for latitude-before-longitude).

The row and column coordinates are 0-based, just as they would be if the spreadsheet was an array, thus A1 (or R1C1) would be (0, 0).

Anyway, numeric addressing of both col and row, rather than col as letters and row as number, is much simpler.

The XLReaderResult .Get method returns an Object, which could be a Double (if the cell is a number) or a String (if the cell is a string).
 
Last edited:
Upvote 0

emexes

Expert
Licensed User
Righto, add this helper function:

B4X:
'returns True if looks like a row or column heading, False if not
Sub ObjectLooksLikeHeading(o As Object) As Boolean
    
    If o Is String Then    'headings are Strings (not Doubles etc)
        Dim s As String = o
        If s.ToUpperCase.CompareTo(s.ToLowerCase) <> 0 Then    'with letters in them (right?...)
            Return True
        End If
    End If

    Return False
    
End Sub

and try this example code using the attached example file:

B4X:
Sub AppStart (Form1 As Form, Args() As String)
    MainForm = Form1
    MainForm.RootPane.LoadLayout("Layout1")
    MainForm.Show

    Dim xlu As XLUtils
    xlu.Initialize

    Dim xlr As XLReader
    xlr.Initialize(xlu)
    
    Dim result As XLReaderResult = xlr.ReadSheetByIndex(File.DirAssets, "cities.xlsx", 0)

    'assume spreadsheet contains at least one city, with data in at least the second column (col 1)

    Dim HeaderRow As Int = -1
    For row = 1 To 20
        If ObjectLooksLikeHeading( result.Get(xlu.AddressZero(1, row)) ) Then
            HeaderRow = row
        else if HeaderRow <> -1 Then
            Exit 'for
        End If
    Next

    Log("Header row is " & HeaderRow)

    Dim FirstHeaderCol As Int = 1    'already known from above
    Dim LastHeaderCol As Int = -1
    For col = FirstHeaderCol To 255
        If ObjectLooksLikeHeading( result.Get(xlu.AddressZero(col, HeaderRow)) ) Then
            LastHeaderCol = col
        Else
            Exit 'for
        End If
    Next

    Log("Header and (presumably) Data columns are " & FirstHeaderCol & " to " & LastHeaderCol)
    
    Dim FirstDataRow As Int = -1
    For row = HeaderRow + 1 To 255
        If result.get(xlu.AddressZero(FirstHeaderCol, row)) Is Double Then
            FirstDataRow = row
            Exit 'for
        End If
    Next

    If FirstDataRow = -1 Then
        Log("what the heck?!?! - no data?")
        Return
    End If
    
    Dim LastDataRow As Int = -1    'about to be rewritten at FirstDataRow
    For row = FirstDataRow To 255
        If result.get(xlu.AddressZero(FirstHeaderCol, row)) Is Double Then
            LastDataRow = row
        Else
            Exit 'for
        End If
    Next           

    Log("Data rows are " & FirstDataRow & " to " & LastDataRow)

End Sub

and if all goes well you should get something like:

1668740298589.png


Remember that the col and row numbering is zero-based, thus "Header row is 5" means the city name headers are actually on the sixth row, and the data is in the second to twenty-first columns (the first column "col 0" is the row headers).
 

Attachments

  • cities.xlsx
    8 KB · Views: 56
Upvote 0

emexes

Expert
Licensed User
In principle, there are always several entries grouped next to each other in a defined form that have to be read and evaluated together. But I never know beforehand (to stay with the example) which city is in which position (column-wise).

If you can cajole your users into at least having the row (data field names) and column headings (city names) and data begin at fixed locations, that'd make things a heap simpler.
 
Upvote 0

walt61

Active Member
Licensed User
Longtime User
Thank you very much!
Very complex for an example programme, it will take me a while to understand the code. But it's also not a simple topic. I have worked a lot with the jExcel library under Android, but with B4J you can also read XLSX or XLSM files thanks to jPOI and XLReader. But the differences are enormous.
How long did you work on this programme? Weeks?
It's part of something on which I work from time to time, and it grew over time - grew as in 'oh that's an idea, let's add that too'. This was the initial step: get the Excel data into an SQLite db and then take it from there as data manipulation is way easier then (as I'm sure you know). My first impulse was to just extract the code to which you referred but then thought that less isn't more in this case :)

Give it a whirl; in B4XMainPage lines 37/38/39 just enter your values and run the code, you'll see what's produced. The CopyOneWorksheetToTable method does this:
- Check for merged cells; if there are any, ask if it's ok to unmerge them, or quit
- Select the output table; if it exists, let the user decide: append rows, replace rows, create new table
- Get the Excel column headers (from the first row - @emexes 's code can help to make that row variable); ensure they're all present and unique, and determine the corresponding SQLite column names (replace characters that aren't valid for SQLite names)
- Try to determine the columns' data types (text, numeric, date, time, datetime)
- Prepare the SQLite INSERT statement
- Process the data

P.S.: I just realised I forgot to explicitly close the db and the workbook after processing so you may want to add that after the call to CopyOneWorksheetToTable.
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
Oh my goodness Emexes,
I think I've given your brain a problem to work on, haven't I? I am impressed by your commitment! This attempt also looks very promising.
I could have written that they are not cities, but car names with technical data. In fact, the data are always written in "sets" in the worksheet. While Excel likes to use a "header row" and then the data can be found in the column below, i.e. Excel works rather column-oriented, I then have to create these data sets for the SQL record (row-oriented, so to speak) in order to write them into the database. Of course, one could also "update" these data records step by step with data, but that simply takes too long. The Excel table changes frequently and then the SQL database has to be created again quickly.
My project itself mainly revolves around spare part numbers for various machines that my colleagues and I look after. The Excel file serves as a basis, but with 48 worksheets it has become a scrolling orgy. So I came up with the idea of putting the Excel data into an SQL database and then searching through it.
This usually works quite well, but if the data is distributed in this way in the worksheet, one already gets unexpected problems when analysing the structure of the Excel file.
I first tried to solve the problem by using lists, which I then read back into the SQL database. But that simply takes too much time and makes no sense with large lists. XLReader, on the other hand, is very fast, but is more suitable for selectively displaying data from an Excel file in a B4XTable, as long as the Excel file does not take on such dimensions.
Anyway, thank you very much for your contribution and I will try to implement your principle. But first I have to write a small test programme. The first analysis of the entire Excel file also takes a lot of time, because I also have to take sub-groupings into account and find out about them beforehand via the Excel source. That delays the testing enormously. I'll have to build a small test programme first.

Thank you very much!
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
@Walt:
Exactly!
I think a mixture of your code and Emexes' approach will solve my problem.
Sometimes you get bogged down in your own considerations and only realise much later that you have been working on the wrong solution for hours...
No problem if your code is not perfect now and thanks for the explanations. It's hard to see through more or less undocumented code if you haven't written it yourself.
I now have new ideas in mind, inspired by your hints and code examples. We'll see whether I can do it the way I think I can.
Thank you for all the time you have invested for me!
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
Thank you @walt61 and @emexes,
The solution was, of course, once again quite simple.
B4X:
    For col0Based = 0 To 5
        Dim Argument As String = XLResult.Get(XL.Addresszero(col0Based,0))
        Final = Final & "," & Argument
    Next
    Log ("Final" & Final)
This example code pushes the string "Argument" (placeholder variable) through the first row of an Excel sheet and fills the variable "Final" with the result of all cells.
Final then ultimately contains all the strings of the header cells of a sheet and can be further examined with Regex. Of course, you can also do other things with a positive string match, it is just a simple example.
As already noted by Emexes, the decisive factor is the use of XL.Addresszero(col0Based,0) as a pointer.

Thank you mates, you saved my day!
 
Upvote 0

emexes

Expert
Licensed User
I think I've given your brain a problem to work on, haven't I?

šŸ¤£ I think a good half of the problem-helping effort on this forum is burned up getting a picture of what the heck the data looks like.

Luckily, most of us enjoy a good puzzle (usually) although even gurus like @DonManfred run out of steam occasionally.

I'm glad to hear that your project will be of practical use and isn't just a school assignment.


I could have written that they are not cities, but car names with technical data.

That probably would have helped - I spent 17 years working with software for automotive workshops and test equipment (dynamometers and attached exhaust gas analysers, etc and OBD-II).

Although I'm intrigued - what are the five columns per vehicle (model?) ?
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
Although I'm intrigued - what are the five columns per vehicle (model?) ?
The thing with the automobiles was just another example I had in mind.
The real background:
I am working as a hardware technician for digital printers, so the Excel file is for solving hardware problems on several devices.
Categories for example (imagine that as columns within the sheet):
name of the machine, jam code, description
or
name of the machine, failure code, failing module, short advice and so on...

Every sheet contains a lot of information, machines side by side. The Excel file often changes weekly.
That's the problem nowadays, all the valuable information you need is within a spreadsheet, designed by idiots...
So it is a huge waste of time for every technician to find the information he is looking for. SQL can do that much better with the right search arguments prepared...
I am a really lazy guy, i hate to scroll over a neverending Excel-sheet...

There are some buttons within that XLSM file. It contains macros programmed with VBA for Excel.
The days of using Excel with VBA-macros are counted (probably also for "security reasons"), so there is a need of a different solution without scrolling and endless manual searches. And this solution is B4J...
 
Upvote 0

emexes

Expert
Licensed User
Every sheet contains a lot of information, machines side by side. The Excel file often changes weekly.
That's the problem nowadays, all the valuable information you need is within a spreadsheet, designed by idiots...
So it is a huge waste of time for every technician to find the information he is looking for. SQL can do that much better with the right search arguments prepared...

Are you heading towards something like this:

https://en.akinator.com/game

which I think tries at each point to ask the question that results in the smallest expected number of future questions,

but in your case you'd be aiming for questions that result in the smallest expected cost of tests and technician time

now that's an intriguing puzzle

lol it wouldn't surprise me if some of the questions are going to be perceived as racist, sexist or ageist

eg the likeliness of a given cause for a given problem might well correlate with whether the device is used mostly by teenagers or senior citizens
 
Upvote 0

BlueVision

Active Member
Licensed User
Longtime User
Well, it's not so much a question and answer game. The software only offers suggestions for solutions to clear machine faults, based on the information found in the manuals. This, coupled with a selection of frequently needed wear parts and a way to order them, helps enormously. Gathering this information helps enormously to save time and, above all, to give the customer a quick answer.
This leaves a little more time for other things.
 
Upvote 0
Top