B4J Question CSV Parser Class Help

walterf25

Expert
Licensed User
Longtime User
Hi all, I am using the CSVParser class located on this thead This seems to work just fine with smaller files, I however have a file that is around 117Mb which contains 764,298 rows of data and the CSVParser seems to fail, I do not get any errors but my NON_UI file stops working after a while.

Here's the relevant code where I parse the csv file and I try to convert it to an xls workbook because when I try to read the file directly from the csv file it fails and throws an error.

CSVParser:
Sub parseFile
    Dim parser As CSVParser
    Dim homepath As String = GetSystemProperty("user.home","")
    homepath = homepath & "\Documents\Arbin_Data"
    Log("homepath: " & homepath)
    parser.Initialize
    Dim tr As TextReader
    tr.Initialize2(File.OpenInput(homepath&"\"&"Resume_Table.csv", ""), "Windows-1252")
    Log("reading file into string")
    Dim input As String = tr.ReadAll
    Log("file read into string")
    Dim table As List = parser.Parse(input, ",", False)
    Log("file parse into table")
    CreateWorkbook(table, "Resume_Table.csv")
End Sub

CreateWorkBook:
Private Sub CreateWorkbook (Table As List, FileName As String)
    Log("creating workbook")
    xl.Initialize
    Dim wb As XLWorkbookWriter = xl.CreateWriterBlank
    Dim sheet As XLSheetWriter = wb.CreateSheetWriterByName("Sheet1")
    Dim address As XLAddress = xl.AddressName("A1")
    For Each row() As String In Table
        address.Col0Based = 0
        For Each cell As String In row
            sheet.PutString(address, cell)
            address.Col0Based = address.Col0Based + 1
        Next
        address.Row0Based = address.Row0Based + 1
        Log("row #: " & address.Row0Based)
    Next
    Log("saving new file to: " & FileName)
    wb.SaveAs(FileName, "", False)
    Log("finished creating workbook")
End Sub

Again, if I run this code with a 35Mb file with 70 rows of data then it works just fine, it's just with bigger files that I am seeing the issue where the app just stops working without any errors.

I added some logs to see where the entire code stops working, the last log I see is when it gets inside the CreateWorkBook sub, I never see the last line which should say "finished creating workbook'

I also added a log inside the second for loop inside the CreateWorkBook sub which logs the number of rows written, I can see that when it gets around 400000 is when it starts getting sluggish, I can see the app starts slowing down and eventually it stops. I was finally able to see a pop up error, please see image attached for the error.

Any ideas anyone?



Thanks,
walter
20221215_215210.jpg
 
Last edited:

walterf25

Expert
Licensed User
Longtime User
1. Make sure to test in release mode.
2. Increase the JVM heap:
B4X:
#VirtualMachineArgs: -Xms8g -Xmx8g

If the CSV data is simple enough (no quotes and no new lines) then it will be more efficient to parse it one line after another with a simple regex.split.
I just added the #VirtualMachineArgs line but It is still gets sluggish around 400000 rows, I also tried the following

B4X:
Private Sub CreateWorkbook (Table As List, FileName As String)
    Log("creating workbook")
    Dim tw As TextWriter
    tw.Initialize(File.OpenOutput(File.DirApp, FileName, True))
    Dim index As Int
    For Each row() As String In Table
        Dim sb As StringBuilder
        sb.Initialize
        For Each cell As String In row
            sb.Append(cell).Append(",")
        Next
        tw.WriteLine(sb.ToString)
        '''address.Row0Based = address.Row0Based + 1
        index = index + 1
        Log("row number: " & index)
    Next
    tw.Close
    Log("saving new file to: " & FileName)
    Log("finished creating workbook")
End Sub

and although it works and it performs way faster, after the file gets saved I can not open it with this code:

B4X:
    Dim data As List
    Dim xl As XLUtils
    xl.Initialize
    data = xl.Reader.ReadAll(File.DirApp, "Event_Table.csv")
    Log("data read into xl workbook: " & data.Size)

I get the following Error:
(RuntimeException) java.lang.RuntimeException: Object should first be initialized (PoiWorkbook).

Walter
 
Upvote 0

emexes

Expert
Licensed User
but It is still gets sluggish around 400000 rows

I think you'll find that's true of regular spreadsheet programs too. šŸ™ƒ

Maybe it'd be better to store that much data in an array or a database. Presumably any spreadsheet formulas are reasonably basic, eg col3 = col1 * col2, and maybe some row and/or column totals.
 
Upvote 0

DonManfred

Expert
Licensed User
Longtime User
Got it, Ok, I'll have to find a different way then.
What do you need to do with the CSV-Data?

Post an example csv you have and the 2nd csv which you want to have. Best together with the calculationsteps needed.
If you do not need xlutils at all it will be much faster maybe. Depends on what you exactly need
 
Upvote 0

PaulMeuris

Active Member
Licensed User
In my CSV viewer tutorial I read the CSV file with File.ReadList(Dir,FileName).
That list is than used to selectively present the rows (max. 100 rows at a time).
The B4XGoodies CSV file with more than 6400 rows is processed smoothly.
For each column (field) i use the regex.split function.
B4X:
reclst = File.ReadList(importfolder,cbx2.SelectedItem)
For i = startindex To endindex
    rec = reclst.Get(i)
    Dim columns() As String = Regex.Split(delim,rec)
    For x = 0 To columns.Length-1
        ...
Maybe this approach might help.
I haven't tested the CSV viewer application with a big CSV file like yours, so i don't really know how the application would perform.
 
Upvote 0
Top