Android Question LoadCSV delimiter - bug?

TyneBridges · Jun 24, 2020

I use LoadCSV to load my data into a SQLite database. My application is a diary and has a long text field containing characters such as single and double quotes and slashes. I've set | (the vertical bar character) as my delimiter because it doesn't occur in the data.

B4X:

Table = SU.LoadCSV(File.DirRootExternal, G.TFName, "|")

I'm finding that if a text field entry starts with a double quote (for example, the word "Lockdown" in quotes), the quotation mark is treated as a delimiter and the database field is truncated after that word. Quote marks further on in the text cause no problem: the text is read correctly and ends at the | character. The problem only occurs when the quote is the first character of the field. Is this expected behaviour? I can't think of a way round it other than not using that character in my file, which is problematic for me.

I'm still using B4A 5.02 and have avoided upgrading in case this brings new problems, but obviously will if this issue has been fixed. Any suggestions would be gratefully received.

Thanks.

emexes · Jun 29, 2020

This almost does the job, except that it falls over on two of the lines. I'm having a closer look at the input file now.

B4X:

Dim WholeFile As String = File.ReadString("c:\b4j", "SQLiteExport.txt")

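'assumes line-breaks inside a value were exported as double LFs: hide them as CRs so they survive the split
WholeFile = WholeFile.Replace(Chr(10) & Chr(10), Chr(13))

'each remaining LF is treated as the end of a record
Dim Lines() As String = Regex.Split("[\n]", WholeFile)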

For I = 0 To Lines.Length - 1
    Log(I & " = " & Lines(I))
    
    Dim Fields() As String = Regex.Split("\|", Lines(I))
    For J = 0 To Fields.Length - 1
        'turn the CR placeholders back into LFs before using the value
        Log("  " & J & " = [" & Fields(J).Replace(Chr(13), Chr(10)) & "]")
    Next
    
    'add Fields() to database here
Next

TyneBridges · Jun 29, 2020

Thanks for your help, Emexes. I tried your earlier code but couldn't get the output to load into my MySQL database - I think the problem was that the original version used a list and yours didn't. It was giving me a value of 1 for Lines.Length, although it seemed to be reading several records.

emexes · Jun 29, 2020

This assembles the data into a list of string arrays, the same as your original Table = SU.LoadCSV(File.DirRootExternal, G.TFName, "|") returns.

In fact, we should probably just write a matching version of LoadCSV and call it LoadCsvWithoutBonusQuoteProcessing(...)?

B4X:

Dim WholeFile As String = File.ReadString("c:\b4j", "SQLiteExport.txt")

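'assumes line-breaks inside a value were exported as double LFs: hide them as CRs so they survive the split
WholeFile = WholeFile.Replace(Chr(10) & Chr(10), Chr(13))

'each remaining LF is treated as the end of a record
Dim Lines() As String = Regex.Split("[\n]", WholeFile)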

Dim ListOfStringArrays As List
ListOfStringArrays.Initialize

For I = 0 To Lines.Length - 1
    Log(I & " = " & Lines(I))
    Dim ArrayOfValues() As String = Regex.Split("\|", Lines(I))
    ListOfStringArrays.Add( ArrayOfValues )
Next
    
'ListOfStringArrays should be ready to add to database
'EXCEPT for those two pesky exceptions to the double-LF escaping

For Each SA() As String In ListOfStringArrays
    Log("*** START OF RECORD ***")
    For Each S As String In SA
        Log(S)
    Next
    Log("*** END OF RECORD ***")
Next
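
A matching version might look something like the sketch below. The Sub name is the one suggested above; the parameter and variable names are made up for illustration. It assumes LF line endings and no line-breaks inside values, which is not yet true of the test file, so treat it as a starting point rather than a finished replacement.

B4X:

'Sketch only: splits on LF and on the separator, with no quote processing at all
Sub LoadCsvWithoutBonusQuoteProcessing(Dir As String, FileName As String, Separator As String) As List
    Dim Table As List
    Table.Initialize

    '\Q...\E asks the regex engine to treat the separator literally, so "|" needs no manual escaping
    Dim SeparatorPattern As String = "\Q" & Separator & "\E"

    Dim Lines() As String = Regex.Split("\n", File.ReadString(Dir, FileName))
    For Each OneLine As String In Lines
        If OneLine.Length > 0 Then
            'caveat: Regex.Split appears to drop trailing empty values, so short rows are possible
            Dim Values() As String = Regex.Split(SeparatorPattern, OneLine)
            Table.Add(Values)
        End If
    Next

    Return Table
End Sub

It would then be called the same way as the original: Table = LoadCsvWithoutBonusQuoteProcessing(File.DirRootExternal, G.TFName, "|")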

emexes · Jun 29, 2020

In the long text fields of your test data, have users actually typed in two line-breaks so as to leave a blank line between paragraphs?

It does not seem logical that the export process would only double up line-breaks within a value, but not at the end of a value.

emexes · Jun 29, 2020

Righto, this seems to work, but only for lines containing 10 values, i.e. 9 "|" vertical bars.

B4X:

Dim WholeFile As String = Chr(10) & File.ReadString("c:\b4j", "SQLiteExport.txt")

'bit of a dog's breakfast, thanks to unquoted LFs in data indistinguishable from end-of-line LFs
Dim SearchPattern As String = "(\d\d\d\d\-[^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\n"
'four digit year, dash, nine bars, LF

Dim ListOfStringArrays As List
ListOfStringArrays.Initialize

Dim m As Matcher = Regex.Matcher(SearchPattern, WholeFile)
Do While m.Find
    Dim SA(m.GroupCount) As String
    For I = 1 To m.GroupCount
        SA(I - 1) = m.Group(I)
    Next
    ListOfStringArrays.Add(SA)
Loop
    
For Each SA() As String In ListOfStringArrays
    Log("*** START OF RECORD ***")
    For Each S As String In SA
        Log("[" & S & "]")
    Next
    Log("*** END OF RECORD ***")
Next

emexes · Jun 29, 2020

In summary, what turned this from a seemingly easy job into a PITA is that there is no reliable way of telling whether an LF is part of a value or marks the end of a line, unless you know how many values are on a line and the last value does not contain an LF.
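
Under exactly those two conditions, an alternative to the big regex is to keep gluing physical lines together until a record contains the expected number of bars. This is only a sketch: the Sub name, FieldCount parameter and variable names are made up, and it assumes LF line endings and a "|" separator that never occurs in the data.

B4X:

'Sketch only: rebuilds records when the number of values per record is known
Sub LoadRecordsByFieldCount(Dir As String, FileName As String, FieldCount As Int) As List
    Dim Records As List
    Records.Initialize

    Dim Pending As String = ""
    Dim HavePending As Boolean = False

    Dim PhysicalLines() As String = Regex.Split("\n", File.ReadString(Dir, FileName))
    For Each PhysicalLine As String In PhysicalLines
        If HavePending Then
            'still inside a record: this LF was part of a value, so put it back
            Pending = Pending & Chr(10) & PhysicalLine
        Else
            Pending = PhysicalLine
            HavePending = True
        End If

        'count the bars seen so far in this record
        Dim BarCount As Int = Pending.Length - Pending.Replace("|", "").Length
        If BarCount >= FieldCount - 1 Then
            'enough bars: the LF that ended this physical line is taken as the end of the record
            Dim Values() As String = Regex.Split("\|", Pending)
            Records.Add(Values)
            HavePending = False
        End If
    Next

    'anything left in Pending here is an incomplete record and is ignored
    Return Records
End Sub

For the test file this would be called with FieldCount = 10. If the last value of a record does contain an LF, the text after it ends up at the start of the next record, which is the limitation described above.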

TyneBridges · Jun 30, 2020

Thanks again, Emexes. Everything is working now. I'm grateful for your time and expertise.

Initially my RETURN characters had all gone in the text after import into the app. I'd forgotten my original workaround for the problem, which was to replace all returns in the main text field with a random string that never occurred in the text ('XzQ'). I reinstated this into the export instruction and that fixed the issue. After importing, the Android app again replaces all instances of that string with actual returns.
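
In B4X terms, the placeholder round trip could be sketched as below. "XzQ" is the placeholder mentioned above; the Sub names are made up for illustration, and the export side is shown in B4X only to make the pairing clear, since the real export instruction runs elsewhere.

B4X:

'Sketch only: before export, hide line-breaks so every record stays on one physical line
Sub HideLineBreaks(Value As String) As String
    'handle CR+LF pairs first, then any lone LFs
    Return Value.Replace(Chr(13) & Chr(10), "XzQ").Replace(Chr(10), "XzQ")
End Sub

'After import, turn the placeholders back into line-breaks
Sub RestoreLineBreaks(Value As String) As String
    Return Value.Replace("XzQ", Chr(10))
End Sub

This only works as long as the placeholder string never appears in the diary text itself.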

emexes · Jun 30, 2020

expertise, lol
