Android Question Escaping Comma (,) in CSV File

Status
Not open for further replies.

walterf25

Expert
Licensed User
Longtime User
Hello everyone, I've come across this issue while downloading a CSV file. The number of records in the file can vary from 90 to a few hundred.

I'm able to save the file and read it using the StringUtils library, and I can extract the headers just fine. The problem comes when I save the records to a SQLite database. Some of the records contain a comma inside a column value. The database has 34 columns, matching the number of columns in the header, but when I parse each line and try to insert each record into the database, I sometimes end up with rows that have more than 34 columns because of the comma (,) that some of the values contain.

My question is, does anyone have any suggestion of how I can escape the comma (,) on some of the columns?

For example the column How did you Hear about us? will have something like this
Facebook, Instagram, Flyer.

As you can see, since there are two commas in that value, I end up with 3 different items from it even though it should be treated as a single value.

My relevant code is below.

B4X:
Dim su As StringUtils
Dim headers As List
Dim table As List
table = su.LoadCSV2(rp.GetSafeDirDefaultExternal(""), "list.csv", ";", headers)

''Insert each header column into a Map with the Column Type
        Dim headers2 As Map
        headers2.Initialize
        For j = 0 To headers.Size - 1
            Log("header: " & j & " " & headers.Get(j))
            Dim header() As String = Regex.Split(",", headers.Get(j))
            For r = 0 To header.Length - 1
                headers2.Put(header(r), "TEXT")
            Next
        Next
       
        '''Create table with columns given in headers2 map
        DBUtils.CreateTable(Common.SQL1, "members", headers2, "")

        ' The file is loaded with ";" as the separator character, so each row holds a single item:
        ' items(0) is the whole line, with all the columns still separated by commas.
        Dim membermap As Map
        membermap.Initialize
        For i = 0 To table.Size - 1
            Log("table item # " & i)
            Dim items() As String = table.Get(i)
            Dim fields() As String
            fields = Regex.Split(",", items(0))
            ''If headers2.Size = fields.Length Then
            LogColor("headers2 size: " & headers2.Size & " " & "fields length: " & fields.Length, Colors.blue)
            For q = 0 To headers2.Size - 1
                membermap.Put(headers2.GetKeyAt(q), fields(q))
                ''membermap.Put(headers2.GetKeyAt(q), items(q))
            Next
                Dim memberlist As List
                memberlist.Initialize
                memberlist.Add(membermap)
                DBUtils.InsertMaps(Common.SQL1, "members", memberlist)
        Next

Thanks in advance for the help, everyone.

Cheers,
Walter
 

MaFu

Well-Known Member
Licensed User
Longtime User
The CSV format doesn't have an escape character. If a column contains the delimiter character, a quotation mark, a carriage return or a newline, the column must be enclosed in quotation marks.
A CSV reader must treat any text inside quotation marks as normal text. I don't know if StringUtils.LoadCSV2() respects this.
 

MaFu

Well-Known Member
Licensed User
Longtime User
Tested with the B4J version of StringUtils: it works as expected.
Therefore the csv file may be wrong (columns with delimiter not enclosed in quotes).
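
A minimal sketch of such a check, assuming a comma-delimited file in which the comma-containing value is enclosed in quotes. The file name "quoted.csv", the sample row and the use of File.DirInternal (B4A) are made up for illustration. If the quoting is honoured, the row should come back with two fields, not four.

```b4x
' Write a tiny CSV whose second column contains commas inside quotes,
' then load it with "," as the separator and inspect the first data row.
Dim su As StringUtils
Dim csv As String = "Name,How did you Hear about us?" & CRLF & _
    "Walter," & QUOTE & "Facebook, Instagram, Flyer" & QUOTE & CRLF
File.WriteString(File.DirInternal, "quoted.csv", csv)

Dim headers As List
Dim rows As List = su.LoadCSV2(File.DirInternal, "quoted.csv", ",", headers)
Dim row() As String = rows.Get(0)
Log("fields in first row: " & row.Length)
Log("second field: " & row(1))
```

Note that this loads the file with "," as the separator, unlike the code in the first post, which loads with ";" and then splits each line with Regex.Split.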
 

MaFu

Well-Known Member
Licensed User
Longtime User
Where and how is the csv file generated?
As I see it, you cannot escape the non-relevant commas.
How would you differentiate column separator commas from standard commas?
The original file must be correct.
See my answer in post #2
 

walterf25

Expert
Licensed User
Longtime User
Where and how is the csv file generated?
As I see it, you cannot escape the non-relevant commas.
How would you differentiate column separator commas from standard commas?
The original file must be correct.
Hi Klaus, thanks for your response. The file is generated by a service called MailChimp; it is built from the sign-ups of users who register for a soccer team. If I log in to the service I can export the CSV file and it works just fine. The problem is when I download the file through the app I'm working on: the raw data is basically just a string with commas separating the columns. As I mentioned in my first post, I can parse the data just fine. The problem is when the user enters a comma in one of the columns to separate different options. For example, there is a column where they enter the player's experience, their current team, etc. As you can imagine, a player who has played for 3 different teams will enter this information like so (played for Baby Aztecas, played for boca jr, played for Dream Team).

That's when the issue begins: when I load the file using StringUtils, the library treats the above example as if it were 3 separate columns.
The website and sign-up forms were not created by me; they were created by someone else, and I'm just working on the app. The person who created the sign-up forms has no coding experience, so he doesn't know how to check whether a user has entered a comma in any of the information and get rid of it.

Thanks all for your replies. I will need to come up with a different way to differentiate between the commas that separate the columns and the commas in the data of any given column.
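
Until the source data is fixed, one stopgap is to at least detect the affected rows instead of inserting shifted data. This is only a sketch, assuming table and headers2 are built exactly as in the first post; it does not repair the rows, it just logs and skips them. As far as I know, Regex.Split also drops trailing empty fields, so rows ending in empty columns may be flagged as well.

```b4x
' Skip any row whose field count does not match the number of header columns.
For i = 0 To table.Size - 1
    Dim items() As String = table.Get(i)
    Dim fields() As String = Regex.Split(",", items(0))
    If fields.Length <> headers2.Size Then
        ' Probably an unquoted comma inside a value (or missing trailing fields).
        Log("Skipping row " & i & ": " & fields.Length & " fields, expected " & headers2.Size)
        Continue
    End If
    ' ... build the Map and insert it as before ...
Next
```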

Cheers,
Walter
 

klaus

Expert
Licensed User
Longtime User
What software do they use to generate the csv files?
In Excel, for example, the csv files are saved with the semi-colon separator character.
Is there a parameter setting in MailChimp to choose the separator character?
Or your users need to use a different character.
 

walterf25

Expert
Licensed User
Longtime User
What software do they use to generate the csv files?
In Excel, for example, the csv files are saved with the semi-colon separator character.
Is there a parameter setting in MailChimp to choose the separator character?
Or your users need to use a different character.
Unfortunately I don't know any of the specifics about how the file is generated. As I mentioned in my first post, this website was created by someone else who knows nothing about coding.

Thanks,
Walter
 

dbprogramer

Member
Licensed User
Longtime User
In Excel, for example, the csv files are saved with the semi-colon separator character.
I just learned that actually Excel uses whatever delimiter is supplied by the regional settings of the operating system (at least in Windows it does (Control Panel, Region, Additional Settings, List Separator)). It is possible that this setting affects what delimiter is sent by the MailChimp program.
 

Semen Matusovskiy

Well-Known Member
Licensed User
Regex.Split(",", ...)?! You want too easy a life.

Yes, there are different delimiters between fields (comma, tab, semicolon). But that's not a problem, because you know which format you receive the file in.
The main problem is quotes (which can also differ). For example, Excel outputs string values inside quotes. Inside quotes, a comma is not a separator. A quote can also appear inside quotes as ordinary text. So a string value may look like "White,""Red"", Green".
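
As a rough illustration of those rules, here is a sketch of a quote-aware splitter for a single line. SplitCsvLine is a hypothetical helper, not part of StringUtils. It assumes double quotes as the quote character and a one-character separator, and it does not handle values that span several lines.

```b4x
' Hypothetical helper: splits one CSV line on Sep, ignoring separators that
' appear inside double quotes and turning a doubled quote ("") into one quote.
Sub SplitCsvLine (Line As String, Sep As String) As List
    Dim fields As List
    fields.Initialize
    Dim sb As StringBuilder
    sb.Initialize
    Dim inQuotes As Boolean = False
    Dim i As Int = 0
    Do While i < Line.Length
        Dim c As String = Line.CharAt(i)
        If inQuotes Then
            If c <> QUOTE Then
                sb.Append(c)
            Else If i + 1 < Line.Length And Line.CharAt(i + 1) = QUOTE Then
                sb.Append(QUOTE)      ' doubled quote inside quotes: literal quote
                i = i + 1
            Else
                inQuotes = False      ' closing quote
            End If
        Else If c = QUOTE Then
            inQuotes = True           ' opening quote
        Else If c = Sep Then
            fields.Add(sb.ToString)   ' separator outside quotes ends the field
            sb.Initialize
        Else
            sb.Append(c)
        End If
        i = i + 1
    Loop
    fields.Add(sb.ToString)           ' last field
    Return fields
End Sub
```

It would be called as SplitCsvLine(items(0), ",") in place of Regex.Split(",", items(0)). With the value above, the whole quoted text should come back as one field. This only helps if the quotes are still present in the downloaded data; if they were stripped, no parser can recover the columns.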
 