Android Question What is the fastest way to split this string?

RB Smissaert

Well-Known Member
Licensed User
Longtime User
I am moving data from a text file to an SQLite database.
If I have a line in the text file like this, for example: 123, "John,", Smith
then what would be the fastest way to split this line into these 3 fields:

123
John,
Smith

As these files can be large, I need to process the file line by line, so I can't use, for example, StringUtils.

RBS
 

RB Smissaert

B4X:
Dim pattern As String = ",(?=(?:(?:[^""]*""){2})*[^""]*$)"
Dim result() As String
result = Regex.Split(pattern, Text)
Thanks, that seems to work well.
That pattern looks seriously complex and I would never have got that without asking or some serious studying!

RBS
 

LucaMs

Expert
Licensed User
Longtime User
That is interesting and it seems then I need to look into using ChatGPT too!
In some cases yes, but the vast majority of the time it makes many errors.

Today I asked it how to change the names of the files in a folder, with a batch file (wanting to change only some characters of the file names). After "only" about a hundred tries... I gave up! :confused:
 

RB Smissaert

I suppose after using ChatGPT for a while one will get an idea what kind of questions are promising or not.
For now I think I may learn about RegEx.

RBS
 

RB Smissaert

Regex is so complicated that it's really worth leveraging ChatGPT for this :)
It took me less than a minute to register with ChatGPT, post that question and get the right answer (same as you posted).
So, I learned something there and am well impressed!

RBS
 

William Lancee

Well-Known Member
Licensed User
Longtime User
I tested the solution in #4 by AI on this string:
123,, "John,", Smith,,,

I expected/wanted to get (-=empty):
123
-
John,
Smith
-
-
-
But the result still had John, wrapped in quotes, and the last three empty items were dropped.
I wrote my own version without Regex. It gave the right answer and turned out to be about four times faster: 15 ms per 10,000 iterations vs 68 ms per 10,000 iterations.
B4X:
Private Sub parseString(s As String) As List
    Dim aList As List: aList.Initialize
    Dim sb As StringBuilder: sb.Initialize   'buffer for the current field
    Dim inQUOTE As Boolean                   'True while inside a quoted section
    For i = 0 To s.Length - 1
        Dim c As String = s.CharAt(i)
        If c = QUOTE Then
            inQUOTE = Not(inQUOTE)           'toggle on every quote; the quote itself is not stored
        Else If c = "," And Not(inQUOTE) Then
            aList.Add(sb.ToString.Trim)      'an unquoted comma ends the field
            sb.Initialize                    'reset the buffer for the next field
        Else
            sb.Append(c)
        End If
    Next
    aList.Add(sb.ToString.Trim)              'add the last field, even if it is empty
    Return aList
End Sub
 

RB Smissaert

Yes, I had noticed the same. I don't mind much the double quotes with John, but the missing elements at the end don't seem right.
I use my own custom text file parser (long code, but happy to post if anybody is interested), but thought maybe I can simplify by using TextReader and RegEx.
It turns out that TextReader gives me memory problems even when the file is processed in chunks (write to DB every x rows), and RegEx has the problem with missing terminal elements.
Will have a look at your version of RegEx.

RBS
 

RB Smissaert

Regex will not be faster than a more "primitive" parsing code. On the contrary.

Choose the one that is simpler for you to implement.
What do you think about this?

B4X:
Sub TestRegEx
    
    Dim str As String = "1,2,3,,,"
    Dim arr() As String = Regex.Split(",", str)
    
    Log(arr.Length) 'will show 3, was expecting 6
    
End Sub

RBS
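
One possible way to keep the trailing empty fields with Regex.Split is to append a dummy last field and discard it afterwards. This is only a sketch. It assumes that Regex.Split on B4A drops empty strings only at the very end of the result (as the test above suggests, and as Java's String.split does) and keeps everything in front of a non-empty last element. SplitKeepEmpty and the "#" sentinel are hypothetical names chosen for illustration; they were not posted in this thread.

B4X:
'Hypothetical helper, not from the thread.
'Assumes Regex.Split only drops empty strings at the very end of the result.
Sub SplitKeepEmpty(Text As String) As List
    Dim result As List
    result.Initialize
    'The sentinel "#" makes the last element non-empty, so nothing before it is dropped
    Dim parts() As String = Regex.Split(",", Text & ",#")
    'Copy everything except the sentinel itself
    For i = 0 To parts.Length - 2
        result.Add(parts(i))
    Next
    Return result
End Sub

Sub TestSplitKeepEmpty
    Dim fields As List = SplitKeepEmpty("1,2,3,,,")
    Log(fields.Size) '6 if the assumption above holds
End Sub

Note that this does nothing about quoting: a comma inside quotes is still treated as a separator when the pattern is a plain ",".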
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
[Image attachment: 1701611849542.png]
 

William Lancee

What do you think about this?
Splitting by comma won't ignore (as it should) commas inside quotes.

The algorithm in #14 is very efficient:
1. One pass through all the characters.
2. Use of the efficient StringBuilder.
3. Use of the variable-length List.

It simply goes through the string character by character and looks for commas that are not inside quotes.
A quote only toggles the in-quote state; any other character, apart from an unquoted comma, is appended to the StringBuilder buffer.

When it hits a not-in-quote comma, it saves the buffer to the result list and resets the buffer.
When done, the last buffer is also added to the list.
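
As a worked example of the steps just described, here is a compact restatement of the scan together with a small test on the string used earlier in the thread. It is a sketch, not the posted code: SplitCsvLine and TestSplitCsvLine are hypothetical names, and, like the description above, it assumes quotes are never escaped inside a field.

B4X:
'Hypothetical restatement of the one-pass scan described above.
Sub SplitCsvLine(Text As String) As List
    Dim fields As List
    fields.Initialize
    Dim sb As StringBuilder
    sb.Initialize
    Dim inQuote As Boolean = False
    For i = 0 To Text.Length - 1
        Dim c As String = Text.CharAt(i)
        If c = QUOTE Then
            inQuote = Not(inQuote)         'quotes toggle the state and are not stored
        Else If c = "," And inQuote = False Then
            fields.Add(sb.ToString.Trim)   'unquoted comma: save the buffer...
            sb.Initialize                  '...and reset it
        Else
            sb.Append(c)                   'everything else belongs to the current field
        End If
    Next
    fields.Add(sb.ToString.Trim)           'the last buffer is a field too, even if empty
    Return fields
End Sub

Sub TestSplitCsvLine
    '123,, "John,", Smith,,,
    Dim fields As List = SplitCsvLine("123,, " & QUOTE & "John," & QUOTE & ", Smith,,,")
    Log(fields.Size) '7
    For Each f As String In fields
        If f = "" Then Log("-") Else Log(f)   'show empty fields as -
    Next
End Sub

Tracing the test string by hand gives the seven fields 123, -, John,, Smith, -, -, -. The trailing empty fields survive because a field is saved at every unquoted comma and once more at the end of the line.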
 