Android Code Snippet Remove multiple spaces inside strings

gravel

Member
Licensed User
This removes runs of multiple, redundant spaces inside a string and replaces each run with a single space.

B4X:
Sub RemoveRedundantSpace(TextToClean As String) As String
    'Requires the JavaObject library; calls Java's Matcher.replaceAll on the B4X Matcher
    Dim jo As JavaObject = Regex.Matcher("[ ]{2,}", TextToClean)
    Return jo.RunMethod("replaceAll", Array(" "))
End Sub
It might be straightforward if you're familiar with regular expressions, but it took me a while to find something that worked.
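For readers less familiar with regular expressions: in the pattern "[ ]{2,}", [ ] is a character class containing only the space character and {2,} means "two or more in a row", so every run of spaces is replaced by one space. A minimal usage sketch (the sample text is made up; tabs and line breaks are not spaces, so this pattern leaves them alone):
B4X:
Dim Messy As String = "one  two   three    four"
Log(RemoveRedundantSpace(Messy))    'logs: one two three four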
 

RB Smissaert

Well-Known Member
Licensed User
Cross platform code:
B4X:
Sub RemoveRedundantSpace(TextToClean As String) As String
   Return Regex.Replace("[ ]{2,}", TextToClean, " ")
End Sub
Bear in mind that Regex is quite slow, in case that matters.
A simple replace in a loop is about 6 to 10 times faster:

B4X:
Sub RemoveRedundantSpace2(TextToClean As String) As String
    Dim strResult As String = TextToClean
    'Each pass shortens every run of spaces; repeat until no double space is left
    Do While strResult.IndexOf("  ") > -1
        strResult = strResult.Replace("  ", " ")
    Loop
    Return strResult
End Sub

RBS
 

Jorge M A

Well-Known Member
Licensed User
Recursion has shown me better performance, with higher volumes.
B4X:
Sub RemoveRedundantSpace3(TextToClean As String) As String
    If TextToClean.IndexOf("  ") > -1 Then
        'Return the result of the recursive call; discarding it would apply only one pass
        Return RemoveRedundantSpace3(TextToClean.Replace("  ", " "))
    End If
    Return TextToClean
End Sub
 

emexes

Well-Known Member
Licensed User
Jorge M A said: "Recursion has shown me better performance, with higher volumes."
Using recursion where iteration will do the job always makes me a bit nervous.

B4X:
Sub RemoveRedundantSpace4(TextToClean As String) As String

    Do While TextToClean.Contains("  ")
        TextToClean = TextToClean.Replace("  ", " ")
    Loop

    Return TextToClean

End Sub
 

yfleury

Active Member
Licensed User
Edit: I just saw what Erel said about regex.

I know that regex exists, but I don't know how to use it. I am sure regex can collapse every run of more than one space, and do it much faster than a loop. Search for regex in this forum.
 

emexes

Well-Known Member
Licensed User
Lunchtime doodling... I measured this Char array method as 8x faster than the obvious .Replace method, if the string contains a run of multiple spaces:
B4X:
Sub SingleSpace(X As String) As String

    If X.Contains("  ") = False Then
        Return X    'let sleeping dogs lie
    End If

    Dim bc As ByteConverter
 
    Dim XC() As Char = bc.ToChars(X)
 
    Dim SpaceChar As Char = " "
 
    Dim OldLength As Int = XC.Length
    Dim NewLength As Int = 0
    Dim LastChar As Char = "X"    'anything but a space
    Dim ThisChar As Char
 
    For I = 0 To OldLength - 1
        ThisChar = XC(I)
        If ThisChar = LastChar And ThisChar = SpaceChar Then
            'don't copy repeated spaces
        Else
            XC(NewLength) = ThisChar
            NewLength = NewLength + 1
        End If
        LastChar = ThisChar
    Next
 
    Return bc.FromChars(XC).SubString2(0, NewLength)    'cut string to (new reduced) size

End Sub
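Since a hand-rolled loop like this is easier to get wrong than a one-line Replace, a quick comparison against the simple version is cheap insurance. A sketch, assuming RemoveRedundantSpace4 from earlier in the thread is in the same module (the sample strings are made up):
B4X:
For Each Sample As String In Array As String("a  b", " lead  and   trail  ", "nospaces", "")
    'Both subs should produce the same text, so nothing should be logged
    If SingleSpace(Sample) <> RemoveRedundantSpace4(Sample) Then Log("Mismatch for: [" & Sample & "]")
Next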
 

Erel

Administrator
Staff member
Licensed User
It is a big mistake to lead the discussion towards endless and mostly non-useful optimizations. Someone wiser than me already said: "premature optimization is the root of all evil."
If you do want to make such posts then start a new thread as it only adds confusion to the discussion.

Is the regex solution really slow?

Test it:
B4X:
Dim s As String = "sd fs dfjklsj dflkjs dflk jsdlfkj sdlf f jweklf wjlke fwe f wef we f wef we   jweflk fjwelkfj wlke  wef lfw elkfw ejfl  e w fwe fwe fwef we  wefjlk"
Dim start As Long = DateTime.Now
For i = 1 To 100000
   RemoveRedundantSpace(s)
Next
Log($"$1.2{(DateTime.Now - start) / 100000}"$)
The result on my device is 0.01 milliseconds per call. Does it matter if you can do it in 0.00000001 milliseconds? No. No user will ever see any difference between the two solutions.

Q: What about the one developer who needs to do it on a trillion-character string?
A1: I don't think that there is one.
A2: If you have very specific requirements then you should look for very specific and customized solutions.
 

RB Smissaert

Well-Known Member
Licensed User
That is why I said: in case that matters.
I agree with you that if there is no speed problem then it is useless premature optimization.
Will only post these speed-ups if poster does mention a speed problem.

RBS
 

emexes

Well-Known Member
Licensed User
RB Smissaert said: "That is why I said: in case that matters. I agree with you that if there is no speed problem then it is useless premature optimization."
I think Erel's comment was directed more to me than to you... but thanks for taking the bullet for me! ;-/

Whenever I get an alert saying that Erel's replied to a post of mine, my first thought is:

argh, fk, what did I get wrong this time???

followed by a sadness that I have distracted him from matters that are more deserving/needful of his expertise. I answer the mundane questions on the forum, or investigate issues in fields that I have experience with, so that Erel doesn't have to.

I posted the String-handling-as-Char-array example not so much for the speedup, but to add an extra arrow to the programming quiver: I hoped that non-C programmers here might look at it, and file away the String:Char() equivalence as an efficient solution for some problems.
RB Smissaert said: "Will only post these speed-ups if poster does mention a speed problem."
Regarding saving milliseconds: agreed, saving a few milliseconds just once doesn't matter, but if it is something that is inside a deep loop and/or done thousands of times per second, or over large arrays of data, then that s**t can start to add up. String operations are a major sponge of CPU cycles. A commercial way of looking at it is: if 10% of a program consumes 90% of the clock cycles, and you can speed that up 8x, then you have just sped your program up 4.7x. CPU performance goes up about 33% a year, so that 4.7x improvement could be viewed as equivalent to a 5.4 year advantage over the competition. Your program can run on cheaper hardware, or do more on the same hardware, or last longer with the same battery, than the competition.
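The arithmetic behind those figures, as a small sketch (the variable names are illustrative; the 90%, 8x and 33% inputs are the ones quoted above). The overall speedup is 1 / (unaffected fraction + hot fraction / hot speedup), and the "years" figure is the logarithm of that speedup to base 1.33:
B4X:
Dim HotFraction As Double = 0.9     'share of run time spent in the hot 10% of the code
Dim HotSpeedup As Double = 8        'speedup achieved for that part
Dim Overall As Double = 1 / ((1 - HotFraction) + HotFraction / HotSpeedup)    '1 / 0.2125, about 4.7
Dim Years As Double = Logarithm(Overall, 1.33)    'about 5.4 at 33% per year
Log($"Overall speedup: $1.1{Overall}x, equivalent to about $1.1{Years} years"$)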

Where Erel is right is: I didn't actually read back to the original post. I should have done that. Thanks for pointing that out ;-)
 

Erel

Administrator
Staff member
Licensed User
emexes said: "argh, fk, what did I get wrong this time???"
You did nothing wrong. On the contrary.

I just wanted to make it clear that general statements such as "regex is slow" are incorrect. If you avoid using regex because of performance then you (the developer, no one specifically) are doing something wrong.
The regex-based solution takes 0.1 milliseconds for a not too short string. 0.1 milliseconds, or 1/10000 of a second, is a very short period.
It is a mistake to try to optimize "everything". You will end up with a complicated, unmaintainable and slow solution. Elegant, organized and clear code can be improved easily.
 

emexes

Well-Known Member
Licensed User
Erel said: "general statements such as 'regex is slow' are incorrect"
Agreed.

I tested the regex method and it was twice as fast as the .Replace("  ", " ") method.

I was impressed, but a bit wary of raising my head above the parapet whilst I could still hear bullets in the air :)
 

Didier9

Active Member
Licensed User
Erel said: "Elegant, organized and clear code can be improved easily."
That's the key right there. Write your code so it's clear and can easily be improved IF AND WHEN NECESSARY.
I have been quite surprised by how efficiently Java runs code that was originally written for clarity, even though I was fully expecting to have to rework it after putting it in a loop.
I have an app processing log files from a CAN device. I originally wrote it and tested it with a file that had a few hundred records. A customer sent me a new 15 MB log file... About 300,000 records. Processing time is a couple of seconds. There are tons of regex statements in there, but I have no need to speed up anything!
 