Android Question: Parsing text that contains invalid characters

Addo

Well-Known Member
Licensed User
I am trying to receive bytes, convert them to a string, and parse it.

The bytes are a string list that is saved to a memory stream and sent to a B4A AsyncStreams client socket.

The NewData code looks like this:

B4X:
Public Sub NewData (data() As Byte)
    Dim msg As String

    msg = BytesToString(data, 0, data.Length, "UTF8")

    Dim param As String = msg
    Dim paramnum() As String = Regex.Split("\~", param)

    Log(paramnum(0))
End Sub

The text sent from the server looks like this:

url1~
url2~
Url3~

and so on, but after the bytes are received and converted to a string, some invalid characters are inserted that break the parsing, like this:

B4X:
url1~
������������
url2~
Url3~

I don't know where these characters come from after the BytesToString conversion.

Now when I try to read a parsed element, for example paramnum(2),
I get an exception:

main_vvvvvvvvv4 (java line: 489)
java.lang.ArrayIndexOutOfBoundsException: length=2; index=2
at app.name.main._vvvvvvvvv4(main.java:489)
at app.name.main._astream_newdata(main.java:381)
at java.lang.reflect.Method.invoke(Native Method)
at anywheresoftware.b4a.BA.raiseEvent2(BA.java:186)
at anywheresoftware.b4a.BA$2.run(BA.java:360)
at android.os.Handler.handleCallback(Handler.java:743)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:150)
at android.app.ActivityThread.main(ActivityThread.java:5621)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)
java.lang.ArrayIndexOutOfBoundsException: length=2; index=2

How can I get rid of these invalid characters after the conversion?

I have tried msg = msg.Trim, with no luck.

I also tried to find any hidden characters that could break the parsing, using an online hidden-character checker, and got this result:

B4X:
URL1~

URL2~
URL3~
URL4~
URL5~
 

MarkusR

Well-Known Member
Licensed User
Longtime User
I found an image about the byte order mark (BOM):
bom.png
 

MarkusR

the text that sent from server looks like this
Does the data come from your server app?
I would first look at what it is sending out.
If you only transfer text, UTF-8 on the server to UTF-8 on the client is OK.
 

Addo

Yes, the data comes from my server, which sends a string list in a memory stream encoded as UTF-8. Converting the bytes to a string seems to have become complicated. DonManfred suggested using BytesBuilder to remove the BOM header and then converting the bytes to a string.

But I am confused about how to use BytesBuilder to remove the BOM from the data, turn the BytesBuilder contents back into bytes, and then convert those bytes to a string.
 

DonManfred

Expert
Licensed User
Longtime User
Something like:
B4X:
Dim bom() As Byte = Array As Byte(0xEF, 0xBB, 0xBF)
Dim i As Int = bb.IndexOf(bom)
Log(i)
If i > -1 Then bb.Remove(i, i + bom.Length) ' begin index inclusive, end index exclusive
where bb is the BytesBuilder containing the data.

I just typed this here, as I have never worked with BytesBuilder, but it should get you further.
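
If the BOM only ever appears at the very start of the received array, a simpler route may be to skip it while decoding, without BytesBuilder. The following is a minimal sketch under that assumption; the helper name DecodeUtf8SkipBom is hypothetical and not part of any library.

```b4x
' Hypothetical helper: decodes UTF-8 bytes and skips a leading BOM (EF BB BF) if one is present.
' Assumption: the BOM, if any, sits at index 0 of the array passed in.
Sub DecodeUtf8SkipBom (data() As Byte) As String
    Dim offset As Int = 0
    If data.Length >= 3 Then
        ' Bytes are signed in B4X, so mask with 0xFF before comparing.
        If Bit.And(data(0), 0xFF) = 0xEF And Bit.And(data(1), 0xFF) = 0xBB And Bit.And(data(2), 0xFF) = 0xBF Then
            offset = 3
        End If
    End If
    Return BytesToString(data, offset, data.Length - offset, "UTF8")
End Sub
```

In NewData this would replace the direct BytesToString call, for example msg = DecodeUtf8SkipBom(data). It does not help if the unwanted bytes arrive in the middle of a message.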

PS: Have you tried setting
B4X:
sl.WriteBOM := false;
in your string list in Delphi?
TStrings has a WriteBOM property, which is True by default.
B4X:
procedure TForm1.Button1Click(Sender: TObject);
begin
  Memo1.Lines.WriteBOM := True;
  memo1.Lines.SaveToFile('z:\temp\1.txt', TEncoding.UTF8);
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  Memo1.Lines.WriteBOM := False;
  Memo1.Lines.SaveToFile('z:\temp\2.txt', TEncoding.UTF8);
end;
 

Addo

I have set it to False. Now I don't receive the BOM header, but I still get a split exception: java.lang.ArrayIndexOutOfBoundsException: length=1; index=2

Here is the data that currently arrives:

B4X:
2CLIENTS~������������|
URL1~URL2~URL3~URL4~URL5~

This is what NewData looks like now:

B4X:
Public Sub NewData (data() As Byte)
    Dim msg As String

    msg = BytesToString(data, 0, data.Length, "UTF-8")
    msg = msg.Trim
    msg = msg.Replace(CRLF, "")
    msg = msg.Replace(Chr(10), "")
    msg = msg.Replace(Chr(13), "")

    Dim param As String = msg
    Dim paramnum() As String = Regex.Split("\~", param)

    Log(paramnum(2))
End Sub
 
DonManfred

B4X:
    Dim s As String = "test~"&Chr(13)&Chr(10)&"test2~"&Chr(13)&Chr(10)&"test3" ' A String like you are sending....
    Dim param As String = s.Replace(Chr(13)&Chr(10),"")
    Dim paramnum() As String = Regex.Split("\~", param)
    Log("Paramnum="&paramnum.Length)
    For i = 0 To paramnum.Length-1
        Log(paramnum(i))
    Next

Logger connected to: samsung SM-T585
Paramnum=3
test
test2
test3
 

keirS

Well-Known Member
Licensed User
Longtime User
I have set it to False. Now I don't receive the BOM header, but I still get a split exception: java.lang.ArrayIndexOutOfBoundsException: length=1; index=2

That Regex doesn't work because your string is terminated with a ~, which means it tries to create an empty array element. This library (it works with Android as well as B4J) will split the string because it handles that case:

Dim APSU As ApacheSU
Dim SplitArray() As String = APSU.SplitWithSeparator("URL1~URL2~URL3~URL4~URL5~","~")
 

OliverA

Expert
Licensed User
Longtime User
java.lang.ArrayIndexOutOfBoundsException: length=2; index=2
What line is actually causing this? In all the code examples, the only line I see accessing an array is the Log() statement. The problem I have with that statement is that you are logging something that may not exist. Your assumption is that the split is working, but if it is not, the array will not have the members you expect and Log will bomb out. How about doing a Log(paramnum.Length) to make sure you have an array with members to begin with?
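
The check suggested above could look roughly like this. It is only a sketch: the sub name LogField is hypothetical, and it assumes the same "\~" pattern used in the earlier posts.

```b4x
' Hypothetical helper: logs one field of a ~-separated message,
' but only after confirming that the index exists in the split result.
Sub LogField (msg As String, index As Int)
    Dim parts() As String = Regex.Split("\~", msg)
    Log("parts.Length = " & parts.Length)
    If index >= 0 And index < parts.Length Then
        Log(parts(index))
    Else
        ' Report the problem instead of raising ArrayIndexOutOfBoundsException.
        Log("No field " & index & " in: " & msg)
    End If
End Sub
```

Calling LogField(msg, 2) in place of Log(paramnum(2)) shows how many fields each NewData call actually produced.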
 

MarkusR

I made a test with the server file you uploaded and everything seems OK with it.
But the original is 42 bytes and the output in B4A is 40.
Are you also using B4A version 7.80?

B4X:
Sub Button1_Click

    Dim a As String = File.GetText(File.DirAssets,"mss.txt")
    Log(a)

    Dim st As InputStream
    st = File.OpenInput(File.DirAssets,"mss.txt")
    Dim buffer() As Byte= Bit.InputStreamToBytes(st)
    st.Close
 
    Dim msg As String
    msg = BytesToString(buffer,0,buffer.Length,"UTF-8")

    Log(msg)
    Log(msg.Length)

    Dim List() As String = Regex.Split("\~",msg)
    Log(List(2))

End Sub
 

Addo

I tried grabbing different data from the database, about 20 records, then saving and sending them, and it has the same issue. Maybe BytesToString cannot handle data sent from a memory stream.

@OliverA the length of paramnum is:

B4X:
** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
2
1
48


@MarkusR that being said, the problem is not in the data itself; it is obviously that BytesToString cannot handle the memory stream. I have been at this since yesterday and couldn't figure out a solution.
 

Addo

And even weirder, when I loop through it there are no errors at all:

B4X:
msg = BytesToString(data, 0, data.Length, "UTF-8")

msg = msg.Replace(CRLF, " ")


Dim param As String = msg
Dim paramnum() As String = Regex.Split("\~", param)
For i = 0 To paramnum.Length-1
Log(paramnum(i)&i)
Next

Output for a different list:

B4X:
2GETCATS0
 1
������������n0
red0
 blue1
 black2
 white3
 violet4
 green5
 yellow6
 softwhite7
 darkgray8
 darkblue9
 lightgray10
 lightred11
 purple12
 brown13
 skylight14
 skygreen15
 orange16
 lightorange17
 18

This is how I send the data from the server side:

B4X:
AContext.Connection.IOHandler.Writeln('2GETCATS~',IndyTextEncoding_UTF8);

AContext.Connection.IOHandler.Write(MSS, 0, true);
 

MarkusR

AContext.Connection.IOHandler.Write(MSS, 0, true);
Do you also need the encoding type here?
 

OliverA

You're sending two different items, the '2GETCATS~' via IOHandler.Writeln and whatever MSS contains via IOHandler.Write. This fires your NewData handler twice. Therefore you see an array length of 1 (Writeln) and then of 18 (Write). It looks like Writeln encodes your data correctly, but Write does not.
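
One way to confirm this on the client is to log every NewData call separately, with its byte count and a hex dump of the raw bytes, before any string conversion. This is a minimal sketch: it assumes the ByteConverter library is enabled in the Libraries Manager, and the sub name LogIncoming is hypothetical.

```b4x
' Hypothetical helper: call it as the first line of NewData to see exactly what arrived.
' Assumption: the ByteConverter library is referenced by the project.
Sub LogIncoming (data() As Byte)
    Dim bc As ByteConverter
    Log("NewData call, bytes = " & data.Length)
    ' Hex output makes non-text bytes visible, for example a BOM (EFBBBF)
    ' or any other prefix that precedes the first readable field.
    Log(bc.HexFromBytes(data))
End Sub
```

If two log entries appear for one server send, the client is receiving the Writeln and Write payloads as separate events, and each one needs to be parsed on its own terms.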
 

MarkusR

I would not write the stream any more; I would start by changing the sending side.
Or try converting your outgoing data into a byte array first and then sending those bytes 1:1,
so you can see in debug mode what is in the outgoing array.
 