Android Question: Parsing text that contains invalid characters

PassionDEV

Well-Known Member
Licensed User
I am trying to receive bytes, convert them to a string, and parse it.

The bytes are a string list that is saved to a MemoryStream on the server and sent to a B4A AsyncStreams client socket.

The NewData code looks like this:

B4X:
Public Sub NewData (data() As Byte)
    Dim msg As String
    msg = BytesToString(data, 0, data.Length, "UTF8")

    Dim param As String = msg
    Dim paramnum() As String = Regex.Split("\~", param)

    Log(paramnum(0))
End Sub

The text sent from the server looks like this:

url1~
url2~
Url3~

and so on, but after the bytes are received and converted to a string, some invalid characters are inserted that break the parsing, like this:

B4X:
url1~
������������
url2~
Url3~

I don't know where these characters come from after the BytesToString conversion.

Now when I try to read a parsed element, for example paramnum(2), I get an exception:

main_vvvvvvvvv4 (java line: 489)
java.lang.ArrayIndexOutOfBoundsException: length=2; index=2
at app.name.main._vvvvvvvvv4(main.java:489)
at app.name.main._astream_newdata(main.java:381)
at java.lang.reflect.Method.invoke(Native Method)
at anywheresoftware.b4a.BA.raiseEvent2(BA.java:186)
at anywheresoftware.b4a.BA$2.run(BA.java:360)
at android.os.Handler.handleCallback(Handler.java:743)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:150)
at android.app.ActivityThread.main(ActivityThread.java:5621)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)
java.lang.ArrayIndexOutOfBoundsException: length=2; index=2
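
The trace reports length=2 and index=2, meaning the split produced only two elements, so paramnum(2) does not exist. A minimal sketch of a bounds check, reusing the paramnum name from the code above; SafeItem is a hypothetical helper, not part of B4A:

B4X:
' Hypothetical helper: returns the item at index, or "" when the array is too short.
Sub SafeItem (items() As String, index As Int) As String
    If index >= 0 And index < items.Length Then Return items(index)
    Return ""
End Sub

It would be called as Log(SafeItem(paramnum, 2)). This only avoids the crash; it does not explain the extra characters.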

How can I get rid of these invalid characters after the conversion?

I have tried msg = msg.Trim, with no luck.

I also tried to check whether any hidden characters could be breaking the parsing, so I ran the text through an online hidden-character checker and got this result:

B4X:
URL1~

URL2~
URL3~
URL4~
URL5~
 

MarkusR

Well-Known Member
Licensed User
The example used "UTF-8". I don't know whether this makes a difference, but you used "UTF8" above.
 

MarkusR

Well-Known Member
Licensed User
I meant: I just saw that in B4A it is written with a hyphen. Your code above is without one.
 

DonManfred

Expert
Licensed User
No, he is pointing out that you use "UTF8", whereas it is usually written "UTF-8".
It would be helpful if you posted the received data as hex. If it contains the bytes 0xEF, 0xBB, 0xBF, then it is probably a BOM header.

Can you upload a text file with the data you are transferring?
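
One way to produce such a hex dump on the B4A side, sketched under the assumption that the ByteConverter library is referenced in the project:

B4X:
Public Sub NewData (data() As Byte)
    ' ByteConverter.HexFromBytes returns the received bytes as a hex string.
    Dim bc As ByteConverter
    Log(bc.HexFromBytes(data))
End Sub

A UTF-8 BOM would show up as the hex digits EF BB BF at the very start of the logged string.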
 

DonManfred

Expert
Licensed User
I did not ask for the code you are using.
Write the data to a text file instead of the stream you are sending, or write the data to a stream that points to a file.
Upload this file.

All I can say is that the data looks like it has a byte order mark (BOM), which it should NOT have.
 

klaus

Expert
Licensed User
This is the hex content of your file:

[Screenshot: upload_2018-3-8_8-24-20.png, hex view of the file]


The first three bytes are, as DonManfred suggested, the BOM.
Then you have 2SCLIENT~.
And there is a CR and an LF character after each ~ character.
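
Given that layout, a minimal sketch of the parsing step: drop a leading BOM, then split on ~ followed by an optional CR and an LF. It assumes data holds one complete message, and ParseMessage is a hypothetical name. Two details are worth keeping in mind: Trim does not remove the BOM character, and the CRLF constant in B4X is Chr(10) only.

B4X:
Sub ParseMessage (data() As Byte) As String()
    Dim msg As String = BytesToString(data, 0, data.Length, "UTF-8")
    ' After decoding, a UTF-8 BOM (EF BB BF) is the single character U+FEFF.
    If msg.StartsWith(Chr(0xFEFF)) Then msg = msg.SubString(1)
    ' Split on "~" followed by an optional CR and an LF.
    Return Regex.Split("~\r?\n", msg)
End Sub

The result can then be checked with paramnum.Length before any element is read by index.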


 

PassionDEV

Well-Known Member
Licensed User
I have tried this, with no luck:

B4X:
msg = BytesToString(data, 0, data.Length, "UTF-8")
msg = msg.Trim
msg = msg.Replace(CRLF, "")
msg = msg.Replace(Chr(10), "")
msg = msg.Replace(Chr(13), "")
msg = msg.Replace(Chr(127), "")
 

klaus

Expert
Licensed User
The code below works with your file:

B4X:
Private txt, split() As String

txt = File.ReadString(File.DirAssets, "mss.txt")
Log (txt)
split = Regex.Split("~" & Chr(13) & Chr(10), txt)
For i = 0 To split.Length - 1
    Log(split(i))
Next
In the file you sent, there are no other 'invalid' characters like those in the text you showed in your first post.
 

PassionDEV

Well-Known Member
Licensed User
That is because the file was pure text from the server side; it did not go through BytesToString.
I am still trying to figure out how BytesBuilder works in this case.

 

PassionDEV

Well-Known Member
Licensed User
@Erel I am doing the following:
B4X:
bb.Append(data)
bb.Remove()

But what do I have to remove in the Remove call? U+FEFF? I am still trying to figure it out.
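
A rough sketch of how a bytes builder is commonly used here: append every chunk, then cut complete messages off the front of the buffer, so Remove takes out processed messages rather than the BOM. It assumes bb is a B4XBytesBuilder from the B4XCollections library, initialized elsewhere, with IndexOf and Remove(BeginIndex, EndIndex) members where EndIndex is exclusive, and that every message ends with ~ CR LF as in the hex dump. Both assumptions should be checked against the actual project.

B4X:
Public Sub NewData (data() As Byte)
    bb.Append(data)
    ' Delimiter bytes: "~" (126), CR (13), LF (10).
    Dim delim() As Byte = Array As Byte(126, 13, 10)
    Dim i As Int = bb.IndexOf(delim)
    Do While i > -1
        ' Assumed: Remove returns the removed bytes (one message plus its delimiter).
        Dim part() As Byte = bb.Remove(0, i + delim.Length)
        ' Decode only the first i bytes, leaving the delimiter out.
        Dim text As String = BytesToString(part, 0, i, "UTF-8")
        ' A BOM, if present, is stripped after decoding.
        If text.StartsWith(Chr(0xFEFF)) Then text = text.SubString(1)
        Log(text)
        i = bb.IndexOf(delim)
    Loop
End Sub

Bytes that do not yet form a complete message stay in bb until the next chunk arrives.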
 