Help with deleting a control-character or whatever it is...

moster67

Expert
Licensed User
Longtime User
Attached is a test-project which reads an XML-file. The XML-file is in the files-folder.

When parsing this file, the text read from the field "e2eventtitle" (and from the field "e2eventname", but no others) always shows what I believe is a control character at the beginning and at the end of the resulting string. The odd thing is that the control character is only shown in Gingerbread (two devices and the emulator) but not in ICS.

It is probably easier if you test the project to see what I mean.

I have tried trimming the string, and checking for CTRL, Chr(10) and other control characters in order to replace them, but in vain. I thought about deleting the first and the last character of the resulting string, but in some XML files this control character is not present, so I cannot do that unconditionally.

Help with deleting this character would be appreciated. I would also like to understand why it does not show up in ICS. Perhaps a font issue?
 

Attachments

  • test.zip (7.4 KB)

NJDude

Expert
Licensed User
Longtime User
Look at line #11 in the XML, there's a '0' there.
 

Attachments

  • Shot.jpg (92.5 KB)

moster67

Thanks NJDude,

but that is not the problem, although I don't know why the 0 shows up that way. Maybe something happened when I saved the XML file, which is actually downloaded from a satellite receiver; I only saved it so I could show the problem. In any case, I tried deleting the lines with the zero and I also fixed the file, but I still get the error, so I don't think the problem is there.

The problem is with this line:

Line 42:
B4X:
<e2eventtitle>†Ultima parola Sky Sport 24‡</e2eventtitle>

but also with lines 41, 27 and 28 (which contain the same information).

Were you able to reproduce it using your device(s)?
 

moster67

Incredible... In my previous post I copied one of the offending lines, and now I can see two characters.
I will try to replace them in my parsing routine...

EDIT: nope, replacing is not working:

B4X:
Sub parserGetCurrInfo_EndElement (Uri As String, Name As String, Text As StringBuilder)
   
   If parserGetCurrInfo.Parents.IndexOf("e2eventlist") > -1 Then
      If Name = "e2eventtitle" Then
         If Text.ToString <> "" Then
            Label1.Text=Text.ToString.Trim
            Label1.Text=Label1.Text.Replace("†","")
            Label1.Text=Label1.Text.Replace("‡","")
         End If
      End If
   End If
   
End Sub

Any ideas?
 

moster67

@Erel

I don't think the 0 is the problem. I deleted it as well but still have the same problem. As I mentioned in my first post, this happens in Gingerbread and not in ICS, so you must test with Gingerbread (I also see this problem on an emulator running GB).

To make it clearer, I attach a screenshot so you can see the control character I get. I am also uploading a new project in which I have reduced the XML file to a bare minimum (no zeroes), but I still have the problem with GB. Please try it on GB.

If you get the same result as me (see screenshot) running GB, why is this problem not present in ICS? What can I do to resolve it? I tried taking a substring that skips the first and last characters, but the problem is that this control character is not always present. I would need a way to check whether the first character is a control character or not. Any ideas?

I hope you can reproduce the problem with new test-file. Thanks!
 

Attachments

  • test.png (28.3 KB)
  • NewTest.zip (7.2 KB)

warwound

Expert
Licensed User
Longtime User
I just ran your project on my ZTE Blade which is running CyanogenMod7 Gingerbread and the control characters do appear.

I didn't remove the 0 value from the e2iswidescreen tag. It only looks odd because of the formatting of the XML. Formatted like this, it is obviously still valid XML:

<e2servicevideosize>N/AxN/A</e2servicevideosize>
<e2iswidescreen>0</e2iswidescreen>
<e2apid>450</e2apid>

I opened the XML file in Dreamweaver and saw no special characters. Dreamweaver reports that the file is properly saved with UTF-8 encoding and no BOM.

I also opened the XML file with both Firefox and Chrome, and they rendered the page correctly with no control characters appearing.

Running your project on my Huawei Ascend (ICS) shows no control characters.

All rather strange heh!!

Martin.
 

warwound

Here's something that you can try...

Free Online Hex Editor & Viewer is an online hex editor webpage.

Load your getcurrent.xml file and scroll down to the first e2eventname element.

You'll see that its value is prefixed by the hex bytes C2 86 and suffixed by the hex bytes C2 87.

The e2eventtitle value is prefixed and suffixed by the same bytes.

Unicode/UTF-8-character table lists UTF-8 character codes, and for the C2 86 and C2 87 byte sequences it simply says <control>.

I searched Google a bit but couldn't find much info, but at least you should now be able to find and replace them - knowing what codes to look for.

The page here states:

C2 86 is the UTF-8 encoding of the character U+0086, an obscure C1 control character. This character exists in ISO-8859-1, but not in Windows' default code page 1252, which has printable characters in the space where ISO-8859-1 has the C1 controls.
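
The byte values above can be checked from B4A itself. The following is only a minimal sketch, assuming the ByteConverter library is referenced; the Sub name LogMarkerBytes is made up for illustration. It also logs the bytes of the printable dagger character (U+2020), which is what Windows code page 1252 has in the 0x86 slot. That would presumably explain why replacing a "†" copied from a web page has no effect: it is a different character from the U+0086 in the XML.

B4X:
'Hypothetical helper: logs the UTF-8 bytes of the characters involved.
Sub LogMarkerBytes
   Dim bc As ByteConverter
   Dim ctrl As String
   Dim dagger As String

   ctrl = Chr(134)       'U+0086, the C1 control character found in the XML
   dagger = Chr(8224)    'U+2020, the printable dagger shown by code page 1252

   Log(bc.HexFromBytes(ctrl.GetBytes("UTF8")))     'should log C286
   Log(bc.HexFromBytes(dagger.GetBytes("UTF8")))   'should log E280A0
End Sub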

Martin.
 

moster67

@Martin

Thank you! That was a nice approach.

I googled a bit too and it seems those characters are used to indicate "START OF SELECTED AREA" and "END OF SELECTED AREA". See this page: Charbase U+0086: START OF SELECTED AREA

Most likely these are the characters always used in the xml-fields concerned so I think that I now can determine if they are present or not and then replace them if needed.

This evening I will try to find out how to detect them using B4A. Maybe I can use the ByteConverter library.

Thx Martin.
 

moster67

Thanks for your help.

I think I have it sorted now, knowing which characters are involved.

One approach is to use Agraham's ByteConverter library. In this case, I only check for the leading character (C286) and assume that there will also be a trailing character (C287):

B4X:
Sub parserGetCurrInfo_EndElement (Uri As String, Name As String, Text As StringBuilder)
   
   If parserGetCurrInfo.Parents.IndexOf("e2eventlist") > -1 Then
      If Name = "e2eventtitle" Then
         If Text.ToString <> "" Then
            Dim x,y As String
            Dim Bytes(0) As Byte
            Dim bc As ByteConverter

            'Take the first character and get its UTF-8 bytes as a hex string
            x=Text.ToString.SubString2(0,1)
            Bytes=bc.StringToBytes(x,"UTF8")
            y=bc.HexFromBytes(Bytes)
            If y="C286" Then
               'Leading control character found: drop the first and the last character
               x=Text.ToString.SubString2(1,Text.ToString.Length-1)
               Label1.Text=x.Trim
            Else
               'No control character: use the text as it is
               Label1.Text=Text.ToString.Trim
            End If
         End If
      End If
   End If
   
End Sub

However, the easiest way is, as Agraham suggested, to replace Chr(134) and Chr(135). Much less code.
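
A minimal sketch of the replace approach could look like the following. The Sub names are made up for illustration and this is not necessarily the exact code Agraham had in mind. Note that Trim alone is not expected to help here: the underlying Java trim only strips characters up to U+0020, and U+0086/U+0087 lie above that.

B4X:
'Hypothetical helper: removes U+0086 and U+0087 wherever they occur.
'If the characters are not present, the string is returned unchanged (apart from Trim).
Sub StripSelectedAreaMarks (s As String) As String
   Return s.Replace(Chr(134), "").Replace(Chr(135), "").Trim
End Sub

'Hypothetical variant: removes every C1 control character (U+0080 to U+009F),
'assuming Regex.Replace is available in the B4A version in use.
Sub StripC1Controls (s As String) As String
   Return Regex.Replace("[\u0080-\u009F]", s, "").Trim
End Sub

In the EndElement event this would be used as Label1.Text = StripSelectedAreaMarks(Text.ToString), with no need to test first whether the characters are present.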

Many thanks again.
 

warwound

Any idea though why a Gingerbread device renders the control characters yet an ICS device does not?

Is the ICS device stripping the control characters from the XML or is it simply not rendering them in the Label?

Martin.
 

moster67

Well, testing with GB and ICS emulators, the length of the string is always 28, in both GB and ICS. Using ByteConverter, the first character is recognized as C286 in ICS as well.

I guess this means that the characters are still in the string in ICS, but that labels (and also Msgboxes) in ICS do NOT render them, unlike GB.
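
One way to check this on any device, regardless of what the Label displays, is to log the code of every character in the string. This is only a sketch; the Sub name LogCharCodes is made up for illustration. If the title is the one quoted earlier in the thread ("Ultima parola Sky Sport 24", 26 visible characters), a length of 28 with 134 logged for the first character and 135 for the last would confirm that the two control characters are present on both Android versions.

B4X:
'Hypothetical helper: logs the length of s and the numeric code of each character.
Sub LogCharCodes (s As String)
   Dim i As Int
   Log("Length: " & s.Length)
   For i = 0 To s.Length - 1
      Log(i & ": " & Asc(s.CharAt(i)))
   Next
End Sub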
 