Android Tutorial XML Parsing with the XmlSax library

Erel

Administrator
Staff member
Licensed User
It is simpler to parse XML with the Xml2Map class: https://www.b4x.com/android/forum/threads/b4x-xml2map-simple-way-to-parse-xml-documents.74848/

The XmlSax library provides an XML SAX parser.
This parser reads the stream sequentially and raises events at the beginning and end of each element.
It is up to the developer to do something useful with those events.

There are two events:
B4X:
StartElement (Uri As String, Name As String, Attributes As Attributes)
EndElement (Uri As String, Name As String, Text As StringBuilder)
StartElement is raised when an element begins. This event includes the element's attributes list.
EndElement is raised when an element ends. This event includes the element's text.

In this example we will parse the forum RSS feed. RSS is formatted using XML.
A simplified example of this RSS is:
B4X:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
    <channel>
        <title>Basic4ppc  / Basic4android - Android programming</title>
        <link>http://www.basic4ppc.com/forum</link>
        <description>Basic4android - android programming and development</description>
        <ttl>60</ttl>
        <image>
            <url>http://www.basic4ppc.com/forum/images/misc/rss.jpg</url>
            <title>Basic4ppc  / Basic4android - Android programming</title>
            <link>http://www.basic4ppc.com/forum</link>
        </image>
        <item>
            <title>Phone library was updated - V1.10</title>
            <link>http://www.basic4ppc.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
            <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
            <guid isPermaLink="true">http://www.basic4ppc.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
        </item>
        ...MORE ITEMS HERE
    </channel>
</rss>
The first line is the XML declaration. It does not raise any event.
On the second line the StartElement event will be raised with Name = "rss" and the attributes will include the "version" field.
The EndElement event of the rss element will only be raised on the last line: </rss>.
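As a rough illustration of how the attributes arrive with StartElement, the following sketch logs every attribute of each element. It assumes the Attributes object exposes Size, GetName and GetValue, and that "Parser" is the events prefix passed to the parser; the Log calls are only there for inspection.

```b4x
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    'Log each attribute of the element that has just started.
    Dim i As Int
    For i = 0 To Attributes.Size - 1
        Log(Name & ": " & Attributes.GetName(i) & " = " & Attributes.GetValue(i))
    Next
End Sub
```

With the RSS sample above, the rss element should report its version attribute and the guid element its isPermaLink attribute. Elements without attributes log nothing.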

We will populate a list view with all items parsed from an offline file. When the user presses an item we will open the browser with the relevant link.
Every item represents a forum thread.



For each item we are interested in two values: the title and the link.
The SaxParser object includes a handy list that holds the names of all the current parent elements.
This is useful as it will help us find the "correct" 'title' and 'link' elements. The correct elements are the ones under the 'item' element.

The parsing code in this case is pretty simple:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

End Sub
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If parser.Parents.IndexOf("item") > -1 Then
        If Name = "title" Then
            Title = Text.ToString
        Else If Name = "link" Then
            Link = Text.ToString
        End If
    End If
    If Name = "item" Then
        ListView1.AddSingleLine2(Title, Link) 'add the title as the text and the link as the value
    End If
End Sub
Title and Link are global variables.
We are only using EndElement events in this program.
First we check if we are inside an 'item' element. If this is the case we check the actual element name and save its text if the name is 'title' or 'link'.

If the current element is 'item' it means that we are done parsing an item.
So we add the data collected to the list view.

We are using ListView.AddSingleLine2. This method receives two values. The first is the item text and the second is the value that is returned when the user clicks on this item. In this case we are storing the link as the return value.

Later we will use it to open the browser:
B4X:
Sub ListView1_ItemClick (Position As Int, Value As Object)
    StartActivity(PhoneIntents1.OpenBrowser(Value)) 'open the browser with the link
End Sub
The code that initiates the parsing is:
B4X:
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'This file was added with the file manager.
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
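For context, here is a minimal sketch of how the pieces above might fit together in one activity. The declarations and the layout code are assumptions (the tutorial text only states that Title and Link are global variables); PhoneIntents is assumed to come from the Phone library.

```b4x
Sub Globals
    Dim parser As SaxParser
    Dim ListView1 As ListView
    Dim PhoneIntents1 As PhoneIntents
    Dim Title As String
    Dim Link As String
End Sub

Sub Activity_Create(FirstTime As Boolean)
    parser.Initialize
    ListView1.Initialize("ListView1") '"ListView1" is the events prefix used by ListView1_ItemClick.
    Activity.AddView(ListView1, 0, 0, 100%x, 100%y)
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'rss.xml must be added to the project files.
    parser.Parse(in, "Parser") 'raises Parser_StartElement and Parser_EndElement
    in.Close
End Sub
```

Parse raises all the events before it returns, so the list view is fully populated by the time in.Close is reached.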
 


ssg

Well-Known Member
Licensed User
Hi Erel,

Thank you for this excellent library... been waiting for it :D

I have a question, my sample file had an empty line as the first line. This threw a runtime error. Deleting the empty line fixed the problem.

Is it a must that the first line be the XML declaration?

Thank you.
 

susu

Well-Known Member
Licensed User
I use PHP to generate the xml file like this:

B4X:
<?xml version="1.0" encoding="UTF-8"?>
<item>
<year>1431</year>
<content>Henry VI of England is crowned King of France.</content>
<year>1653</year>
<content>Oliver Cromwell takes on dictatorial powers with the title of Lord Protector.</content>
<year>1998</year>
<content>The United States launches a missile attack on Iraq for failing to comply with United Nations weapons inspectors.</content>
</item>
I use your tutorial code to load the content:

B4X:
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
   If parser.Parents.IndexOf("item") > -1 Then
      If Name = "year" Then
         Title = Text.ToString
      Else If Name = "content" Then
         Link = Text.ToString
      End If
   End If
   If Name = "item" Then
      ListView1.AddTwoLines(Title, Link)
   End If
End Sub
It loads the XML but only shows the last entry (year 1998). What's wrong? Do I need to revise the XML file?
 

ssg

Well-Known Member
Licensed User
hi susu,

I believe the following lines are causing the issue:

B4X:
   If Name = "item" Then
      ListView1.AddTwoLines(Title, Link)
   End If
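This means the values are appended to the list view only when the "item" tag closes. Your XML has a single "item" element wrapping all the year/content pairs, so this happens only once, with the values of the last pair.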

I'd change this to the following:

B4X:
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
   If parser.Parents.IndexOf("item") > -1 Then
      If Name = "year" Then
         Title = Text.ToString
      Else If Name = "content" Then
         Link = Text.ToString
         ListView1.AddTwoLines(Title, Link)
      End If
   End If
End Sub
Not having access to B4A right now... but I hope that helps you out.

Cheers!
 

susu

Well-Known Member
Licensed User
Yeah! You saved me! Thank you SSG.
 

Kevin

Well-Known Member
Licensed User
I'm trying to write my first Android app using B4A and I am having a problem parsing XML.

I am opening a URL that returns XML and saving that return/result to a string. Then I am trying to feed that string into the XML parser, but I am getting an error when compiling.

--------------
src\com\cognitial\vstream\main.java:276: inconvertible types
found : java.lang.String
required: java.io.Reader
_parser.Parse2((java.io.Reader)(_result),"Parser");
--------------

Is there no way to feed the parser a string? How would I go about feeding the XML result from a URL into the parser? Do I need to 'save' it to the device first? If so, how would I do that, and how would I delete it when I am finished?
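One possible way to feed a string to the parser, sketched under assumptions: the compile error above arises because Parse2 expects a TextReader rather than a String, so the string is first turned into bytes and wrapped in an InputStream for Parse. ParseXmlString is a hypothetical helper name, parser is assumed to be an initialized global SaxParser, and the string is assumed to be encoded as UTF-8.

```b4x
Sub ParseXmlString (Xml As String)
    'Convert the string to bytes and wrap the bytes in an in-memory InputStream.
    Dim data() As Byte
    data = Xml.GetBytes("UTF8")
    Dim in As InputStream
    in.InitializeFromBytesArray(data, 0, data.Length)
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
End Sub
```

Nothing has to be saved to the device with this approach. If the XML declaration names a different encoding, the bytes should be produced with that same encoding, otherwise non-ASCII characters may be misread.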
 

JogiDroid

Member
Licensed User
How is XML character encoding handled? I get an error when there are 'ä' or 'ö' characters in the XML stream. Is UTF-8 the only encoding that XmlSax handles, or is it okay to use:
B4X:
<?xml version='1.0' encoding='ISO-8859-1'?>

The error was:
B4X:
org.apache.harmony.xml.ExpatParser$ParseException: At line 8, column 197: not well-formed (invalid token)
 

JogiDroid

Member
Licensed User
Erel said:
"All encodings are supported.
You should open the file(?) with a TextReader and use the correct encoding.
Then pass the TextReader to Parser.Parse2."

I was streaming it from the web with a normal HTTP request... The character encoding should be fine: when I checked the output on my PC in a hex editor, the 'ä' character was the correct ISO-8859-1 byte, hex E4.
 

agraham

Expert
Licensed User
I think the SaxParser itself expects Unicode characters. As Erel says, you will need to use a TextReader to convert your incoming response stream. I guess you need to use HttpResponse.ContentEncoding to identify the encoding if you cannot assume what it is, then initialize a TextReader with that encoding and HttpResponse.GetInputStream, and pass the TextReader to Parser.Parse2.
 

JogiDroid

Member
Licensed User
This gets more confusing... Log(Response.ContentEncoding) throws ->

java.lang.NullPointerException at anywheresoftware.b4a.http.HttpClientWrapper$HttpResponeWrapper.getContentEncoding(HttpClientWrapper.java:328)

I assume there is no content encoding info available... but then an empty string would be better than a NullPointerException...
 

agraham

Expert
Licensed User
You asked earlier: "How is xml character encoding handled"
SaxParser.Parse accepts an InputStream which is a byte stream.
SaxParser.Parse2 accepts a Reader which is a character stream.

Both these methods pass their streams to an InputSource object for the parser. There appears to be no encoding set for the InputSource.

From the Android documentation:
"The SAX parser will use the InputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly, disregarding any text encoding declaration found in that stream. If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification."
The reference to autodetecting the character encoding is too vague to predict what will happen when passing a byte stream (without actually trying it), so using a character stream would seem to be the best way of handling encoding problems.
 

JogiDroid

Member
Licensed User
Vague indeed... For now I can assume the encoding of the received XML, but it is really odd that SAX is not autodetecting it...

Well, hard-coding the encoding works.
B4X:
      in = Response.GetInputStream
      textin.Initialize2(in,"ISO-8859-1")
      XmlParser.Parse2(textin, "Parser")
      textin.Close
      in.Close
 

JogiDroid

Member
Licensed User
What is the best way to parse multiple items (a list) that each have multiple fields (a Type)?


B4X:
I have xml like this:

<cars>
<car><name>aaa</name><weight>1234</weight><hp>100</hp></car>
<car><name>bbb</name><weight>1222</weight><hp>200</hp></car>
<car><name>ccc</name><weight>1333</weight><hp>300</hp></car>
<car><name>ddd</name><weight>1444</weight><hp>400</hp></car>
</cars>

and
Type Car(name As String, weight As String, hp As String)
Dim myCars As List

and standard parsing function..
Sub CarListParser_EndElement (Uri As String, Name As String, Text As StringBuilder)
...
So what is a good way to fill the myCars list?
 

JogiDroid

Member
Licensed User
Hmm, it seems that SAX is a bit too complex for handling anything but simple XML files... it might be easier to parse manually using basic string operations.
 

JogiDroid

Member
Licensed User
"Actually I think that it is the other way around. For complex XML files that can span any number of lines it will be very hard to parse them without an XML parser.
Upload your XML file and we will try to help."

Yep, big or complex XML is a good job for parsers like SAX, but it then takes a lot of work to get it working... and it seems that in the Java world there are a lot of extra libraries to ease working with SAX.

I have read a few tutorials on how to parse XML like my example with SAX. A simple array or list is fine, but in my case something does not "fit". I have to create a car object in the start event (<car>), and then in the end event, when <name>, <weight> or <hp> comes in, I can add them to the car object. I assume I then add that car object to my car list when the <hp> end event comes (the last member of the car object). In this case this is simple, but it needs globals to be used as a temp car object and a temp car list (as locals won't survive from event to event)... which fights my intuition of modern modular programming :)

I would rather use a JSON parser, which seems to handle the whole JSON data in one place, with no events and not much global variable use :) (ATM I only get XML, so I just need to bite the bullet)

Even in this simple case... SAX is quite a bulldozer for my spoonful of sand :)

Anyway, this is not the fault of B4A, Java or SAX... just my bad day of thinking that simple XML would be simple to parse :)
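For reference, the car-list approach described in this post could be sketched roughly as follows. The global currentCar variable and the choice to store the car on the closing </car> tag (rather than on </hp>) are assumptions made for this sketch; "CarListParser" is taken to be the events prefix passed to Parse or Parse2.

```b4x
'Assumed to be declared in Sub Globals:
'  Type Car(name As String, weight As String, hp As String)
'  Dim myCars As List      'call myCars.Initialize before parsing
'  Dim currentCar As Car   'temporary holder for the <car> being parsed

Sub CarListParser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "car" Then
        'Create a fresh Car for every <car> element.
        Dim c As Car
        c.Initialize
        currentCar = c
    End If
End Sub

Sub CarListParser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    Select Name
        Case "name"
            currentCar.name = Text.ToString
        Case "weight"
            currentCar.weight = Text.ToString
        Case "hp"
            currentCar.hp = Text.ToString
        Case "car"
            'All child elements have been read; store the finished car.
            myCars.Add(currentCar)
    End Select
End Sub
```

Adding the car when </car> closes means the code does not depend on <hp> being the last child element. After parsing, each entry can be read back with myCars.Get(i) assigned to a Car variable.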
 