Android Tutorial XML Parsing with the XmlSax library

Discussion in 'Tutorials & Examples' started by Erel, Dec 12, 2010.

  1. Erel

    Erel Administrator Staff Member Licensed User

    It is simpler to parse XML with the Xml2Map class: https://www.b4x.com/android/forum/threads/b4x-xml2map-simple-way-to-parse-xml-documents.74848/

    The XmlSax library provides an XML Sax parser.
    This parser sequentially reads the stream and raises events at the beginning and end of each element.
    It is up to the developer to do something useful with those events.

    There are two events:
    Code:
    StartElement (Uri As String, Name As String, Attributes As Attributes)
    EndElement (Uri As String, Name As String, Text As StringBuilder)
    The StartElement is raised when an element begins. This event includes the element's attributes list.
    EndElement is raised when an element ends. This event includes the element's text.

    In this example we will parse the forum RSS feed. RSS is formatted using XML.
    A simplified example of this RSS is:
    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <rss version="2.0">
        <channel>
            <title>Basic4ppc / Basic4android - Android programming</title>
            <link>http://www.basic4ppc.com/forum</link>
            <description>Basic4android - android programming and development</description>
            <ttl>60</ttl>
            <image>
                <url>http://www.basic4ppc.com/forum/images/misc/rss.jpg</url>
                <title>Basic4ppc / Basic4android - Android programming</title>
                <link>http://www.basic4ppc.com/forum</link>
            </image>
            <item>
                <title>Phone library was updated - V1.10</title>
                <link>http://www.basic4ppc.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
                <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
                <guid isPermaLink="true">http://www.basic4ppc.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
            </item>
            ...MORE ITEMS HERE
        </channel>
    </rss>
    The first line is the XML declaration; it raises no events.
    On the second line the StartElement event is raised with Name = "rss", and the attributes include the "version" field.
    The EndElement event of the rss element is raised only on the last line: </rss>.
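
    The tutorial's own code leaves StartElement empty, so here is a minimal sketch of how the "version" attribute of the rss element could be read there. It assumes the feed does not use namespaces, which is why an empty string is passed as the Uri; the log text is only illustrative.

    ```b4x
    Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
        If Name = "rss" Then
            'GetValue2 looks the attribute up by Uri and name.
            'An empty Uri is an assumption: the sample feed declares no namespaces.
            Log("RSS version: " & Attributes.GetValue2("", "version"))
        End If
    End Sub
    ```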

    We will populate a list view with all items parsed from an offline file. When the user presses an item, we open the browser with the relevant link.
    Every item represents a forum thread.

    For each item we are interested in two values. The title and the link.
    The SaxParser object includes a handy list that holds the names of all the current parent elements.
    This is useful as it will help us find the "correct" 'title' and 'link' elements. The correct elements are the ones under the 'item' element.

    The parsing code in this case is pretty simple:
    Code:
    Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

    End Sub
    Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
        If parser.Parents.IndexOf("item") > -1 Then
            If Name = "title" Then
                Title = Text.ToString
            Else If Name = "link" Then
                Link = Text.ToString
            End If
        End If
        If Name = "item" Then
            ListView1.AddSingleLine2(Title, Link) 'add the title as the text and the link as the value
        End If
    End Sub
    Title and Link are global variables.
    We are only using EndElement events in this program.
    First we check if we are inside an 'item' element. If this is the case we check the actual element name and save it if it is 'title' or 'link'.

    If the current element is 'item' it means that we are done parsing an item.
    So we add the data collected to the list view.

    We are using ListView.AddSingleLine2. This method receives two values. The first is the item text and the second is the value that is returned when the user clicks on this item. In this case we are storing the link as the return value.

    Later we will use it to open the browser:
    Code:
    Sub ListView1_ItemClick (Position As Int, Value As Object)
        StartActivity(PhoneIntents1.OpenBrowser(Value)) 'open the browser with the link
    End Sub
    The code that initiates the parsing is:
    Code:
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'This file was added with the file manager.
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
     

    Last edited: Jan 4, 2017
  2. ssg

    ssg Well-Known Member Licensed User

    Hi Erel,

    Thank you for this excellent library... been waiting for it :D

    I have a question, my sample file had an empty line as the first line. This threw a runtime error. Deleting the empty line fixed the problem.

    Is it a must that the first line be the XML declaration?

    Thank you.
     
  3. Erel

    Erel Administrator Staff Member Licensed User

    Yes. If the XML declaration is present, nothing may come before it, not even an empty line. The error was thrown by the underlying system parser.
     
  4. ssg

    ssg Well-Known Member Licensed User

    got it! thanks a bunch....
     
  5. susu

    susu Well-Known Member Licensed User

    I use PHP to generate the xml file like this:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <item>
    <year>1431</year>
    <content>Henry VI of England is crowned King of France.</content>
    <year>1653</year>
    <content>Oliver Cromwell takes on dictatorial powers with the title of Lord Protector.</content>
    <year>1998</year>
    <content>The United States launches a missile attack on Iraq for failing to comply with United Nations weapons inspectors.</content>
    </item>
    I use your tutorial code to load the content:

    Code:
    Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
        If parser.Parents.IndexOf("item") > -1 Then
            If Name = "year" Then
                Title = Text.ToString
            Else If Name = "content" Then
                Link = Text.ToString
            End If
        End If
        If Name = "item" Then
            ListView1.AddTwoLines(Title, Link)
        End If
    End Sub
    It loads the XML but shows only the last entry (year 1998). What's wrong? Do I need to revise the XML file?
     
  6. ssg

    ssg Well-Known Member Licensed User

    hi susu,

    I believe the following lines are causing the issue:

    Code:
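    If Name = "item" Then
        ListView1.AddTwoLines(Title, Link)
    End If
    This appends the values to the list view only when the "item" tag closes. Your file has a single "item" element, so this happens once, after Title and Link have already been overwritten by the last year/content pair.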

    I'd change this to the following:

    Code:
    Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
        If parser.Parents.IndexOf("item") > -1 Then
            If Name = "year" Then
                Title = Text.ToString
            Else If Name = "content" Then
                Link = Text.ToString
                ListView1.AddTwoLines(Title, Link)
            End If
        End If
    End Sub
    Not having access to B4A right now... but I hope that helps you out.

    Cheers!
     
  7. susu

    susu Well-Known Member Licensed User

    Yeah! You saved me! Thank you SSG.
     
  8. Kevin

    Kevin Well-Known Member Licensed User

    I'm trying to write my first Android app using B4A and I am having a problem parsing XML.

    I am opening a URL that returns XML and saving that return/result to a string. Then I am trying to feed that string into the XML parser, but I am getting an error when compiling.

    --------------
    src\com\cognitial\vstream\main.java:276: inconvertible types
    found : java.lang.String
    required: java.io.Reader
    _parser.Parse2((java.io.Reader)(_result),"Parser");
    --------------

    Is there no way to feed the parser a string? How would I go about feeding the XML result from a URL into the parser? Do I need to 'save' it to the device first? If so, how would I do that, and how would I delete it when I am finished?
     
  9. Erel

    Erel Administrator Staff Member Licensed User

    Parse2 expects a TextReader not a string.
    Instead of saving the result to a string, just pass the InputStream directly to the XML parser.
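
    If the XML is already held in a string, one possible way to feed it to the parser is to wrap the string's bytes in an InputStream and read them back through a TextReader. This is only a minimal sketch: ParseXmlString is a hypothetical helper name, and parser is assumed to be the already initialized SaxParser from the tutorial.

    ```b4x
    'Hypothetical helper: parses XML that is already held in a string.
    Sub ParseXmlString (Xml As String)
        Dim data() As Byte
        data = Xml.GetBytes("UTF8") 'encode the string as UTF-8 bytes
        Dim in As InputStream
        in.InitializeFromBytesArray(data, 0, data.Length)
        Dim reader As TextReader
        reader.Initialize2(in, "UTF8") 'decode with the same encoding used above
        parser.Parse2(reader, "Parser") '"Parser" is the events subs prefix
        reader.Close
    End Sub
    ```

    Because Parse2 receives characters rather than raw bytes, the bytes are decoded by the TextReader with the encoding given to Initialize2.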
     
  10. JogiDroid

    JogiDroid Member Licensed User

    How is XML character encoding handled? I get an error when there are 'ä' or 'ö' characters in the XML stream. Is UTF8 the only encoding that XmlSax handles, or is it okay to use ->
    Code:
    <?xml version='1.0' encoding='ISO-8859-1'?>

    error code was:
    Code:
    org.apache.harmony.xml.ExpatParser$ParseException: At line 8, column 197: not well-formed (invalid token)
     
  11. Erel

    Erel Administrator Staff Member Licensed User

    All encodings are supported.
    You should open the file(?) with a TextReader and use the correct encoding.
    Then pass the TextReader to Parser.Parse2.
     
  12. JogiDroid

    JogiDroid Member Licensed User

    I was streaming it from web, normal http request... character encoding should be fine, at least when I checked output on my pc, the 'ä' character was a correct "ISO-8859-1" 'E4' hex number when viewed in hex editor..
     
  13. agraham

    agraham Expert Licensed User

    I think the SaxParser itself expects Unicode characters. As Erel says, you will need to use a TextReader to convert your incoming response stream. I guess you need to use HttpResponse.ContentEncoding to identify the encoding if you cannot assume what it is, then Initialize a TextReader with that encoding and HttpResponse.GetInputStream, and pass the TextReader to Parser.Parse2.
     
  14. JogiDroid

    JogiDroid Member Licensed User

    This gets more confusing... Log(Response.ContentEncoding) throws ->

    java.lang.NullPointerException at anywheresoftware.b4a.http.HttpClientWrapper$HttpResponeWrapper.getContentEncoding(HttpClientWrapper.java:328)

    I assume there is no content encoding info available... but then just empty string object would be better than nullpointer exception...
     
  15. agraham

    agraham Expert Licensed User

    You asked earlier
    SaxParser.Parse accepts an InputStream, which is a byte stream.
    SaxParser.Parse2 accepts a Reader, which is a character stream.

    Both these methods pass their streams to an InputSource object for the parser, there appears to be no encoding set for the InputSource.

    From the Android documentation
    The reference to autodetecting the character encoding is too vague to predict what will happen when passing a byte stream (without actually trying it), so using a character stream would seem to be the best way of handling encoding problems.
     
  16. JogiDroid

    JogiDroid Member Licensed User

    Vague indeed... For now I can assume encoding of received xml but it is really odd that sax is not autodetecting it...

    Well, hard-coding the encoding works.
    Code:
    in = Response.GetInputStream
    textin.Initialize2(in, "ISO-8859-1")
    XmlParser.Parse2(textin, "Parser")
    textin.Close
    in.Close
     
  17. JogiDroid

    JogiDroid Member Licensed User

    What is the best way to parse multiple items (a list) that each have multiple fields (a Type)?

    I have XML like this:
    Code:
    <cars>
    <car><name>aaa</name><weight>1234</weight><hp>100</hp></car>
    <car><name>bbb</name><weight>1222</weight><hp>200</hp></car>
    <car><name>ccc</name><weight>1333</weight><hp>300</hp></car>
    <car><name>ddd</name><weight>1444</weight><hp>400</hp></car>
    </cars>

    and
    Code:
    Type Car(name As String, weight As String, hp As String)
    Dim myCars As List

    and the standard parsing function:
    Code:
    Sub CarListParser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    ...
    So what is a good way to fill the myCars list?
     
  18. JogiDroid

    JogiDroid Member Licensed User

    Hmm, it seems that SAX is a bit complex for handling anything but simple XML files. It might be easier to parse manually using basic string operations.
     
  19. Erel

    Erel Administrator Staff Member Licensed User

    Actually I think that it is the other way around. For complex XML files that can span any number of lines it will be very hard to parse them without an XML parser.
    Upload your XML file and we will try to help.
     
  20. JogiDroid

    JogiDroid Member Licensed User

    Yep, big/complex XML is a good job for parsers like SAX, but it then takes a lot of work to get it working... and it seems that in the "java" world there are a lot of extra libs to ease that work with SAX.

    I have read a few tutorials on how to parse XML like my example with SAX. A simple array or list is fine, but in my case something does not "fit". I have to create a car object in the start event (<car>), and then in the end event, when <name> or <weight> or <hp> comes in, I can add them to the car object. Then I assume that I add that car object to my car list when the <hp> end event comes (the last member of the car object). In this case this is simple, but it needs globals for the temp car object and the car list (as locals won't work across events)... which fights my intuition of modern modular programming :)
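
    The approach described above could be sketched roughly as follows. This is only a sketch under assumptions, not tested against the real file: the file name cars.xml and the names CarParser and currentCar are hypothetical, and the car is added to the list on the </car> end event rather than on </hp>, so the order of the child elements does not matter.

    ```b4x
    Sub Process_Globals
        Type Car (name As String, weight As String, hp As String)
    End Sub

    Sub Globals
        Dim CarParser As SaxParser 'hypothetical name
        Dim myCars As List
        Dim currentCar As Car 'the car currently being filled (hypothetical name)
    End Sub

    Sub Activity_Create (FirstTime As Boolean)
        CarParser.Initialize
        myCars.Initialize
        Dim in As InputStream
        in = File.OpenInput(File.DirAssets, "cars.xml") 'hypothetical file name
        CarParser.Parse(in, "CarListParser")
        in.Close
        Log("Cars parsed: " & myCars.Size)
    End Sub

    Sub CarListParser_StartElement (Uri As String, Name As String, Attributes As Attributes)
        If Name = "car" Then
            'A new <car> begins: create a fresh object so that
            'cars already stored in the list are not overwritten.
            Dim c As Car
            c.Initialize
            currentCar = c
        End If
    End Sub

    Sub CarListParser_EndElement (Uri As String, Name As String, Text As StringBuilder)
        Select Name
            Case "name"
                currentCar.name = Text.ToString
            Case "weight"
                currentCar.weight = Text.ToString
            Case "hp"
                currentCar.hp = Text.ToString
            Case "car"
                'The </car> tag closes the element: the object is complete.
                myCars.Add(currentCar)
        End Select
    End Sub
    ```

    The two globals are still needed because the events are separate subs, but they are the only shared state.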

    I would rather use a JSON parser, which seems to handle the whole JSON data in one place, with no events and not much use of global variables :) (At the moment I only get XML, so I just need to bite the bullet.)

    Even in this simple case... SAX is a quite bulldozer for my spoonful of sand :)

    Anyway this is not fault of B4A or Java or SAX... just my bad day of thinking that simple xml would be simple to parse :)
     