Android Question HTML Tag variations?

Discussion in 'Android Questions' started by GuyBooth, Mar 29, 2015.

  1. GuyBooth

    GuyBooth Active Member Licensed User

    I'm reading responses from UPnP Events which have the following format:

    NOTIFY / HTTP/1.1
    CONTENT-TYPE: text/xml;charset="utf-8"
    NT: upnp:event
    NTS: upnp:propchange
    SID: uuid:46b5bbcb-6b62-1e7d-9c38-da25b4b15c0f

    <Event xmlns="urn:schemas-upnp-org:metadata-1-0/AVT/">
        <InstanceID val="0">
            <TransportState val="NO_MEDIA_PRESENT"/>
            <CurrentTrackDuration val="0:03:54"/>

    A SAX parser doesn't seem to like this, giving me the following error:

    (ParseException) org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 4: not well-formed (invalid token)

    I notice that between <LastChange> and </LastChange> the lines are statements (?) enclosed in < and > characters.
    Are these in a particular format for which a parsing tool is already available, or do I need to write my own?

  2. Erel

    Erel Administrator Staff Member Licensed User

    Only part of this string is valid XML. Which value do you want to parse?
  3. warwound

    warwound Expert Licensed User

    Following on from Erel's question...

    Is the response you posted the actual body of the response or have you included the additional HTTP headers?
    Obviously your posted response is not valid XML.

    Assuming that you can make a request and receive a response that is a valid XML document (no headers etc.), you could look at my XOM library.

    You could use the XOMBuilder BuildFromURL method to request your XML.
    The XOMBuilder BuildDone event will then be raised and passed an XOMDocument which represents the parsed XML response.
    You can then use the various XOM objects and methods to obtain the values you require from the response.

  4. GuyBooth

    GuyBooth Active Member Licensed User

    Erel, the part I want is the part that doesn't quite look like XML, e.g. <CurrentTrackDuration val="0:03:54"/>

    Martin, what I have posted is exactly what I receive including headers, except that there are often more values than I have shown in the "body".
    Building a parser to extract the information I need isn't difficult: every item I am looking for consists of a "Name" followed by "val=", and the "Value" ends at "/>". The guts of the parser I have built are shown below, but I am not very familiar with XML and similar formats, so I wondered whether a parser for this was already available. I can use the one I have written, but maybe there's a more efficient way.

    For each line:
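    Sub Parse_Item(Item As String)
        Dim sItem, sValue As String
        If Item <> "" Then
            ' Only handle lines of the form <Name val="Value"/>
            If Item.Contains("<") And Item.Contains("val=") And Item.Contains("/>") Then
                ' Name: the text between "<" and the space before "val="
                sItem = Item.SubString2(Item.IndexOf("<") + 1, Item.IndexOf("val=") - 1)
                ' Value: the text between the quotes that follow "val="
                sValue = Item.SubString2(Item.IndexOf("val=") + 5, Item.IndexOf("/>") - 1)
            End If
        End If
    End Sub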
    Thanks for your input.
  5. sorex

    sorex Expert Licensed User

    If you are sure the format is always like that, you can get the time like this:

    Dim t As String = "NOTIFY / HTTP/1.1 " & CRLF & _
    "HOST:" & CRLF & _
    "CONTENT-Type: text/xml;charset=""utf-8"" " & CRLF & _
    "CONTENT-LENGTH: 1707" & CRLF & _
    "NT: upnp:event" & CRLF & _
    "NTS: upnp:propchange" & CRLF & _
    "SID: uuid:46b5bbcb-6b62-1e7d-9c38-da25b4b15c0f" & CRLF & _
    "SEQ: 0" & CRLF & _
    "" & CRLF & _
    "<e:property>" & CRLF & _
    "    <LastChange>" & CRLF & _
    "        <Event xmlns=""urn:schemas-upnp-org:metadata-1-0/AVT/"">" & CRLF & _
    "            <InstanceID val=""0"">" & CRLF & _
    "                <TransportState val=""NO_MEDIA_PRESENT""/>" & CRLF & _
    "                <CurrentTrackDuration val=""0:03:54""/>" & CRLF & _
    "            </InstanceID>" & CRLF & _
    "        </Event>" & CRLF & _
    "    </LastChange>" & CRLF & _
    "</e:property>"

    ' p + 26 skips the attribute name and its opening quote;
    ' the first "/" after p belongs to the closing "/>", so -1 stops at the closing quote.
    Dim p As Int = t.IndexOf("CurrentTrackDuration val=")
    Log(t.SubString2(p + 26, t.IndexOf2("/", p) - 1))

    It logs "0:03:54" (without the quotes).
  6. Erel

    Erel Administrator Staff Member Licensed User

    Use the new smart strings literal:
    Dim t As String = $"NOTIFY / HTTP/1.1
    CONTENT-TYPE: text/xml;charset="utf-8"
    NT: upnp:event
    NTS: upnp:propchange
    SID: uuid:46b5bbcb-6b62-1e7d-9c38-da25b4b15c0f
    SEQ: 0

      <Event xmlns="urn:schemas-upnp-org:metadata-1-0/AVT/">
      <InstanceID val="0">
      <TransportState val="NO_MEDIA_PRESENT"/>
      <CurrentTrackDuration val="0:03:54"/>"$

    Dim m As Matcher = Regex.Matcher($"(\w+) val=\"([^"]+)""$, t)
    Do While m.Find
        Log($"Match found: ${m.Group(1)}: ${m.Group(2)}"$)
    Loop

    Match found: InstanceID: 0
    Match found: TransportState: NO_MEDIA_PRESENT
    Match found: CurrentTrackDuration: 0:03:54
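
    If the values are needed by name rather than just logged, the same pattern can fill a Map. This is only a sketch: the sub name ParseEventValues is made up for illustration, and it assumes every item has the form Name val="Value" with no quotes inside the value.

    ```b4x
    ' Hypothetical helper: collects every Name val="Value" pair found in t.
    Sub ParseEventValues(t As String) As Map
        Dim result As Map
        result.Initialize
        ' Same pattern as above, written as an ordinary string with doubled quotes.
        Dim m As Matcher = Regex.Matcher("(\w+) val=""([^""]+)""", t)
        Do While m.Find
            result.Put(m.Group(1), m.Group(2))
        Loop
        Return result
    End Sub
    ```

    Usage would then look something like this:

    ```b4x
    Dim values As Map = ParseEventValues(t)
    Log(values.Get("CurrentTrackDuration"))
    ```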
  7. GuyBooth

    GuyBooth Active Member Licensed User

    Yes, that worked for me once I learned how to use the placeholders.
    The XML formatting takes a "<" and changes it to &lt;, a ">" to &gt;, and so on. Is there a function that goes the other way, changing &lt; back to "<", for example?

    Excellent support as usual, thank you Erel.
  8. Erel

    Erel Administrator Staff Member Licensed User

    You have two options:
    1. Replace the five xml entities.
    2. Take the XML part from this string and use an XML parser to parse it.
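
    A minimal sketch of option 1, assuming the five entities meant are the predefined XML ones (&lt; &gt; &quot; &apos; &amp;). The sub name UnescapeXml is made up for illustration; it is not a built-in function.

    ```b4x
    ' Hypothetical helper: turns the five predefined XML entities back into characters.
    Sub UnescapeXml(s As String) As String
        s = s.Replace("&lt;", "<")
        s = s.Replace("&gt;", ">")
        s = s.Replace("&quot;", QUOTE)
        s = s.Replace("&apos;", "'")
        ' &amp; goes last, so that "&amp;lt;" ends up as "&lt;" and not as "<".
        s = s.Replace("&amp;", "&")
        Return s
    End Sub
    ```

    The order matters: replacing &amp; first would decode doubly escaped text one level too far.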
  9. GuyBooth

    GuyBooth Active Member Licensed User

    Thanks. I'm using option 1.
  10. Erel

    Erel Administrator Staff Member Licensed User
