B4J Library [B4X] Xml2Map - Simple way to parse XML documents

Status
Not open for further replies.
Nobody likes to parse XML.

Parsing JSON is simple and fun. Parsing XML is tedious and boring.

That is the reason behind the Xml2Map class. It internally parses the XML document and returns a Map with the parsed data. It is similar to parsing JSON.
Tip: You can use this tool to help you with parsing JSON: https://b4x.com:51041/json/index.html

So instead of the code explained in the old tutorial: https://www.b4x.com/android/forum/threads/xml-parsing-with-the-xmlsax-library.6866/#content

We can achieve the same thing with this code:
B4X:
Sub Process_Globals
   Private ParsedData As Map
End Sub

Sub Globals
   Private ListView1 As ListView
End Sub

Sub Activity_Create(FirstTime As Boolean)
   If FirstTime Then
     Dim xm As Xml2Map
     xm.Initialize
     xm.StripNamespaces = True '<--- new in v1.01
     ParsedData = xm.Parse(File.ReadString(File.DirAssets, "rss.xml"))
   End If
   Activity.LoadLayout("1")
   ListView1.SingleLineLayout.ItemHeight = 60dip
   Dim rss As Map = ParsedData.Get("rss")
   Dim channel As Map = rss.Get("channel")
   Dim items As List = channel.Get("item")
   For Each item As Map In items
     Dim title As String = item.Get("title")
     Dim link As String = item.Get("link")
     ListView1.AddSingleLine2(title, link)
   Next
End Sub

Sub ListView1_ItemClick (Position As Int, Value As Object)
   Dim pi As PhoneIntents
   StartActivity(pi.OpenBrowser(Value))
End Sub

You can use the JSON library to convert the Map to a JSON string. This is useful for understanding how to access the data:
B4X:
Dim jg As JSONGenerator
jg.Initialize(ParsedData)
Log(jg.ToPrettyString(4))

The result in this case will look like:
"rss": {
"Attributes": {
"version": "2.0"
},
"channel": {
"title": "Basic4ppc \/ Basic4android - Android programming",
"link": "http:\/\/www.b4x.com\/forum",
"description": "Basic4android - android programming and development",
"language": "en",
"lastBuildDate": "Sun, 12 Dec 2010 10:19:27 GMT",
"generator": "vBulletin",
"ttl": "60",
"image": {
"url": "http:\/\/www.b4x.com\/forum\/images\/misc\/rss.jpg",
"title": "Basic4ppc \/ Basic4android - Android programming",
"link": "http:\/\/www.b4x.com\/forum"
},
"item": [
{
"title": "Phone library was updated - V1.10",
"link": "http:\/\/www.b4x.com\/forum\/additional-libraries-official-updates\/6859-phone-library-updated-v1-10-a.html",
"pubDate": "Sun, 12 Dec 2010 09:27:38 GMT",
"description": "An Intent object was added. This allows creating custom intents for interacting with external applications and services.\n\nInstallation...",
"encoded": "<div>An Intent object was added...",
"category": {
"Attributes": {
"domain": "http:\/\/www.b4x.com\/forum\/additional-libraries-official-updates\/"
},
"Text": "Additional libraries and official updates"
},
"creator": "Erel",
"guid": {
"Attributes": {
"isPermaLink": "true"
},
"Text": "http:\/\/www.b4x.com\/forum\/additional-libraries-official-updates\/6859-phone-library-updated-v1-10-a.html"
}
MORE ITEMS HERE

Note that attributes are added under the Attributes key. In such cases the text will be available under the Text key.
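
As a rough sketch of what this means in practice, the snippet below reads the version attribute of the rss element and, for each item, the text and the isPermaLink attribute of its guid element. It assumes ParsedData was filled as in the first example and that every item has a guid element with an attribute, as in the output above. An element without attributes is stored as a plain string, so casting it to a Map would fail.
B4X:
' Assumption: ParsedData holds the map returned by xm.Parse for rss.xml.
Dim rss As Map = ParsedData.Get("rss")
Dim rssAttributes As Map = rss.Get("Attributes")
Log(rssAttributes.Get("version")) 'attribute of the <rss> element

Dim channel As Map = rss.Get("channel")
Dim items As List = channel.Get("item")
For Each item As Map In items
   'guid has an attribute, so it is a Map with Attributes and Text keys
   Dim guid As Map = item.Get("guid")
   Dim guidAttributes As Map = guid.Get("Attributes")
   Log(guid.Get("Text")) 'the element text
   Log(guidAttributes.Get("isPermaLink")) 'the attribute value
Next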

This module is compatible with B4A, B4J and B4i.
It depends on XmlSax library (which is included in the IDE).



Edit (October 2017):

Common pitfall


Consider this xml:
B4X:
<root>
<book>
   <title>Book 1</title>
</book>
<book>
   <title>Book 2</title>
</book>
</root>

There could be any number of book elements.
You can parse it with:
B4X:
Dim root As Map = ParsedData.Get("root")
For Each book As Map In root.Get("book")
   Dim title As String = book.Get("title")
Next
However, this code will fail in two cases:
1. There is only one book in the xml so root.Get("book") will return a Map instead of a List.
2. There are no books at all so root.Get("book") will return Null.

To solve this issue you can use this sub:
B4X:
Sub GetElements (m As Map, key As String) As List
   Dim res As List
   If m.ContainsKey(key) = False Then
     res.Initialize
     Return res
   Else
     Dim value As Object = m.Get(key)
     If value Is List Then Return value
     res.Initialize
     res.Add(value)
     Return res
   End If
End Sub
It will return a list in all cases.
You can safely use it with:
B4X:
Dim root As Map = ParsedData.Get("root")
For Each book As Map In GetElements(root, "book")
   Dim title As String = book.Get("title")
Next
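
The same pitfall applies to the RSS example at the top of this post: channel.Get("item") would return a Map rather than a List if the feed held a single item. A minimal sketch of the safer loop, assuming the GetElements sub above is in the same module and ParsedData holds the parsed feed:
B4X:
Dim rss As Map = ParsedData.Get("rss")
Dim channel As Map = rss.Get("channel")
'GetElements returns a List whether there are zero, one or many items
For Each item As Map In GetElements(channel, "item")
   Log(item.Get("title"))
Next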


Map2Xml - New class!

Map2Xml converts the map created with Xml2Map to an XML string. It uses the XmlBuilder library and is compatible with B4A, B4i and B4J.
It can be used to modify existing XML documents. You read the document with Xml2Map, make the changes in the returned map and write it back with Map2Xml.

It is an internal library now.

Updates:

- v1.01 - New StripNamespaces property. When set to True, the namespaces are stripped from keys and attributes. It is recommended to set it to True, because B4A, B4J and B4i behave differently when namespaces are kept.
 

Attachments

  • Xml2Map.b4xlib
    2.2 KB · Views: 201

Mahares

Expert
Licensed User
Longtime User
It depends on XmlSax library.
Where do you download the XmlSax library? It seems like I am always having trouble finding the link to download libraries, even with a forum search. You are always sent to the lib doc, but not to the lib download itself.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
It seems like I am always having trouble finding the link to download libraries, even with a forum search.
It should never take more than a single search to find a library if it is in the "additional libraries" section.

XmlSax / jXmlSax / iXmlSax are not in the forum because they are included in the IDE package (internal libraries).
 

Mahares

It depends on XmlSax library.
When I saw that you mentioned that it depends on the XmlSax lib, I automatically thought it was part of the additional lib folder. Otherwise, it is a moot point and you did not have to mention it if it is part of the internal libs.
Thank you
 

corwin42

Expert
Licensed User
Longtime User
Unfortunately it is still synchronous because it depends on XmlSax, so it is only usable for relatively small XML data. For larger XML files it blocks the UI thread for too long. An asynchronous XmlSax library would be nice.
 

corwin42

It is indeed synchronous. What is the size of the XML document you are parsing? Can you upload it?

The maximum size of the uncompressed file is about 170kB.
The example file is generated from this url.

Because parsing with XmlSax needs a few seconds, I do the whole step of uncompressing the data, parsing the XML and saving the result into a SQLite database in a separate thread with the Threading library. Unfortunately this breaks the debugger, so I can't use it anymore with the app.

I have thought about converting the code to warwound's XOM library to see if it works better, but I haven't had time to do the conversion yet.
 

Attachments

  • data.zip
    7.4 KB · Views: 887

Erel

It takes 125ms to parse it in release mode (300ms in debug):
B4X:
Dim xm As Xml2Map
xm.Initialize
Dim n As Long = DateTime.Now
ParsedData = xm.Parse(File.ReadString(File.DirAssets, "data.xml"))
Log(DateTime.Now - n)

Inserting to SQLite is not relevant to this thread however make sure to create a single transaction and if it is not fast enough then you can use SQL.ExecNonQueryBatch to insert it in the background.
 

corwin42

It takes 125ms to parse it in release mode (300ms in debug):
Hmm, on which device? I remember it took several seconds in the past.

Inserting to SQLite is not relevant to this thread however make sure to create a single transaction and if it is not fast enough then you can use SQL.ExecNonQueryBatch to insert it in the background.
Yes I know. I think I will have to do some refactoring again.

If I remember correctly, one of the main reasons I handle all this in its own thread is that everything was done in the background by a service, and the UI was not very fluid when the parsing and the database update also ran on the UI thread.
 

samikinikar

Member
Licensed User
Longtime User
My XML file

B4X:
<id>1770</id>
<image>http://Cityonline.com/custom/domain_1/image_files/sitemgr_photo_4681.png</image>
<thumb>http://Cityonline.com/custom/domain_1/image_files/sitemgr_photo_4682.png</thumb>
<updated>2017-01-26 16:41:19</updated>
<entered>2017-01-26 16:41:08</entered>
<renewal_date>0000-00-00</renewal_date>
<title>City | Cheque in Kannada dishonoured, Customer drags bank to court</title>
<seo_title>City | Cheque in Kannada dishonoured, Customer drags bank to court</seo_title>
<friendly_url>City-cheque-in-kannada-dishonoured-customer-drags-bank-to-court</friendly_url>
<author>www.dummyurl.com</author>
<author_url></author_url>
<publication_date>2017-01-26</publication_date>
<abstract>A customer has dragged ICCI bank to court after his cheque was dishonoured on grounds that the information on it was written in Kannada.</abstract>
<seo_abstract>A customer has dragged ICCI bank to court after his cheque was dishonoured on grounds that the information on it was written in Kannada.</seo_abstract>
<keywords>City neews || news of City || belagavi || news news || news about City || City news || news baout City</keywords>
<seo_keywords>City neews, news of City, belagavi, news news, news about City, City news, news baout City</seo_keywords>
<content>&lt;p style=&quot;text-align: justify;&quot;&gt;&lt;span style=&quot;font-size: small; font-family: verdana, geneva;&quot;&gt;City | Belagavi&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: justify;&quot;&gt;&lt;span style=&quot;font-size: small; font-family: verdana, geneva;&quot;&gt;A customer has dragged ICCI bank to court after his cheque was dishonoured on grounds that the information on it was written in Kannada.&lt;br /&gt;&lt;/span&gt;&lt;lt;br /&gt;&lt;span style=&quot;font-size: small; font-family: verdana, geneva;&quot;&gt;Anand Diwakar Garag has filed a case, alleging lack of service, with the district consumer redressal court in Belagavi. In November, Garag presented a cheque for Rs 17,220 to the Life Insurance Corporation of India (LIC), as premium for his insurance policy. The LIC handed the cheque over to Corporation Bank, which handles its accounts. However, when the cheque was presented to ICICI for payment, it was returned with a note &quot;present with document&quot;.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-size: small; font-family: verdana, geneva;&quot;&gt;Before he filed the case, Garag sought clarification from both ICICI bank and LIC as to why his cheque had been dishonoured. However, neither furnished a satisfactory explanation. Garag told TOI that he made Hescom payment in cheques, wherein all details were filled in Kannada. &quot;My bank told me that the reason they dishonoured my cheque was because the details were filled in Kannada. Also, in another incident that occurred after this one, ICICI bank dishonoured a cheque I had given to a private firm,&quot; he said. The consumer redressal court has issued notices to LIC, ICICI and Corporation Bank in connection with the case, which will be heard on February 28.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;&lt;em&gt;&lt;strong&gt;Image is for representation only&lt;br /&gt;&lt;/strong&gt;&lt;/em&gt;&lt;em&gt;&lt;strong&gt;Source :TOI&lt;/strong&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;</content>
<status>Active</status>
<suspended_sitemgr>n</suspended_sitemgr>
<level>article</level>
<number_views>735</number_views>
<avg_review>0</avg_review>
</article_info>

I have no issues retrieving data from any tag other than the <content> tag. The <content> tag contains HTML tags, so the data is not fetched and there are no results in the list view.
Can someone please help me get the contents of the <content> tag without the HTML tags using this XML parser?
 

Erel

I don't see anything special here.
B4X:
Dim xm As Xml2Map
xm.Initialize
Dim root As Map = xm.Parse(File.ReadString(File.DirAssets, "test.xml"))
Dim m As Map = root.Get("eDirectoryData")
m = m.Get("ObjectData")
Dim entries As List = m.Get("entry")
For Each m As Map In entries
   Log("********************")
   Log(m.Get("articleContent"))
Next
 

samikinikar

Thank you for the update. Yes, without removing any formatting it retrieves the content, but if I remove the HTML tags and fetch the contents, it displays the following error. Attached is the formatted XML file (format.xml).

Error :
** Activity (main) Create, isFirst = true **
Error occurred on line: 84 (xml2map)
org.apache.harmony.xml.ExpatParser$ParseException: At line 5, column 0: undefined entity
at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:515)
at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:474)
at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:316)
at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:279)
at anywheresoftware.b4a.objects.SaxParser.parse(SaxParser.java:80)
at anywheresoftware.b4a.objects.SaxParser.Parse(SaxParser.java:73)
at anywheresoftware.b4a.samples.xmlsax.xml2map._parse2(xml2map.java:252)
at anywheresoftware.b4a.samples.xmlsax.xml2map._parse(xml2map.java:90)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at anywheresoftware.b4a.shell.Shell.runMethod(Shell.java:708)
at anywheresoftware.b4a.shell.Shell.raiseEventImpl(Shell.java:340)
at anywheresoftware.b4a.shell.Shell.raiseEvent(Shell.java:247)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at anywheresoftware.b4a.ShellBA.raiseEvent2(ShellBA.java:134)
at anywheresoftware.b4a.samples.xmlsax.main.afterFirstLayout(main.java:102)
at anywheresoftware.b4a.samples.xmlsax.main.access$000(main.java:17)
at anywheresoftware.b4a.samples.xmlsax.main$WaitForLayout.run(main.java:80)
at android.os.Handler.handleCallback(Handler.java:815)
at android.os.Handler.dispatchMessage(Handler.java:104)
at android.os.Looper.loop(Looper.java:194)
at android.app.ActivityThread.main(ActivityThread.java:5651)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:959)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:754)
 

Attachments

  • format.xml
    40 KB · Views: 748