HTML table parsing - need demo

jalle007

Active Member
Licensed User
Longtime User
I have a simple HTML table that I need to extract data from:
<table style="width:100%; font-size:11px">
<tr>
<td colspan="2">
<strong>Beograd</strong>
</td>
</tr>
<tr>
<td>
<span>Broj leta</span>
</td>
<td><strong>JU 109, JA 1376</strong></td>
</tr>
<tr>
<td>
<span>Avio-kompanija</span>
</td>
<td><strong>JAT AIRWAYS</strong></td>
</tr>
<tr>
<td>
<span>Tip aviona</span>
</td>
<td>AT72</td>
</tr>
<tr>
<td>
<span>Planirano vrijeme</span>
</td>
<td>06:20</td>
</tr>


<tr>
<td>Status leta</td>
<td style="font-weight:bold">
Odletio
</td>
</tr>

<tr>
<td colspan="2"><hr></td>
</tr>
</table>
All the data is in the TR (row) elements.
Is there a sample, or maybe a library, that can help with this?
 

Cableguy

Expert
Licensed User
Longtime User
Yes... you retrieve the HTML document and then you can parse it
 

warwound

Expert
Licensed User
Longtime User
If the HTML is well formed (every tag closed and properly nested, attribute values quoted) then it can be treated as an XML document.
You can then parse it as you'd parse any other XML document to get the values you require.

There's the 'standard' Xmlsax library and there's also my XOM library.
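
For reference, a minimal sketch of the XmlSax route, assuming the markup really is well-formed XML (later in this thread the real page turns out not to be). The file name flight.xml is hypothetical and stands for a cleaned-up copy of the table placed in the project's Files folder. The XmlSax library must be ticked in the Libraries tab.

```b4x
Sub Process_Globals
   Dim Parser As SaxParser
End Sub

Sub Globals

End Sub

Sub Activity_Create(FirstTime As Boolean)
   If FirstTime Then Parser.Initialize
   'flight.xml is a hypothetical, well-formed copy of the table
   Dim Stream As InputStream = File.OpenInput(File.DirAssets, "flight.xml")
   'Raises Parser_StartElement / Parser_EndElement while reading the stream
   Parser.Parse(Stream, "Parser")
   Stream.Close
End Sub

Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
   'Log whatever character data the parser hands over when an element closes
   Dim Value As String = Text.ToString.Trim
   If Value.Length > 0 Then Log(Name & ": " & Value)
End Sub
```

The log output can then be filtered by element name (span, strong, td) to pick out the labels and values.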

Martin.
 

jalle007

Hmm, thanks for the info.

When I try to parse this page with SaxParser, it doesn't lead me to the table node (sample project attached).

As for the XOM library, I tried it and it gives me the error
A referenced library is missing: xom-1.2.8
which I am not able to find.

What am I doing wrong here?
 

Attachments

  • Xml2.zip
    378.2 KB · Views: 319

warwound

XOM requires two additional files that were too big to include in the forum attachment:

Use of XOM requires two additional jar files to be added to your B4A additional libraries folder: xom-1.2.8.jar and dtd-xercesImpl.jar. The forum attachment size limit prevents me from attaching these two files to this post, so I have made them available from here.

Download those files, put them in your B4A additional libraries folder and try XOM again.

Martin.
 

warwound

I just tried the project that you attached to your last post and got an exception: SaxParser can't parse the HTML because the HTML is not valid.

XOM fails too, as the document is not well formed. I used this code to test it:

B4X:
Sub Process_Globals
   'These global variables will be declared once when the application starts.
   'These variables can be accessed from all modules.

End Sub

Sub Globals
   'These global variables will be redeclared each time the activity is created.
   'These variables can only be accessed from this module.

End Sub

Sub Activity_Create(FirstTime As Boolean)

   Dim XOMBuilder1 As XOMBuilder
   XOMBuilder1.Initialize("XOMBuilder1")
   
   Dim XmlString As String=File.GetText(File.DirAssets, "letovi.php.htm")
   XOMBuilder1.BuildFromString(XmlString, "", Null)

End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub

Sub XOMBuilder1_BuildDone(XOMDocument1 As XOMDocument, Tag As Object)
   If XOMDocument1=Null Then
      '   XOMDocument1 will be Null if an error has occurred
      Log("An error has occured and the XOMDocument has NOT been created")
   Else
      Log("XOMDocument is NOT Null")
      Dim RootElement As XOMElement
      RootElement=XOMDocument1.RootElement
      
   End If
End Sub

It logs "An error has occured and the XOMDocument has NOT been created".

Have a read of this post: http://www.b4x.com/forum/basic4andr.../25274-parsing-html-page-help.html#post146819.
Using a PHP proxy script to fetch and fix the original HTML and then return the fixed HTML to your device is a possible solution, but also a lot of work.

You posted a reply while I was typing this, so you have already found that XOM doesn't work!
So I think you have two options:

Load the HTML into a B4A String and use regular expressions to try to extract the data. This is a lot of work and prone to fail, depending on how poorly written the HTML is.

Load the HTML into a WebView and then use WebViewExtras to inject JavaScript into the web page to extract the data you require.
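
The regular-expression route mentioned above could look roughly like this. It is only a sketch: the Sub name ExtractPairs is made up for illustration, and the pattern assumes rows shaped exactly like the table in the first post (a label cell followed by a value cell, optionally wrapped in span or strong tags). Any change in the page markup is likely to break it.

```b4x
'Sketch only: collects label/value pairs from rows shaped like the sample table.
'ExtractPairs is a hypothetical helper name, not part of any library.
Sub ExtractPairs(Html As String) As Map
   Dim Pairs As Map
   Pairs.Initialize
   'First cell: optional <span> around the label. Second cell: optional <strong> around the value.
   Dim Pattern As String = "<td>\s*(?:<span>)?([^<]+?)(?:</span>)?\s*</td>\s*<td[^>]*>\s*(?:<strong>)?([^<]+?)(?:</strong>)?\s*</td>"
   Dim m As Matcher = Regex.Matcher(Pattern, Html)
   Do While m.Find
      Pairs.Put(m.Group(1).Trim, m.Group(2).Trim)
   Loop
   Return Pairs
End Sub
```

With the table from the first post, Pairs.Get("Tip aviona") should give AT72. Rows that use colspan (the city name and the hr separator) are skipped by this pattern.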

I'll have a look at this shortly and try to post some example code.

Martin.
 

warwound

Here we have a possible solution.

B4X:
Sub Process_Globals

End Sub

Sub Globals
   Dim GoButton As Button
   Dim WebView1 As WebView
   Dim WebViewExtras1 As WebViewExtras
End Sub

Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("Main")
   
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   WebViewExtras1.addWebChromeClient(WebView1, "WebViewExtras1")
   
   WebView1.LoadUrl("file:///android_asset/letovi.php.htm")
End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub

Sub GoButton_Click
    Dim Javascript As String
    Javascript="B4A.CallSub('ParseHtml', true, document.getElementById('desnomenu').innerHTML)"
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub

Sub ParseHtml(Html As String)
   Log(Html)
   Dim XOMBuilder1 As XOMBuilder
   XOMBuilder1.Initialize("XOMBuilder1")
   XOMBuilder1.BuildFromString(Html, "", Null)
End Sub

Sub WebView1_PageFinished (Url As String)
   Log("WebView1_PageFinished: "&Url)
   GoButton.Enabled=True
End Sub

Sub XOMBuilder1_BuildDone(XOMDocument1 As XOMDocument, Tag As Object)
   If XOMDocument1=Null Then
      '   XOMDocument1 will be Null if an error has occurred
      Log("An error has occured and the XOMDocument has NOT been created")
   Else
      Log("XOMDocument is NOT Null")
      Dim RootElement As XOMElement
      RootElement=XOMDocument1.RootElement
      
   End If
End Sub

The webpage is loaded into a WebView. The WebView silently ignores errors in the HTML and renders the page.

Once the page has loaded you can click the Go button, which injects some JavaScript into the web page.
The JavaScript gets the (valid) contents of the DIV that has an id of 'desnomenu' and sends that content to a B4A Sub named 'ParseHtml'.

Sub ParseHtml then uses XOM to parse the String that the JavaScript has sent to it. The log shows 'XOMDocument is NOT Null', so I'm hoping that from here you can use XOM to get the data you require.

It's a bit of a convoluted solution but is worth trying.

Martin.
 

Attachments

  • WebViewGetHtml.zip
    8.6 KB · Views: 321

jalle007

Many thanks, warwound.

This code returns the table.

Now I wrote a jQuery script which returns an array of cities (that's the only important info I need to have):

HTML:
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script type="text/javascript" >
    $(document).ready(function () {
        
        var cities = [];
        var tds = $('#desnomenu td a').each(function (index, elem) {
            cities[index] = elem.innerHTML;
        });  
        alert(cities);
    });
    
      
</script>

Now another question: how can I execute this using WebViewExtras?
 

jalle007

In your example the whole table HTML is returned.

I tried this code, which should return only the cities (the contents of the table rows), but then I got an error:

B4X:
Sub GoButton_Click
    Dim Javascript As String
      
   Javascript="B4A.CallSub('ParseHtml', true, var mybody = document.getElementsByTagName('body')[0];var mytable = mybody.getElementsByTagName('table')[0];var mytablebody = mytable.getElementsByTagName('tbody')[0];var cities = [];for (var n = 0; n < mytablebody.rows.length;n++) {var myrow = mytablebody.getElementsByTagName('tr')[n];cities[n] = myrow.getElementsByTagName('a')[0].innerHTML;} return cities;)"
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub


B4X:
Uncaught SyntaxError: Unexpected token var in null (Line: 1)
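
The syntax error is raised because var statements and a bare return are placed where JavaScript expects a single expression (the third argument of B4A.CallSub). One way around it, sketched here without jQuery, is to wrap the statements in a function that is called immediately, so that the argument becomes one expression. The Sub name ShowCities is hypothetical, and the 'desnomenu' id is taken from the earlier example. The array is passed as a JSON String on the assumption that the JavaScript interface is best given plain Strings.

```b4x
Sub GoButton_Click
   Dim Javascript As StringBuilder
   Javascript.Initialize
   'The (function(){...})() wrapper turns the statements into one expression
   Javascript.Append("B4A.CallSub('ShowCities', true, (function(){")
   Javascript.Append("var cities=[];")
   Javascript.Append("var links=document.getElementById('desnomenu').getElementsByTagName('a');")
   Javascript.Append("for(var n=0;n<links.length;n++){cities.push(links[n].innerHTML);}")
   Javascript.Append("return JSON.stringify(cities);")
   Javascript.Append("})());")
   WebViewExtras1.executeJavascript(WebView1, Javascript.ToString)
End Sub

'ShowCities is a hypothetical Sub name; it receives the cities as a JSON String
Sub ShowCities(JSONString As String)
   Log(JSONString)
End Sub
```

Note that this collects every link inside the 'desnomenu' DIV, which may be slightly broader than the '#desnomenu td a' selector used in the jQuery version.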
 

warwound

After a bit of trial and error I found that this works:

B4X:
Sub Process_Globals

End Sub

Sub Globals
   Dim WebView1 As WebView
   Dim WebViewExtras1 As WebViewExtras
End Sub

Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("Main")
   
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
   WebView1.LoadUrl("file:///android_asset/letovi.php.htm")
End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub

Sub DoSomething(JSONString As String)
   Log(JSONString)
   '   now use the JSON library to turn JSONString into an Array or List etc so you can process it
End Sub

Sub WebView1_PageFinished (Url As String)
   Log("WebView1_PageFinished: "&Url)
   
   AddJQuery
   
End Sub

Sub AddJQuery
   Dim Javascript As StringBuilder
   Javascript.Initialize
   Javascript.Append("var scriptTag=document.createElement('script');")
   Javascript.Append("scriptTag.setAttribute('type','text/javascript');")
   Javascript.Append("scriptTag.setAttribute('src', 'http://code.jquery.com/jquery-1.9.1.min.js');")
   
   Javascript.Append("scriptTag.setAttribute('onload',"&QUOTE)
   
   Javascript.Append("var cities=[];")
   Javascript.Append("var tds=$('#desnomenu td a').each(function(index, elem){cities[index]=elem.innerHTML;});")
   Javascript.Append("B4A.CallSub('DoSomething', true, JSON.stringify(cities));")
   Javascript.Append("")
   Javascript.Append(QUOTE&");")
   
   Javascript.Append("document.getElementsByTagName('head')[0].appendChild(scriptTag);")
   
   WebViewExtras1.executeJavascript(WebView1, Javascript.ToString)
End Sub

Once the webpage has loaded, some JavaScript is created and injected into the page.
The JavaScript adds the jQuery script to the webpage and runs your jQuery code.
It then returns the result of the jQuery code to a B4A Sub named 'DoSomething':

B4X:
["Ancona","Beograd","Copenhagen","Istanbul","Ljubljana","Munich","Vienna","Zagreb"]

You can then use the JSON library to work with the JSON String.
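
As a rough illustration of that last step, DoSomething could be fleshed out along these lines. This is a sketch that assumes the JSON library is ticked in the Libraries tab and that the String is a flat JSON array of city names, as in the output above.

```b4x
Sub DoSomething(JSONString As String)
   Dim Parser As JSONParser
   Parser.Initialize(JSONString)
   'The top-level JSON value is an array, so NextArray returns it as a List
   Dim Cities As List = Parser.NextArray
   For Each City As String In Cities
      Log(City)
   Next
End Sub
```

Each city name is then available as an ordinary String, for example to fill a ListView or a Spinner.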

Martin.
 

Attachments

  • WebViewGetCities.zip
    8.4 KB · Views: 327