Android Question XML, HTML, XHTML...

stanks · Feb 28, 2014

Hi

I am trying to parse a file from the internet. I'm not sure whether it is an XML, HTML or XHTML file. How can I tell the difference? I have never made a web page, so I don't know. The file I am trying to parse looks like this:

B4X:

<div id="XYZ">
<ul class="tabs">
   
          <li class="active" id="li_jedan">
        <p class="first">
            jedan</p>
       
    </li>
   
          <li id="li_dva">
        <p>
            dva</p>
       
    </li>
   
          <li id="li_tri">
        <p>
            tri</p>
       
    </li>
   
          <li id="li_cetiri">
        <p>
            cetiri</p>
       
    </li>
   
          <li id="li_pet">
        <p>
            pet</p>
       
    </li>
   
          <li  id="li_sest">
        <p class="last">
            sest</p>
       
    </li>
   
</ul>

<div id ="XYZ_1">
               
                <div id="div_jedan">
                <table class="nowrapper fuel_segmented">
                <thead>
                    <tr>
                        <th>
                            test1
                        </th>
                        <th>
                            test2
                        </th>
                    </tr>
                </thead>
                <tbody>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a2</td>
                        <td class="fuel_segmented">10,51</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a1</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a2</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a3</td>
                        <td class="fuel_segmented">10,51</td>
                    </tr>
                   
                </tbody>
            </table>
                </div>
           
                <div id="div_dva">
                <table class="nowrapper fuel_segmented">
                <thead>
                    <tr>
                        <th>
                            test1
                        </th>
                        <th>
                            test2
                        </th>
                    </tr>
                </thead>
                <tbody>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
                        <td class="fuel_segmented">9,90</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a2</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a2</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a3</td>
                        <td class="fuel_segmented">9,88</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a4</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a5</td>
                        <td class="fuel_segmented">9,88</td>
                    </tr>
                   
                </tbody>
            </table>
                </div>
           
                <div id="div_tri">
...
...
...

etc. The code continues with similar content. What is an element/node/everything else here? I tried the XmlSax library, but it fails every time. I need to get the info from <ul class...> </ul> (and the li elements inside it), div_jedan, div_dva, etc., and everything inside them (I think that is everything inside the <table></table> tags).
Any help? At least tell me how and where to start.

Thanks

DonManfred · Feb 28, 2014

The code you posted is HTML. It is nearly impossible to use an XML parser for this.
HTML isn't really parseable. You need a DOM-Parser for such things (B4A does not have one). BUT B4A can display an HTML page and inject JavaScript into it, and with the JavaScript library jQuery you can easily get such info from the page.
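The point about XML parsers can be checked with a small sketch. It uses Python's standard xml.etree.ElementTree purely as an example of a strict XML parser (this is not B4A code, and the helper name is_well_formed_xml is made up for illustration). One row from the question is enough: </br> closes an element that was never opened, so any conforming XML parser, SAX-based or not, rejects it.

```python
import xml.etree.ElementTree as ET

# One table row copied from the question. "</br>" closes an element
# that was never opened, so the markup is not well-formed XML.
ROW = '<tr><td class="fuel_name"><span class="vendorName">x1</span></br>a1</td></tr>'


def is_well_formed_xml(text: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(ROW))                            # False
print(is_well_formed_xml(ROW.replace("</br>", "<br/>")))  # True
```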

Otherwise, the only possibility is a regex pattern that finds one or more <ul class...></ul> blocks.
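A regex version of that idea, sketched in Python for illustration only. The patterns and the tabs helper are hypothetical and assume the tab list keeps exactly the shape shown in the question, with no nested lists.

```python
import re

# Trimmed copy of the tab list from the question.
HTML = '''
<ul class="tabs">
      <li class="active" id="li_jedan">
    <p class="first">
        jedan</p>
</li>
      <li id="li_dva">
    <p>
        dva</p>
</li>
</ul>
'''

# Non-greedy .*? with DOTALL: the match may span lines but stops at the
# first closing </ul>. Assumes the lists are not nested.
UL_RE = re.compile(r'<ul class="tabs">(.*?)</ul>', re.DOTALL)
# Captures the li's id attribute and the text of the <p> inside it.
LI_RE = re.compile(r'<li[^>]*\bid="([^"]+)"[^>]*>\s*<p[^>]*>\s*(.*?)\s*</p>', re.DOTALL)


def tabs(html: str) -> list[tuple[str, str]]:
    """Hypothetical helper: (li id, tab label) pairs from each tab list."""
    result = []
    for ul in UL_RE.finditer(html):
        result.extend(LI_RE.findall(ul.group(1)))
    return result


print(tabs(HTML))  # [('li_jedan', 'jedan'), ('li_dva', 'dva')]
```

If the site changes the <ul> tag's attributes or starts nesting lists, the patterns have to be adjusted by hand, which is the usual weakness of regex-based HTML scraping.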

And if the page always has the same HTML structure, you can also use string functions to split the HTML into parts, split the parts into subparts, and so on. It is possible to get the info you need with string functions too, but that's not elegant, and it's a lot of work to write all the if-thens.
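The split-into-parts approach might look like the following Python sketch (illustration only). split_rows is a hypothetical helper that assumes every data row has exactly two cells, name then price, as in the question's tables. It also shows why the approach is fragile: every split depends on the exact markup.

```python
# Trimmed copy of one table from the question (header row plus two data rows).
TABLE = '''
<thead>
    <tr>
        <th>test1</th>
        <th>test2</th>
    </tr>
</thead>
<tbody>
    <tr>
        <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
        <td class="fuel_segmented">10,41</td>
    </tr>
    <tr>
        <td class="fuel_name"><span class="vendorName">x1</span></br>a2</td>
        <td class="fuel_segmented">10,51</td>
    </tr>
</tbody>
'''


def split_rows(html: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: return (vendor, name, price) for each data row."""
    result = []
    for part in html.split("<tr>")[1:]:        # one part per table row
        row = part.split("</tr>")[0]
        cells = row.split("<td")[1:]            # header rows have no <td
        if len(cells) != 2:
            continue
        # Drop the rest of the opening tag and everything from </td> on.
        name_cell = cells[0].split(">", 1)[1].split("</td>")[0]
        price = cells[1].split(">", 1)[1].split("</td>")[0]
        # name_cell looks like '<span class="vendorName">x1</span></br>a1'
        span, name = name_cell.split("</br>")
        vendor = span.split(">")[1].split("<")[0]
        result.append((vendor, name.strip(), price.strip()))
    return result


print(split_rows(TABLE))  # [('x1', 'a1', '10,41'), ('x1', 'a2', '10,51')]
```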

I think jQuery (JavaScript), for example, would be the best way to parse an HTML page.
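jQuery itself runs inside the page's JavaScript. As a rough analogue outside the browser, here is a sketch using Python's standard html.parser module, which, unlike an XML parser, keeps going past malformed markup such as </br>. The CellCollector class and cells helper are hypothetical; cells(SAMPLE, "fuel_segmented") returns, as text, roughly what a jQuery selector like $("td.fuel_segmented") would match.

```python
from html.parser import HTMLParser

# Trimmed copy of the question's markup, stray </br> included.
SAMPLE = '''
<div id="div_jedan"><table class="nowrapper fuel_segmented"><tbody>
<tr>
  <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
  <td class="fuel_segmented">10,41</td>
</tr>
</tbody></table></div>
'''


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class list contains `wanted`."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = wanted
        self.inside = False   # currently inside a matching <td>?
        self.current = []     # text fragments of the current cell
        self.cells = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.wanted in classes:
            self.inside, self.current = True, []

    def handle_endtag(self, tag):
        # A stray </br> arrives here as tag "br" and is simply ignored.
        if tag == "td" and self.inside:
            self.cells.append(" ".join(self.current))
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.current.append(data.strip())


def cells(html: str, wanted: str) -> list[str]:
    parser = CellCollector(wanted)
    parser.feed(html)
    parser.close()
    return parser.cells


print(cells(SAMPLE, "fuel_segmented"))  # ['10,41']
print(cells(SAMPLE, "fuel_name"))       # ['x1 a1']
```

For the name cells, the vendor and the name come back joined by a space, because the parser delivers the text on each side of the </br> as separate fragments.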

stanks · Feb 28, 2014

Thanks, nice. I just need to learn JavaScript.

eps · Feb 28, 2014

You can use regex and so on to parse this yourself; there's no need to learn JavaScript. You can download the page into a string and then parse it yourself. Does the format of the web page change? If you're only interested in information between certain tags, search for those, trim out the text and use it. It's really not that difficult.
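A sketch of the search-and-trim approach in Python, assuming the page has already been downloaded into a string (the download step is not shown). all_between is a hypothetical helper built only on str.find and slicing, and the sample page is a trimmed version of the question's markup.

```python
def all_between(text: str, start: str, end: str) -> list[str]:
    """Hypothetical helper: every trimmed snippet found between a `start`
    marker and the next `end` marker, scanning left to right."""
    found = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)
        if j == -1:
            break
        found.append(text[i:j].strip())
        pos = j + len(end)
    return found


PAGE = '''
<div id="div_jedan"><table>
  <tr><td class="fuel_segmented">10,41</td></tr>
</table></div>
<div id="div_dva"><table>
  <tr><td class="fuel_segmented">9,90</td></tr>
  <tr><td class="fuel_segmented">9,78</td></tr>
</table></div>
'''

# First cut out the section of interest, then pull the cells from it.
dva = all_between(PAGE, '<div id="div_dva">', "</table>")[0]
print(all_between(dva, '<td class="fuel_segmented">', "</td>"))  # ['9,90', '9,78']
```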

RandomCoder · Mar 1, 2014

DonManfred said:
....You need a DOM-Parser for such things (B4A does not have one).

I've recently been using the XOM library, which user warwound kindly made available. It's based on DOM methods, so maybe it could be of use? I'm not sure whether it will create the XOM document, however, as I think it still needs a correctly formatted XML file, although it appears that warwound created it for another forum user...

warwound said:
I am not actively developing this library, it's an old project that I have uploaded to enable another forum member to extract data from an HTML webpage.

Good luck,
RandomCoder

DonManfred · Mar 1, 2014

I hadn't noticed the XOM library until now. Yes, it seems to be of use. The sample looks like what you need: getting the ULs from a webpage.

JoeR · Mar 1, 2014

I was curious when I read your post, so I searched for a PC-based solution. The following company offers a combination of free and low-cost software.
I am not connected with them, and know nothing about their software.

It might be worthwhile having a look.

http://www.dataparse.com/default.aspx

Erel · Mar 2, 2014

DonManfred said:
HTML isn't really parseable. You need a DOM-Parser for such things.

You are confusing several terms. DOM and SAX parsers can parse XML. DOM parsers are not more powerful than SAX parsers.
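To make the distinction concrete, here is a Python sketch (illustration only, not B4A) that reads the same well-formed XML both ways with the standard library: xml.dom.minidom builds a tree that you query afterwards, while xml.sax calls a handler for each element as it reads. Both produce the same result; the difference lies in memory use and programming style, not in what input they accept. The function and class names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

XML = "<ul><li id='li_jedan'>jedan</li><li id='li_dva'>dva</li></ul>"


# DOM style: the parser builds the whole tree, then you query it.
def ids_dom(text: str) -> list[str]:
    doc = minidom.parseString(text)
    return [li.getAttribute("id") for li in doc.getElementsByTagName("li")]


# SAX style: the parser calls the handler back for each event as it reads.
class LiHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "li":
            self.ids.append(attrs.get("id"))


def ids_sax(text: str) -> list[str]:
    handler = LiHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids


print(ids_dom(XML))  # ['li_jedan', 'li_dva']
print(ids_sax(XML))  # ['li_jedan', 'li_dva']
```

Given the question's raw markup, both would stop at the stray </br>, since the problem is the input rather than the parser style.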

You need to use the jTidy library. It will convert the HTML / XHTML to a proper XML string. You can then use whichever XML parser you like to parse it.
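The tidy-then-parse workflow can be sketched in Python. crude_tidy below is a hypothetical stand-in and not jTidy: it repairs only the stray </br> tags in this sample, whereas a real HTML cleaner also deals with unclosed tags, entities and much more. Once the markup is well-formed, ElementTree's limited XPath support can pick out the cells.

```python
import xml.etree.ElementTree as ET

# Two rows from the question, stray </br> tags included.
RAW = '''<table class="nowrapper fuel_segmented"><tbody>
<tr><td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
    <td class="fuel_segmented">10,41</td></tr>
<tr><td class="fuel_name"><span class="vendorName">y1</span></br>a1</td>
    <td class="fuel_segmented">10,41</td></tr>
</tbody></table>'''


def crude_tidy(html: str) -> str:
    """Hypothetical stand-in for a real HTML cleaner such as jTidy: it fixes
    only the stray </br> tags in this sample and nothing else."""
    return html.replace("</br>", "<br/>")


def parse_rows(html: str) -> list[tuple[str, str, str]]:
    """Return (vendor, name, price) per row, parsed as proper XML."""
    root = ET.fromstring(crude_tidy(html))
    out = []
    for row in root.iter("tr"):
        name_cell = row.find("td[@class='fuel_name']")
        vendor = name_cell.find("span").text
        name = name_cell.find("br").tail  # text after <br/> is the br's tail
        price = row.find("td[@class='fuel_segmented']").text
        out.append((vendor, name, price))
    return out


print(parse_rows(RAW))  # [('x1', 'a1', '10,41'), ('y1', 'a1', '10,41')]
```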

stanks · Mar 3, 2014

Thanks for the answers, guys.
