Android Question XML, HTML, XHTML...

stanks

Active Member
Licensed User
Longtime User
Hi,

I am trying to parse a file from the internet. I am not sure whether it is an XML, HTML or XHTML file. How can I tell the difference? I have never built a web page, so I don't know. The file I am trying to parse looks like this:

B4X:
<div id="XYZ">
<ul class="tabs">
   
          <li class="active" id="li_jedan">
        <p class="first">
            jedan</p>
       
    </li>
   
          <li id="li_dva">
        <p>
            dva</p>
       
    </li>
   
          <li id="li_tri">
        <p>
            tri</p>
       
    </li>
   
          <li id="li_cetiri">
        <p>
            cetiri</p>
       
    </li>
   
          <li id="li_pet">
        <p>
            pet</p>
       
    </li>
   
          <li  id="li_sest">
        <p class="last">
            sest</p>
       
    </li>
   
</ul>

<div id ="XYZ_1">
               
                <div id="div_jedan">
                <table class="nowrapper fuel_segmented">
                <thead>
                    <tr>
                        <th>
                            test1
                        </th>
                        <th>
                            test2
                        </th>
                    </tr>
                </thead>
                <tbody>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a2</td>
                        <td class="fuel_segmented">10,51</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a1</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a2</td>
                        <td class="fuel_segmented">10,41</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a3</td>
                        <td class="fuel_segmented">10,51</td>
                    </tr>
                   
                </tbody>
            </table>
                </div>
           
                <div id="div_dva">
                <table class="nowrapper fuel_segmented">
                <thead>
                    <tr>
                        <th>
                            test1
                        </th>
                        <th>
                            test2
                        </th>
                    </tr>
                </thead>
                <tbody>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
                        <td class="fuel_segmented">9,90</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">x1</span></br>a2</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a2</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">y1</span></br>a3</td>
                        <td class="fuel_segmented">9,88</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a4</td>
                        <td class="fuel_segmented">9,78</td>
                    </tr>
                   
                    <tr>
                        <td class="fuel_name"><span class="vendorName">z1</span></br>a5</td>
                        <td class="fuel_segmented">9,88</td>
                    </tr>
                   
                </tbody>
            </table>
                </div>
           
                <div id="div_tri">
...
...
...
etc. The code continues with similar content. What is an element here, what is a node, and what is everything else? I tried the XmlSax library, but it fails every time. I need to get the information from <ul class...></ul> (and the li items inside it), and from div_jedan, div_dva, etc. and everything inside them (I think that is everything inside the <table></table> tags).
Any help? At least how to start, and from where.

Thanks
 

DonManfred

Expert
Licensed User
Longtime User
The code you posted is HTML. It is nearly impossible to use an XML parser for this.
HTML isn't really parseable. You need a DOM parser for such things (B4A does not have one). But B4A can display an HTML page and inject JavaScript into it, and with the JavaScript library jQuery you can easily get such information from the page.

Otherwise, the only possibility is to use a regex pattern that finds one or more <ul class...></ul> blocks.

And if the page always has the same HTML structure, you can also use string functions to split the HTML into parts, split the parts into subparts, and so on. It is possible to get the information you need with string functions too, but that's not elegant, and it is a lot of work to write all the if-thens.

jQuery (JavaScript), for example, should be the best way to parse an HTML page, I think.
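To illustrate what a forgiving HTML parser does with markup like this, here is a minimal sketch in Python rather than B4A, using the standard `html.parser` module. The sample is a shortened copy of the list from the question, and the names `TabCollector` and `collect_tabs` are made up for this example. It only shows the idea of walking start tags, text and end tags; it is not a B4A solution.

```python
from html.parser import HTMLParser

# Shortened copy of the <ul class="tabs"> markup from the question.
# Assumption: the real page follows the same pattern.
SAMPLE = """
<ul class="tabs">
  <li class="active" id="li_jedan"><p class="first">
      jedan</p></li>
  <li id="li_dva"><p>
      dva</p></li>
</ul>
"""


class TabCollector(HTMLParser):
    """Collects (li id, text) pairs. Hypothetical helper for this sketch."""

    def __init__(self):
        super().__init__()
        self.tabs = []            # finished (id, text) pairs
        self._current_id = None   # id of the <li> we are inside, if any
        self._text = []           # text fragments seen inside that <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._current_id is not None:
            text = "".join(self._text).strip()
            self.tabs.append((self._current_id, text))
            self._current_id = None


def collect_tabs(html):
    parser = TabCollector()
    parser.feed(html)
    parser.close()
    return parser.tabs


print(collect_tabs(SAMPLE))
```

Each `<li>` is an element, `id` and `class` are its attributes, and the word inside the `<p>` is its text content.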

eps

Expert
Licensed User
Longtime User
You can use regex and so on to parse this yourself, no need to learn javascript. You can effectively download the page into a string and parse it yourself then. Does the web page information change in format? If you're only interested in information between certain tags, search for those, trim out the text and use it. It's really not that difficult.
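As a rough sketch of that approach (written in Python rather than B4A, purely to show the idea), the snippet below cuts out the block for one div with plain string searches and then pulls the vendor, product and price out of each row with a regex. The sample is a shortened copy of the table from the question, the helper names `section` and `extract_rows` are invented for this example, and the pattern assumes the page keeps exactly this tag layout, including the odd `</br>`.

```python
import re

# Shortened copy of one table from the question.
# Assumption: every row on the real page has this same layout.
SAMPLE = """
<div id="div_jedan">
<table class="nowrapper fuel_segmented">
<tbody>
<tr>
    <td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>
    <td class="fuel_segmented">10,41</td>
</tr>
<tr>
    <td class="fuel_name"><span class="vendorName">y1</span></br>a1</td>
    <td class="fuel_segmented">10,51</td>
</tr>
</tbody>
</table>
</div>
"""

# Captures vendor, product and price from one table row.
ROW = re.compile(
    r'<span class="vendorName">(.*?)</span></br>(.*?)</td>\s*'
    r'<td class="fuel_segmented">(.*?)</td>',
    re.DOTALL,
)


def section(html, div_id):
    """Return the text from <div id="..."> up to the next </table>."""
    start = html.find(f'<div id="{div_id}">')
    if start == -1:
        return ""
    end = html.find("</table>", start)
    return html[start:end] if end != -1 else html[start:]


def extract_rows(html):
    """Return a list of (vendor, product, price) tuples."""
    return [
        tuple(part.strip() for part in match.groups())
        for match in ROW.finditer(html)
    ]


print(extract_rows(section(SAMPLE, "div_jedan")))
```

This breaks as soon as the site changes its markup, so it only makes sense if the format is stable.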
 

RandomCoder

Well-Known Member
Licensed User
Longtime User
DonManfred said: "...You need a DOM-Parser for such things (B4A does not have one)."
I've recently been using the XOM library, which user warwound kindly made available. It's based on DOM methods, so maybe it could be of use? I'm not sure it will create the XOM document, however, as I think it still needs to see a correctly formatted XML file, although it appears that warwound created it for another forum user. In warwound's words:
"I am not actively developing this library, it's an old project that I have uploaded to enable another forum member to extract data from an HTML webpage."

Good luck,
RandomCoder
 

JoeR

Member
Licensed User
Longtime User
I was curious when I read your post, so I searched for a PC-based solution. The following company offers a combination of free and low-cost software.
I am not connected with them, and know nothing about their software.

It might be worthwhile having a look.

http://www.dataparse.com/default.aspx
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
DonManfred said: "HTML isn't really parseable. You need a DOM-Parser for such things."
You are confusing several terms. DOM and SAX parsers can parse XML. DOM parsers are not more powerful than SAX parsers.

You need to use the jTidy library. It will convert the HTML / XHTML to a proper XML string. You can then use whichever XML parser you like to parse it.
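To see why the clean-up step matters, here is a small Python sketch (not B4A, and not jTidy) that feeds one row from the question to a strict XML parser. The stray `</br>` makes the row ill-formed, so the parser rejects it. The hand-made `replace` below is only a stand-in for the kind of repair a tidy tool performs; what jTidy actually emits for this input is not shown here. The helper name `is_well_formed` is invented for this example.

```python
import xml.etree.ElementTree as ET

# One table row copied from the question, joined onto a single line.
ROW = (
    '<tr><td class="fuel_name"><span class="vendorName">x1</span></br>a1</td>'
    '<td class="fuel_segmented">10,41</td></tr>'
)


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hand repair standing in for a tidy tool (assumption: a real tool may
# produce different but equivalent markup).
repaired = ROW.replace("</br>", "<br/>")

print(is_well_formed(ROW))        # the original row is rejected
print(is_well_formed(repaired))   # the repaired row is accepted

# Once the markup is well-formed, any XML parser can read it.
cells = ET.fromstring(repaired).findall("td")
vendor = cells[0].find("span").text
product = cells[0].find("br").tail
price = cells[1].text
print(vendor, product, price)
```

The same logic applies to a SAX parser: it fails on the raw page for the same reason and works once the markup is well-formed.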
 