Android Question Extract HTML elements from an HTML source code

sirjo66

Hi all :)

I need to automate navigation on some websites, and I am using B4A 9.80 with WebView and WebViewExtras2.

When I need to interact with a web page I use JavaScript, for example:
B4X:
wve.ExecuteJavascript("document.getElementById('username').setAttribute('value','myUserName'); " & _
                "document.getElementById('password').setAttribute('value','myPassword!'); " & _
                "document.getElementById('btnSubmit').click();")
Wait For wv_PageFinished

and it all works perfectly!

But when I need to read the web page, I have some problems.
If the HTML source uses the "id" attribute, there is no problem: I use JavaScript code that reads the correct element and returns its value to me.

But if the HTML source doesn't use the "id" attribute, I read all of the HTML code with
B4X:
document.documentElement.outerHTML
and pass it to my app, but then I need to parse the HTML code to find the elements and extract the correct values.

For example, a <table> with <tr> and <td> inside it.

If it is an easy search, I use RegEx, but how can I extract an HTML element (or an array of HTML elements)?
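For the "easy search" case, the RegEx idea could look roughly like the sketch below. It is written in JavaScript purely for illustration, and the function name `extractTableCells` is hypothetical. It assumes a simple, well-formed table with no nested tables; real-world HTML can easily break this kind of pattern matching, which is why a proper parser is usually preferable.

```javascript
// Sketch only: pull cell text out of an HTML string with regular expressions.
// Assumes flat, well-formed <tr>/<td>/<th> markup (no nested tables).
function extractTableCells(html) {
  var rows = [];
  var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  var rowMatch;
  while ((rowMatch = rowRe.exec(html)) !== null) {
    var cells = [];
    // A fresh regex per row so lastIndex always starts at 0.
    var cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
    var cellMatch;
    while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
      // Drop any inner tags, collapse whitespace, trim the ends.
      var text = cellMatch[1]
        .replace(/<[^>]+>/g, '')
        .replace(/\s+/g, ' ')
        .trim();
      cells.push(text);
    }
    rows.push(cells);
  }
  return rows; // array of rows, each an array of cell strings
}
```

For example, `extractTableCells('<table><tr><td>A</td><td>B</td></tr></table>')` returns `[["A","B"]]`.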

In VB.NET I currently use the WebBrowser object and its DOM elements, but how can I do this with B4A?
Is there an HTML parser?

Many thanks
Sergio
 

amidgeha

Assume the variable "s" holds the HTML source:
Table data extract:
            s = $"<!DOCTYPE html>
<html>
<head>
    <title>Read Data from HTML Table using JavaScript</title>
    <style>
        th, td, p, input {
            font:14px Verdana;
        }
        table, th, td
        {
            border: solid 1px #DDD;
            border-collapse: collapse;
            padding: 2px 3px;
            text-align: center;
        }
        th {
            font-weight:bold;
        }
    </style>
</head>
<body>"$ & s & "</body>"
            s =  s & $" <script>
    function showTableData() {
        var mycells = '';
        var myTab = document.getElementById('main_table_countries_today');

        // LOOP THROUGH EACH ROW OF THE TABLE AFTER HEADER.
        for (var i = 1; i < myTab.rows.length; i++) {

            // GET THE CELLS COLLECTION OF THE CURRENT ROW.
            var objCells = myTab.rows.item(i).cells;

            // LOOP THROUGH EACH CELL OF THE CURRENT ROW TO READ CELL VALUES.
            for (var j = 0; j < objCells.length; j++) {
                if(j == 0) mycells = mycells + objCells.item(j).innerHTML;
                if(j > 0) mycells = mycells + '|' + objCells.item(j).innerHTML;
                
            }
            mycells = mycells + '<br />';     // ADD A BREAK (TAG).
        }
        //alert(mycells);
        B4A.CallSub('jProgress_a', true, mycells);  // this B4X sub processes the data
    }
 
</script> </html>"$

            CallSubDelayed2(Main, "loadthe_Page", s)
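When the table has no id, a variant of the script above could locate it with a CSS selector instead of `getElementById` and hand the cells back as JSON rather than a `|`-delimited string. This is a minimal sketch under assumptions: the function name `tableToJson` and the selector passed to it are hypothetical, and it reads `textContent` instead of `innerHTML`, so markup inside the cells is dropped.

```javascript
// Sketch only: serialize a table found by CSS selector into a JSON string.
// `doc` is the page's document object; `tableSelector` is any CSS selector
// that matches the wanted <table> (hypothetical, depends on the page).
function tableToJson(doc, tableSelector) {
  var table = doc.querySelector(tableSelector);
  if (!table) return '[]'; // nothing matched: return an empty JSON array
  var rows = [];
  for (var i = 0; i < table.rows.length; i++) {
    var cells = table.rows[i].cells;
    var row = [];
    for (var j = 0; j < cells.length; j++) {
      row.push(cells[j].textContent.trim());
    }
    rows.push(row);
  }
  return JSON.stringify(rows);
}
```

Inside the page, the result could be passed back the same way the script above does it, for example `B4A.CallSub('jProgress_a', true, tableToJson(document, 'table'))`, which picks the first table in the document. On the B4A side the string would then need to be decoded as JSON instead of being split on `|`.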
 

sirjo66

Yes amidgeha, wherever possible I extract the data with JavaScript, no problem, but my question was about parsing it in the B4A language itself.
 