Android Question Extract HTML elements from an HTML source code

sirjo66

Well-Known Member
Licensed User
Longtime User
Hi all :)

I need to automate navigation through some web sites, and I am using B4A 9.80 with WebView and WebViewExtras2.

When I need to interact with a web page I use JavaScript, for example:
B4X:
wve.ExecuteJavascript("document.getElementById('username').setAttribute('value','myUserName'); " & _
                "document.getElementById('password').setAttribute('value','myPassword!'); " & _
                "document.getElementById('btnSubmit').click();")
Wait For wv_PageFinished

and it all works perfectly!

But when I need to read the web page I have a problem.
If the HTML source uses "id" attributes there is no problem: I use JavaScript code that reads the right element and returns its value to me.

But if the HTML source doesn't use "id" attributes, I read the whole HTML code with
B4X:
document.documentElement.outerHTML
and pass it to my app, but then I need to parse the HTML code to find the elements and extract the right values.

For example, a <table> with <tr> and <td> inside it.

For a simple search I use RegEx, but how can I extract an HTML element (or an array of HTML elements)?

In VB.NET I use the WebBrowser object and its DOM elements, but how can I do this in B4A?
Is there an HTML parser?

Many thanks
Sergio
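
For pages whose elements have no "id", one possible approach is to select the elements with a CSS selector inside the WebView, so that only the extracted values (not the whole page) are passed to the app. The following is a minimal sketch, assuming the wanted cells can be reached with a selector such as 'table tr td'. The helper name collectText is hypothetical and not part of WebViewExtras2; querySelectorAll and textContent are standard DOM features.

```javascript
// Hypothetical helper: returns the trimmed text of every element
// that matches a CSS selector. `root` is normally `document`; it
// only needs to provide a querySelectorAll method.
function collectText(root, selector) {
  var nodes = root.querySelectorAll(selector);
  var out = [];
  for (var i = 0; i < nodes.length; i++) {
    // textContent ignores nested tags and returns plain text
    out.push((nodes[i].textContent || '').trim());
  }
  return out;
}

// Possible use inside the page (assumed selector):
//   JSON.stringify(collectText(document, 'table tr td'))
```

The result is serialized as a JSON string so that it could be handed to the app in one piece, in the same way as the outerHTML string mentioned above.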
 

amidgeha

Active Member
Licensed User
Longtime User

Assume the variable "s" holds the HTML source.
Extracting the table data:
            s = $"<!DOCTYPE html>
<html>
<head>
    <title>Read Data from HTML Table using JavaScript</title>
    <style>
        th, td, p, input {
            font:14px Verdana;
        }
        table, th, td
        {
            border: solid 1px #DDD;
            border-collapse: collapse;
            padding: 2px 3px;
            text-align: center;
        }
        th {
            font-weight:bold;
        }
    </style>
</head>
<body>"$ & s & "</body>"
            s =  s & $" <script>
    function showTableData() {
        var mycells = '';
        var myTab = document.getElementById('main_table_countries_today');

        // LOOP THROUGH EACH ROW OF THE TABLE AFTER HEADER.
        for (var i = 1; i < myTab.rows.length; i++) {

            // GET THE CELLS COLLECTION OF THE CURRENT ROW.
            var objCells = myTab.rows.item(i).cells;

            // LOOP THROUGH EACH CELL OF THE CURRENT ROW TO READ CELL VALUES.
            for (var j = 0; j < objCells.length; j++) {
                if(j == 0) mycells = mycells + objCells.item(j).innerHTML;
                if(j > 0) mycells = mycells + '|' + objCells.item(j).innerHTML;
                
            }
            mycells = mycells + '<br />';     // ADD A BREAK (TAG).
        }
        //alert(mycells);
        B4A.CallSub('jProgress_a', true, mycells);  // this B4X sub processes the data
    }
 
</script> </html>"$

            CallSubDelayed2(Main, "loadthe_Page", s)
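
As a variation on the script above, the rows could be returned as JSON instead of a '|' and '<br />' delimited string, which avoids ambiguity when a cell itself contains those characters. This is only a sketch: the helper name tableToRows and the skipRows parameter are hypothetical, and it reads textContent rather than innerHTML, so markup nested inside a cell is dropped.

```javascript
// Hypothetical helper: converts a table (or any object exposing
// rows[i].cells[j].textContent) into an array of rows, each row
// being an array of trimmed cell texts.
// skipRows = number of leading rows to skip (e.g. 1 for a header).
function tableToRows(table, skipRows) {
  var result = [];
  for (var i = skipRows || 0; i < table.rows.length; i++) {
    var cells = table.rows[i].cells;
    var row = [];
    for (var j = 0; j < cells.length; j++) {
      row.push((cells[j].textContent || '').trim());
    }
    result.push(row);
  }
  return result;
}

// Possible use inside showTableData(), replacing the string building:
//   B4A.CallSub('jProgress_a', true, JSON.stringify(tableToRows(myTab, 1)));
```

On the B4A side, the received string could then be read with the JSON library's JSONParser instead of splitting on delimiters.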
 

sirjo66

Well-Known Member
Licensed User
Longtime User
Yes amidgeha, where possible I extract the data with JavaScript, no problem, but my question was about parsing it in the B4A language.
 