Android Question Getting the source code of a webpage with WebView and WebViewExtras

adrianstanescu85

Active Member
Licensed User
Longtime User
Hello,

I need to do a pretty simple thing, which is to extract the source code of a webpage, e.g. google.com or whatever, so that I can parse it later on.

I added a WebView to my app and then using the WebViewExtras lib I tried the following:

Sub WebView1_PageFinished (Url As String)
   ' Now that the web page has loaded we can get the page content as a String
   Dim JS1 As String
   JS1 = "B4A.CallSub('ProcessHTML', true, document.documentElement.outerHTML)"
   Log("PageFinished: " & JS1)
   MyWebViewExtras.executeJavascript(WebView1, JS1)
End Sub

Sub ProcessHTML(Html As String)
   ' This is the Sub that we'll get the web page to send its HTML content to

   ' Log may truncate a large page so you'll not see all of the HTML in the log but the 'Html' String should still contain all of the web page HTML

   Log("ProcessHTML: " & Html)
End Sub

So far, the ONLY thing I get in my log is:

** Activity (main) Pause, UserClosed = false **
** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
PageFinished: B4A.CallSub('ProcessHTML',true,document.documentElement.innerHTML)

So... it stops at that, no source code... Do you have any suggestions?

Thank you!
Adrian
 

adrianstanescu85

Active Member
Licensed User
Longtime User
Martin,

I added the WebViewExtras lib, as far as I know that lib replaces an old one... Do I need to add something different? I do have to say the compiler doesn't output any error...

Adrian
 

adrianstanescu85

Active Member
Licensed User
Longtime User
Martin,

Thank you for the reply. That was the exact example I was using before, and still the log doesn't show any source code. To be more precise, I downloaded the ready-built example that I found there and ran it: same thing. Then I added a Log after the execution of the javascript, and that one is logged, but the ProcessHTML sub never gets launched! I have an extra log there that never fires. Any suggestions?

I'm using the 1.40 version of WebViewExtras which you posted at http://www.b4x.com/android/forum/attachments/webviewextras_v1_40-zip.18329/ and I hope it's the right one. I added the contents to the libs (extra libs actually) folder where I put all the other libs for B4A as I use them.

Is the version wrong, or is something else not working?

Thank you!
Adrian
 

warwound

Expert
Licensed User
Longtime User
Hmmm....

I just downloaded the 'SaveHTML' example from the above link and compiled it using WebViewExtras 1.40, it works as expected!
I did add a WebChromeClient to the WebView, so any browser console messages (such as errors) will now be output to the android log:

B4X:
Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("layoutMain")
   
   '   add the B4A javascript interface to the WebView
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
   '   adding a WebChromeClient will log all browser console messages to the android log
   '   so any webpage or javascript errors will be logged
   WebViewExtras1.addWebChromeClient(WebView1, "")
   
   '   now load a web page
   WebView1.LoadUrl("http://www.b4x.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
End Sub

This is the log output:

LogCat connected to: HT19MTJ01204
--------- beginning of /dev/log/system
--------- beginning of /dev/log/main
** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
XenForo.init() %dms. jQuery %s/%s in http://www.b4x.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 191)
PageFinished: B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)
Invalid App Id: Must be a number or numeric string representing the application id. in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
ProcessHTML: <html id="XenForo" lang="en-US" dir="LTR" class="Public LoggedOut NoSidebar Responsive hasJs Touch" xmlns:fb="http://www.facebook.com/2008/fbml"><head>

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

<meta name="viewport" content="width=device-width, initial-scale=1">


<base href="http://www.b4x.com/android/forum/">
<script async="" src="http://www.google-analytics.com/ga.js"></script><script>
var _b = document.getElementsByTagName('base')[0], _bH = "http://www.b4x.com/android/forum/";
if (_b && _b.href != _bH) _b.href = _bH;
</script>


<title>Question - Getting the source code of a webpage with WebView and WebViewExtras | Basic4android Community</title>


<link rel="stylesheet" href="css.php?css=xenforo,form,public&amp;style=1&amp;dir=LTR&amp;d=1383667135">
<link rel="stylesheet" href="css.php?css=bb_code,login_bar,message,message_user_info,panel_scroller,share_page,thread_view&amp;style=1&amp;dir=LTR&amp;d=1383667135">

Message longer than Log limit (4000). Message was truncated.

You can see a few console messages then ProcessHTML is called and there's the webpage HTML.
(I only pasted some of the log as the forum didn't want all 4000+ characters).
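One thing worth noting about the output above: it starts at the `<html>` tag with no doctype line, because `document.documentElement.outerHTML` serializes the page's current DOM rather than returning the original file. If the doctype matters for later parsing, a minimal sketch like the following could be injected instead. `pageSource` is a hypothetical helper name, and it only restores a short `<!DOCTYPE name>` line, ignoring any public or system identifiers.

```javascript
// Hypothetical helper: returns the serialized DOM with a short doctype line in front.
// 'doc' is expected to be the page's document object.
function pageSource(doc) {
  // doc.doctype is null when the page declares no doctype
  var prefix = doc.doctype ? '<!DOCTYPE ' + doc.doctype.name + '>\n' : '';
  return prefix + doc.documentElement.outerHTML;
}

// Inside the WebView the result would be handed to B4A in the usual way, e.g.
// B4A.CallSub('ProcessHTML', false, pageSource(document));
```

The result is still a snapshot of the DOM at the time of the call, so anything scripts have changed since loading will be reflected in it.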

This is on an old HTC Desire S running a custom Jelly Bean 4.2.2 android, but there's nothing in the code that means it shouldn't work on any version of android.
(Version 1.40+ of WebViewExtras is required in order for the JavascriptInterface to work on android versions 4.2+).
What device are you trying to run this code on?
If you're using an emulator then try a real device - emulators can have various quirks that prevent straightforward code from working as expected.

My updated SaveHTML project is attached.

Martin.
 

Attachments

  • SaveHtml-20131114.zip
    6.8 KB

adrianstanescu85

Active Member
Licensed User
Longtime User
Martin,

I'm using a real device, an LG 5 II (model E455). I switched to your latest example above, here is the complete log:

** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
XenForo.SquareThumbs: %o in http://www.b4x.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 358)
XenForo.init() %dms. jQuery %s/%s in http://www.b4x.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 191)
Invalid App Id: Must be a number or numeric string representing the application id. in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
PageFinished: B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)

I can't really make sense of this; the ProcessHTML log entry is still missing. Could this be a problem with B4A itself?

Adrian
 

warwound

Expert
Licensed User
Longtime User
Which version of Basic4Android are you using? The latest?
Have you got any other devices you can try the code on?

Martin.
 

warwound

Expert
Licensed User
Longtime User
I'd rarely recommend using an emulator, but if you have time you could try one.
It would help establish whether this is a problem with your device or the older version of B4A.

An idea - take my previously posted SaveHTML-20131114.zip project and add a line to execute some different javascript:

B4X:
Sub WebView1_PageFinished (Url As String)
   '   Now that the web page has loaded we can get the page content as a String
   
   '   see the documentation http://www.b4x.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread
   
   '    a simple test
   WebViewExtras1.executeJavascript(WebView1, "alert('Hello World')")
   
   Dim Javascript As String
   Javascript="B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)"
   
   Log("PageFinished: "&Javascript)
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub

Do you see an 'alert' message box?
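The alert test blocks the page until OK is tapped. A non-blocking variant is to write to the browser console instead, since the WebChromeClient added earlier forwards console messages to the android log. The sketch below is only an illustration: `probeB4A` and its messages are hypothetical names, and the thread does not establish whether a failed `CallSub` surfaces as a JavaScript exception, so the try/catch is just a precaution.

```javascript
// Hypothetical probe, intended to run inside the WebView.
// 'win' is the page's window object; 'subName' is the B4A Sub to call.
function probeB4A(win, subName) {
  if (typeof win.B4A === 'undefined') {
    win.console.log('probe: B4A interface is NOT visible to the page');
    return false;
  }
  win.console.log('probe: B4A interface is visible, calling ' + subName);
  try {
    win.B4A.CallSub(subName, false, win.document.documentElement.outerHTML);
    win.console.log('probe: CallSub returned without a JavaScript error');
    return true;
  } catch (e) {
    win.console.log('probe: CallSub threw: ' + e);
    return false;
  }
}

// Inside the WebView the call would simply be: probeB4A(window, 'ProcessHTML');
```

If the last message shows up in the log and the Sub still never runs, that would suggest looking at the B4A side of the bridge rather than at the page.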

Martin.
 

adrianstanescu85

Active Member
Licensed User
Longtime User
Martin,

Yes, the alert message appears, the log stops at that point and carries on after I click the OK button. Still no code though... It looks like the ProcessHTML event never fires.

Adrian
 

warwound

Expert
Licensed User
Longtime User
That's strange!

We've established that the WebView javascript is enabled and that the JavascriptInterface is working.

I'll be thinking...

Martin.
 

warwound

Expert
Licensed User
Longtime User
Aha!

Try the non-obfuscated Release mode - I bet the event name is getting obfuscated in the compilation process.
There should be a text file in your project's Objects folder that lists all text that has been obfuscated - I bet the text "ProcessHTML" is listed there.

Martin.
 

adrianstanescu85

Active Member
Licensed User
Longtime User
Touché! That was the problem! And yes, the Obfuscator txt file contained the event! How do I keep the obfuscator on and exclude this name from it?
 

warwound

Expert
Licensed User
Longtime User
The obfuscator will not obfuscate a Sub name if the Sub name contains an underscore.
So this works for me:

B4X:
Sub Process_Globals
End Sub

Sub Globals
   Dim WebViewExtras1 As WebViewExtras
   Dim WebView1 As WebView
End Sub

Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("layoutMain")
   
   '   add the B4A javascript interface to the WebView
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
   '   adding a WebChromeClient will log all browser console messages to the android log
   '   so any webpage or javascript errors will be logged
   WebViewExtras1.addWebChromeClient(WebView1, "")
   
   '   now load a web page
   WebView1.LoadUrl("http://www.b4x.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
End Sub

Sub Activity_Resume
End Sub

Sub Activity_Pause (UserClosed As Boolean)
End Sub

Sub WebView1_PageFinished (Url As String)
   '   Now that the web page has loaded we can get the page content as a String
   
   '   see the documentation http://www.b4x.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread
   
   Dim Javascript As String
   Javascript="B4A.CallSub('Process_HTML', false, document.documentElement.outerHTML)"
   

   Log("PageFinished: "&Javascript)
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub

Sub Process_HTML(Html As String)
   '   This is the Sub that we'll get the web page to send its HTML content to
   
   '   Log may truncate a large page so you'll not see all of the HTML in the log but the 'html' String should still contain all of the web page HTML
   
   Log("Process_HTML: "&Html)
End Sub
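To see why the rename matters, here is a toy model in JavaScript of the rule described above. It is not B4A's obfuscator, only an illustration: the function names and the generated names (`a1`, `a2`, ...) are made up, and the real compiler's behaviour is assumed only as far as this thread states it (Sub names containing an underscore are left alone).

```javascript
// Toy model only: NOT how B4A is implemented, just an illustration of the rule.
function obfuscate(subs) {
  var out = {};
  var counter = 0;
  for (var name in subs) {
    if (!Object.prototype.hasOwnProperty.call(subs, name)) continue;
    // Names containing an underscore are kept; everything else gets a made-up name.
    var newName = name.indexOf('_') >= 0 ? name : 'a' + (++counter);
    out[newName] = subs[name];
  }
  return out;
}

// Stand-in for a string-based call such as B4A.CallSub('ProcessHTML', ...).
function callSubByName(compiled, name, arg) {
  if (typeof compiled[name] !== 'function') return 'sub not found: ' + name;
  return compiled[name](arg);
}

var compiled = obfuscate({
  ProcessHTML: function (html) { return 'ProcessHTML got ' + html.length + ' chars'; },
  Process_HTML: function (html) { return 'Process_HTML got ' + html.length + ' chars'; }
});

console.log(callSubByName(compiled, 'ProcessHTML', '<html></html>'));  // prints "sub not found: ProcessHTML"
console.log(callSubByName(compiled, 'Process_HTML', '<html></html>')); // prints "Process_HTML got 13 chars"
```

The string inside the javascript is never touched by the renaming, so it only keeps matching when the Sub name itself survives compilation.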

Martin.
 