Android Question Getting the source code of a webpage with WebView and WebViewExtras

adrianstanescu85

Active Member
Licensed User
Hello,

I need to do a pretty simple thing: extract the source code of a webpage (google.com or whatever) so that I can parse it later on.

I added a WebView to my app and then using the WebViewExtras lib I tried the following:

Sub WebView1_PageFinished (Url As String)
' Now that the web page has loaded we can get the page content as a String
Dim JS1 As String
JS1 = "B4A.CallSub('ProcessHTML', true ,document.documentElement.outerHTML)"
Log("PageFinished: " & JS1)
MyWebViewExtras.executeJavascript(WebView1, JS1)
End Sub

Sub ProcessHTML(Html As String)
' This is the Sub that we'll get the web page to send its HTML content to

' Log may truncate a large page so you'll not see all of the HTML in the log but the 'html' String should still contain all of the web page HTML

Log("ProcessHTML: " & Html)
End Sub

So far, the ONLY thing I get in my log is:

** Activity (main) Pause, UserClosed = false **
** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
PageFinished: B4A.CallSub('ProcessHTML',true,document.documentElement.innerHTML)

So... it stops at that, no source code... Do you have any suggestions?

Thank you!
Adrian
 

adrianstanescu85

Active Member
Licensed User
Martin,

I added the WebViewExtras lib; as far as I know it replaces an older one. Do I need to add something different? I should say that the compiler doesn't output any errors.

Adrian
 

adrianstanescu85

Active Member
Licensed User
Martin,

Thank you for the reply. That was the exact example I was using before, and still the log doesn't show any source code. To be more precise, I downloaded the ready-built example I found there and ran it: same thing. Then I added a Log after the javascript is executed, and that one is logged, but the ProcessHTML sub never gets called! I have an extra Log in there that never fires. Any suggestions?

I'm using the 1.40 version of WebViewExtras which you posted at http://www.basic4ppc.com/android/forum/attachments/webviewextras_v1_40-zip.18329/ and I hope it's the right one. I added the contents to the libs (extra libs actually) folder where I put all the other libs for B4A as I use them.

Is the version wrong, or is something else not working?

Thank you!
Adrian
 

warwound

Expert
Licensed User
Hmmm....

I just downloaded the 'SaveHTML' example from the above link and compiled it using WebViewExtras 1.40, it works as expected!
I did add a WebChromeClient to the WebView, so any browser console messages (such as errors) are now output to the android log:

B4X:
Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("layoutMain")
   
   '   add the B4A javascript interface to the WebView
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
   '   adding a WebChromeClient will log all browser console message to the android log
   '   so any webpage or javascript errors will be logged
   WebViewExtras1.addWebChromeClient(WebView1, "")
   
   '   now load a web page
   WebView1.LoadUrl("http://www.basic4ppc.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
End Sub
This is the log output:

LogCat connected to: HT19MTJ01204
--------- beginning of /dev/log/system
--------- beginning of /dev/log/main
** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
XenForo.init() %dms. jQuery %s/%s in http://www.basic4ppc.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 191)
PageFinished: B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)
Invalid App Id: Must be a number or numeric string representing the application id. in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
ProcessHTML: <html id="XenForo" lang="en-US" dir="LTR" class="Public LoggedOut NoSidebar Responsive hasJs Touch" xmlns:fb="http://www.facebook.com/2008/fbml"><head>

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

<meta name="viewport" content="width=device-width, initial-scale=1">


<base href="http://www.basic4ppc.com/android/forum/">
<script async="" src="http://www.google-analytics.com/ga.js"></script><script>
var _b = document.getElementsByTagName('base')[0], _bH = "http://www.basic4ppc.com/android/forum/";
if (_b && _b.href != _bH) _b.href = _bH;
</script>


<title>Question - Getting the source code of a webpage with WebView and WebViewExtras | Basic4android Community</title>


<link rel="stylesheet" href="css.php?css=xenforo,form,public&amp;style=1&amp;dir=LTR&amp;d=1383667135">
<link rel="stylesheet" href="css.php?css=bb_code,login_bar,message,message_user_info,panel_scroller,share_page,thread_view&amp;style=1&amp;dir=LTR&amp;d=1383667135">

Message longer than Log limit (4000). Message was truncated.
You can see a few console messages then ProcessHTML is called and there's the webpage HTML.
(I only pasted some of the log as the forum didn't want all 4000+ characters).

This is on an old HTC Desire S running a custom Jelly Bean 4.2.2 android, but there's nothing in the code that means it shouldn't work on any version of android.
(Version 1.40+ of WebViewExtras is required in order for the JavascriptInterface to work on android versions 4.2+).
What device are you trying to run this code on?
If you're using an emulator then try a real device - emulators can have various quirks that prevent straightforward code from working as expected.

My updated SaveHTML project is attached.

Martin.
 


adrianstanescu85

Active Member
Licensed User
Martin,

I'm using a real device, an LG 5 II (model E455). I switched to your latest example above, here is the complete log:

** Activity (main) Create, isFirst = true **
** Activity (main) Resume **
XenForo.SquareThumbs: %o in http://www.basic4ppc.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 358)
XenForo.init() %dms. jQuery %s/%s in http://www.basic4ppc.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 191)
Invalid App Id: Must be a number or numeric string representing the application id. in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
PageFinished: B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)

I can't really make sense of this: the ProcessHTML log output is missing. I'm wondering whether this is a problem with B4A itself?

Adrian
 

warwound

Expert
Licensed User
Which version of Basic4Android are you using? The latest?
Have you got any other devices you can try the code on?

Martin.
 

warwound

Expert
Licensed User
I'd rarely recommend that anyone use an emulator, but if you have time you could try one.
That would help establish whether this is a problem with your device or with the older version of B4A.

An idea - take my previously posted SaveHTML-20131114.zip project and add a line to execute some different javascript:

B4X:
Sub WebView1_PageFinished (Url As String)
   '   Now that the web page has loaded we can get the page content as a String
   
   '   see the documentation http://www.basic4ppc.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread
   
   '    a simple test
   WebViewExtras1.executeJavascript(WebView1, "alert('Hello World')")
   
   Dim Javascript As String
   Javascript="B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)"
   
   Log("PageFinished: "&Javascript)
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub
Do you see an 'alert' message box?

Martin.
 

adrianstanescu85

Active Member
Licensed User
Martin,

Yes, the alert message appears, the log stops at that point and carries on after I click the OK button. Still no code though... It looks like the ProcessHTML event never fires.

Adrian
 

warwound

Expert
Licensed User
That's strange!

We've established that the WebView javascript is enabled and that the JavascriptInterface is working.

I'll be thinking...

Martin.
 

warwound

Expert
Licensed User
Aha!

Try the non-obfuscated Release mode - I bet the event name is getting obfuscated in the compilation process.
There should be a text file in your project's Objects folder that lists all text that has been obfuscated - I bet the text "ProcessHTML" is listed there.

Martin.
 

adrianstanescu85

Active Member
Licensed User
Touché! That was the problem! And yes, the obfuscator txt file contained the event! How do I keep the obfuscator on and exclude names like this from it?
 

warwound

Expert
Licensed User
The obfuscator will not obfuscate a Sub name if the Sub name contains an underscore.
So this works for me:

B4X:
Sub Process_Globals
End Sub

Sub Globals
   Dim WebViewExtras1 As WebViewExtras
   Dim WebView1 As WebView
End Sub

Sub Activity_Create(FirstTime As Boolean)
   Activity.LoadLayout("layoutMain")
   
   '   add the B4A javascript interface to the WebView
   WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
   '   adding a WebChromeClient will log all browser console message to the android log
   '   so any webpage or javascript errors will be logged
   WebViewExtras1.addWebChromeClient(WebView1, "")
   
   '   now load a web page
   WebView1.LoadUrl("http://www.basic4ppc.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
End Sub

Sub Activity_Resume
End Sub

Sub Activity_Pause (UserClosed As Boolean)
End Sub

Sub WebView1_PageFinished (Url As String)
   '   Now that the web page has loaded we can get the page content as a String
   
   '   see the documentation http://www.basic4ppc.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread
   
   Dim Javascript As String
   Javascript="B4A.CallSub('Process_HTML', false, document.documentElement.outerHTML)"
   

   Log("PageFinished: "&Javascript)
   WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub

Sub Process_HTML(Html As String)
   '   This is the Sub that we'll get the web page to send its HTML content to
   
   '   Log may truncate a large page so you'll not see all of the HTML in the log but the 'html' String should still contain all of the web page HTML
   
   Log("Process_HTML: "&Html)
End Sub
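
As the comment in Process_HTML says, Log truncates long messages (the log earlier in this thread reported a limit of 4000 characters), so the log alone won't tell you whether the whole page arrived. A minimal sketch of one way to check, assuming you use it in place of the Process_HTML Sub above; the file name "page.html" is just a placeholder I picked for this example, not part of the SaveHTML project:

B4X:
Sub Process_HTML(Html As String)
   '   log only the size, the full text would be truncated by Log
   Log("Process_HTML received " & Html.Length & " characters")
   
   '   "page.html" is a hypothetical file name chosen for this sketch
   File.WriteString(File.DirInternal, "page.html", Html)
   
   '   read the file back and compare the lengths to check nothing was lost
   Dim Saved As String
   Saved = File.ReadString(File.DirInternal, "page.html")
   Log("Saved " & Saved.Length & " characters")
End Sub
If the two lengths match then the complete HTML reached the Sub and only the log display was cut short. The Sub name keeps its underscore so it still avoids being obfuscated.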
Martin.
 