Android Question Getting the source code of a webpage with WebView and WebViewExtras

Discussion in 'Android Questions' started by adrianstanescu85, Nov 12, 2013.

  1. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Hello,

    I need to do a pretty simple thing, which is to extract the source code of a webpage, e.g. google.com or whatever, so that I can parse it later on.

    I added a WebView to my app and then using the WebViewExtras lib I tried the following:

    Sub WebView1_PageFinished (Url As String)
    ' Now that the web page has loaded we can get the page content as a String
    Dim JS1 As String
    JS1 = "B4A.CallSub('ProcessHTML', true ,document.documentElement.outerHTML)"
    Log("PageFinished: " & JS1)
    MyWebViewExtras.executeJavascript(WebView1, JS1)
    End Sub

    Sub ProcessHTML(Html As String)
    ' This is the Sub that we'll get the web page to send its HTML content to

    ' Log may truncate a large page so you'll not see all of the HTML in the log but the 'html' String should still contain all of the web page HTML

    Log("ProcessHTML: " & Html)
    End Sub

    So far, the ONLY thing I get in my log is:

    ** Activity (main) Pause, UserClosed = false **
    ** Activity (main) Create, isFirst = true **
    ** Activity (main) Resume **
    PageFinished: B4A.CallSub('ProcessHTML',true,document.documentElement.innerHTML)

    So... it stops at that, no source code... Do you have any suggestions?

    Thank you!
    Adrian
     
  2. warwound

    warwound Expert Licensed User

    You haven't added the JavascriptInterface perhaps?

    Martin.
     
  3. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Martin,

    I added the WebViewExtras lib, as far as I know that lib replaces an old one... Do I need to add something different? I do have to say the compiler doesn't output any error...

    Adrian
     
  4. warwound

    warwound Expert Licensed User

  5. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Martin,

    Thank you for the reply, that was the exact example I was using before, and still... the log doesn't show any source code. To be more precise I downloaded the example already built that I found there and ran it.. same thing. Then I added a Log after the execution of the javascript, that is logged... but the ProcessHTML sub never gets launched! I have an extra log there that never fires. Any suggestions?

    I'm using the 1.40 version of WebViewExtras which you posted at http://www.basic4ppc.com/android/forum/attachments/webviewextras_v1_40-zip.18329/ and I hope it's the right one. I added the contents to the libs (extra libs actually) folder where I put all the other libs for B4A as I use them.

    Is the version wrong, or is something else not working?

    Thank you!
    Adrian
     
  6. warwound

    warwound Expert Licensed User

    Hmmm....

    I just downloaded the 'SaveHTML' example from the above link and compiled it using WebViewExtras 1.40, it works as expected!
    I did add a WebChromeClient to the WebView, now any browser console messages (such as errors) will be output to the android log:

    Code:
    Sub Activity_Create(FirstTime As Boolean)
        Activity.LoadLayout("layoutMain")

        ' add the B4A javascript interface to the WebView
        WebViewExtras1.addJavascriptInterface(WebView1, "B4A")

        ' adding a WebChromeClient will log all browser console messages to the android log
        ' so any webpage or javascript errors will be logged
        WebViewExtras1.addWebChromeClient(WebView1, "")

        ' now load a web page
        WebView1.LoadUrl("http://www.basic4ppc.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
    End Sub
    This is the log output:

    You can see a few console messages then ProcessHTML is called and there's the webpage HTML.
    (I only pasted some of the log as the forum didn't want all 4000+ characters).

    This is on an old HTC Desire S running a custom Jelly Bean 4.2.2 android, but there's nothing in the code that means it shouldn't work on any version of android.
    (Version 1.40+ of WebViewExtras is required in order for the JavascriptInterface to work on android versions 4.2+).
    What device are you trying to run this code on?
    If you're using an emulator then try a real device - emulators can have various quirks that prevent straightforward code from working as expected.

    My updated SaveHTML project is attached.

    Martin.
     

  7. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Martin,

    I'm using a real device, an LG 5 II (model E455). I switched to your latest example above, here is the complete log:

    ** Activity (main) Create, isFirst = true **
    ** Activity (main) Resume **
    XenForo.SquareThumbs: %o in http://www.basic4ppc.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 358)
    XenForo.init() %dms. jQuery %s/%s in http://www.basic4ppc.com/android/forum/js/xenforo/xenforo.js?_v=28d42049 (Line: 191)
    Invalid App Id: Must be a number or numeric string representing the application id. in http://connect.facebook.net/en_US/all.js (Line: 56)
    FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
    FB.getLoginStatus() called before calling FB.init(). in http://connect.facebook.net/en_US/all.js (Line: 56)
    PageFinished: B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)

    I can't really make sense of this; the ProcessHTML output is still missing from the log. I'm wondering whether this is a problem I may be having with B4A itself?

    Adrian
     
  8. warwound

    warwound Expert Licensed User

    Which version of Basic4Android are you using? The latest?
    Have you got any other devices you can try the code on?

    Martin.
     
  9. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    I'm using B4A 2.71. For the moment I only have this one device; later on I may be able to get a Motorola ET1.
     
  10. warwound

    warwound Expert Licensed User

    I'd rarely recommend using an emulator, but if you have time you could try one.
    It would help establish whether this is a problem with your device or the older version of B4A.

    An idea - take my previously posted SaveHTML-20131114.zip project and add a line to execute some different javascript:

    Code:
    Sub WebView1_PageFinished (Url As String)
        ' Now that the web page has loaded we can get the page content as a String

        ' see the documentation http://www.basic4ppc.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread

        ' a simple test
        WebViewExtras1.executeJavascript(WebView1, "alert('Hello World')")

        Dim Javascript As String
        Javascript = "B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)"

        Log("PageFinished: " & Javascript)
        WebViewExtras1.executeJavascript(WebView1, Javascript)
    End Sub
    Do you see an 'alert' message box?

    Martin.
     
  11. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Martin,

    Yes, the alert message appears, the log stops at that point and carries on after I click the OK button. Still no code though... It looks like the ProcessHTML event never fires.

    Adrian
     
  12. warwound

    warwound Expert Licensed User

    That's strange!

    We've established that the WebView javascript is enabled and that the JavascriptInterface is working.

    I'll be thinking...

    Martin.
     
  13. warwound

    warwound Expert Licensed User

    You're compiling in Release mode are you - not using the Obfuscated mode?

    Martin.
     
  14. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Martin, I'm compiling in "Release (obfuscated)" mode.
     
  15. warwound

    warwound Expert Licensed User

    Aha!

    Try the non-obfuscated Release mode - I bet the event name is getting obfuscated in the compilation process.
    There should be a text file in your project's Objects folder that lists all text that has been obfuscated - I bet the text "ProcessHTML" is listed there.
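
    As a rough illustration of why a renamed Sub would fail without any error, here is a toy JavaScript model of a bridge that looks up a Sub by the name passed as a string. This is only a sketch and not the WebViewExtras implementation: makeBridge, compiledSubs and the renamed Sub "_v5" are made-up names, and only the shape of the CallSub(subName, callUIThread, ...) call is taken from this thread.

    ```javascript
    // Toy model only: NOT the real WebViewExtras code.
    // compiledSubs is a hypothetical table of the Subs that exist after compilation.
    function makeBridge(compiledSubs) {
      return {
        // Same call shape as B4A.CallSub(subName, callUIThread, ...args) in this thread.
        // callUIThread is accepted but ignored in this model.
        CallSub: function (subName, callUIThread, ...args) {
          const sub = compiledSubs[subName];
          if (typeof sub !== "function") {
            return false; // no Sub with that name: nothing runs and nothing is logged
          }
          sub(...args);
          return true;
        },
      };
    }

    const received = [];
    const handler = (html) => received.push(html);

    // Non-obfuscated build: the Sub keeps the name used in the JavaScript string.
    const plainBuild = makeBridge({ ProcessHTML: handler });

    // Obfuscated build: the Sub was renamed ("_v5" is invented for this sketch),
    // but the JavaScript string still asks for 'ProcessHTML'.
    const obfuscatedBuild = makeBridge({ _v5: handler });

    const firedPlain = plainBuild.CallSub("ProcessHTML", false, "<html>a</html>");
    const firedObfuscated = obfuscatedBuild.CallSub("ProcessHTML", false, "<html>b</html>");

    console.log("plain build fired:", firedPlain);
    console.log("obfuscated build fired:", firedObfuscated);
    ```

    In this model the JavaScript side is identical in both builds; only the compiled name differs, which would match a log that shows PageFinished but never the ProcessHTML line.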

    Martin.
     
  16. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Touché! That was the problem! And yes, the Obfuscator txt file contained the event! How do I keep the obfuscator on and exclude such text from it?
     
  17. warwound

    warwound Expert Licensed User

    The obfuscator will not obfuscate a Sub name if the Sub name contains an underscore.
    So this works for me:

    Code:
    Sub Process_Globals
    End Sub

    Sub Globals
        Dim WebViewExtras1 As WebViewExtras
        Dim WebView1 As WebView
    End Sub

    Sub Activity_Create(FirstTime As Boolean)
        Activity.LoadLayout("layoutMain")

        ' add the B4A javascript interface to the WebView
        WebViewExtras1.addJavascriptInterface(WebView1, "B4A")

        ' adding a WebChromeClient will log all browser console messages to the android log
        ' so any webpage or javascript errors will be logged
        WebViewExtras1.addWebChromeClient(WebView1, "")

        ' now load a web page
        WebView1.LoadUrl("http://www.basic4ppc.com/android/forum/threads/getting-the-source-code-of-a-webpage-with-webview-and-webviewextras.34418/#post-202076")
    End Sub

    Sub Activity_Resume
    End Sub

    Sub Activity_Pause (UserClosed As Boolean)
    End Sub

    Sub WebView1_PageFinished (Url As String)
        ' Now that the web page has loaded we can get the page content as a String

        ' see the documentation http://www.basic4ppc.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread

        ' the Sub name contains an underscore so the obfuscator leaves it unchanged
        Dim Javascript As String
        Javascript = "B4A.CallSub('Process_HTML', false, document.documentElement.outerHTML)"

        Log("PageFinished: " & Javascript)
        WebViewExtras1.executeJavascript(WebView1, Javascript)
    End Sub

    Sub Process_HTML(Html As String)
        ' This is the Sub that we'll get the web page to send its HTML content to

        ' Log may truncate a large page so you'll not see all of the HTML in the log but the 'Html' String should still contain all of the web page HTML

        Log("Process_HTML: " & Html)
    End Sub
    Martin.
     
  18. adrianstanescu85

    adrianstanescu85 Active Member Licensed User

    Remedy achieved! Totally working now!

    Thank you, very much appreciated!!

    Adrian
     