Android Question Can we get the webpage content (text only) from a Webview?

wonder

Is it possible to extract the text being displayed by a webpage loaded in a WebView?

If not, any ideas for such an algorithm?
(The equivalent of opening a webpage in a browser, then Ctrl+A, Ctrl+C and Ctrl+V into Notepad.)
 

wonder

"I used HttpUtils with Job.GetString to get webpage content directly."
Does it grab the entire HTML code or only the rendered text? I'm only after the webpage output, meaning the human-readable content itself.
 

susu

"Does it grab the entire HTML code or only the rendered text?"
Only the HTML code, just as when you open the webpage with view-source. But you can save the HTML to a file and then load it into a WebView (however, links to images, CSS, etc. may be broken).
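
To get from that raw HTML to something closer to the rendered text, one rough option is to strip the markup yourself. Below is a minimal sketch in plain Java (not B4A code), assuming a crude regex pass is good enough. HtmlTextSketch and toPlainText are made-up names, only a handful of entities are decoded, and a real HTML parser will cope far better with messy pages.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any B4A or Android library.
final class HtmlTextSketch {

    // (?is): case-insensitive, and '.' also matches line breaks.
    // Drops whole elements whose content is never shown as page text, plus comments.
    private static final Pattern HIDDEN = Pattern.compile(
        "(?is)<(script|style|noscript|head)\\b[^>]*>.*?</\\1\\s*>|<!--.*?-->");

    // Tags that usually end a visual line in a browser.
    private static final Pattern BLOCK_BREAK = Pattern.compile(
        "(?i)<br\\s*/?>|</(p|div|li|tr|h[1-6])\\s*>");

    // Whatever markup is left over.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String toPlainText(String html) {
        String s = HIDDEN.matcher(html).replaceAll(" ");
        s = BLOCK_BREAK.matcher(s).replaceAll("\n");
        s = ANY_TAG.matcher(s).replaceAll("");

        // Decode a few common entities; &amp; goes last so "&amp;lt;" ends up as "&lt;".
        s = s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&#39;", "'").replace("&amp;", "&");

        // Collapse runs of spaces/tabs, trim around line breaks, drop empty lines.
        s = s.replaceAll("[ \\t\\x0B\\f\\r]+", " ");
        s = s.replaceAll(" ?\\n ?", "\n");
        s = s.replaceAll("\\n{2,}", "\n");
        return s.trim();
    }
}
```

Calling HtmlTextSketch.toPlainText on the string returned by the download job should give one line of text per paragraph or heading. Text containing a literal "<" outside of script blocks can still confuse the tag pattern, which is one reason to prefer a parser for anything serious.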
 

sorex

I've never used that special lib, but maybe it lets you inject JavaScript and receive data back from it.

Then you could use data=document.body.textContent; and pull the data value back in.

Note that this might include unwanted stuff such as JavaScript portions, so you might need to filter that out.

If the content is in a block with an id, you could use data=document.getElementById('id').textContent; instead, and then you don't have the script-filtering misery.
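
A note on pulling the value back, assuming the native Android route rather than a B4A wrapper library: WebView.evaluateJavascript (API 19+) hands the result to its callback as a JSON value, so as far as I know a string arrives wrapped in quotes with escapes such as \n and \u003C still in it. The sketch below is plain Java that undoes that encoding. JsResultSketch, decodeJsonString and handle are hypothetical names. Using document.body.innerText instead of textContent may also avoid most of the script text, since innerText follows what is actually rendered.

```java
// Hypothetical helper for illustration; not part of any B4A or Android library.
final class JsResultSketch {

    // On Android the script would be run roughly like this (not executed here):
    //   webView.evaluateJavascript("document.body.innerText",
    //       value -> handle(JsResultSketch.decodeJsonString(value)));

    /** Turns a JSON string literal such as "a\nb" (quotes included) into the text it encodes. */
    static String decodeJsonString(String json) {
        if (json == null || json.equals("null")) {
            return ""; // the script returned null or undefined
        }
        String s = json.trim();
        if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(s.length() - 1) != '"') {
            return s; // not a string literal (number, boolean, ...): return unchanged
        }
        int end = s.length() - 1; // index of the closing quote
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 1; i < end; i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= end) {
                out.append(c); // ordinary character (or a stray trailing backslash)
                continue;
            }
            char esc = s.charAt(++i);
            switch (esc) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Needs four hex digits before the closing quote; malformed
                    // digits would throw NumberFormatException in this sketch.
                    if (i + 4 < end) {
                        out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                        i += 4;
                    }
                    break;
                default: out.append(esc); // covers \" \\ and \/
            }
        }
        return out.toString();
    }
}
```

On Android itself the bundled org.json classes could do the same job. The hand-rolled loop is only here to keep the example free of dependencies.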
 

moster67


inakigarm

I think you'll find the JSoup library's methods and properties very helpful; you can directly access elements, tags, etc. from the original HTML page.
 

sorex

I would just go for the okHTTP method as well, but it's not clear why he really wants to grab it from the WebView. There must be some reason :)
 

wonder

I'm ok with okHTTP... :) WebView was just the first thing that came to my mind. I want to experiment with web crawling and data mining. As a starting point, I'll try to download the entire Wikipedia... :D :D :D

No, but seriously, getting some content (text) from Wikipedia would be a great starting point. :)
 