Android Question: Extracting text from a website's source

lock255

Well-Known Member
Licensed User
Longtime User
Hello everyone, I cannot reproduce a function that I use a lot in VB.NET. It extracts text from a web page's source when there is no unique reference string to search for:

B4X:
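' VB.NET original: split the page source and pick one fragment by position
Dim HTML As String
Dim V1, V2 As Object
Dim V As String
HTML = WebBrowser1.DocumentText.ToString
V1 = Split(HTML, Chr(34) & ">")   ' split on a double quote followed by >
V2 = Split(V1(56), "</a></h2>")   ' take fragment 56 and cut it at the closing tags
V = V2(0)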
How can I do the same thing in B4A?
 

DonManfred

Expert
Licensed User
Longtime User
How can I do the same thing in B4A?

If you really need the WebView (because you want to show the page to the user), then this could be of help.

But if you just need the HTML and don't want to show it to the user, then, as Erel already suggested, using HttpUtils2 should be the better alternative.

See this example:

B4X:
Sub Activity_Create(FirstTime As Boolean)
    'Do not forget to load the layout file created with the visual designer. For example:
    'Activity.LoadLayout("Layout1")
    Dim php As HttpJob
    php.Initialize("htmltest",Me)
    php.Download("http://www.google.com/")
End Sub

Sub JobDone(Job As HttpJob)
    ProgressDialogHide
    If Job.Success Then
        'The downloaded page source as a single string
        Dim res As String
        res = Job.GetString
        Log("JobName: " & Job.JobName)
        If Job.JobName = "htmltest" Then
            Log("HTML is: " & res)
        Else If Job.JobName = "Init" Then
            Log("")
        End If
    Else
        ToastMessageShow("Error: " & Job.ErrorMessage, True)
    End If
    'Always release the job to free its resources
    Job.Release
End Sub
 

Attachments

  • gethtml.zip
    6.2 KB

lock255

Well-Known Member
Licensed User
Longtime User
In fact, I need to retrieve one exact piece of text from the page's source code, and I cannot rely on a reference string alone to find it.
As in my first example:
B4X:
V2 = Split(V1(56), "</a></h2>")

This means the text I want is in fragment 56 of the first split (the text that follows the 56th occurrence of "> in the source), and it ends at the next </a></h2>.
I hope that is clear, and I apologize for my bad English.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
There are two parts to this problem. First you need to download the text. The code @DonManfred posted will help you with that.

The second part is to parse the string. You can use Regex.Split if you want to split the string. It is usually better to use the jTidy library to convert the HTML to XML and then use an XML parser to parse it.
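
A minimal sketch of the Regex.Split route, assuming the page still has the layout described in the first post. ExtractFragment and its parameter names are hypothetical and not part of any library; the index 56 and both delimiters are taken from the VB.NET code above. Regex.Split treats its first argument as a regular expression, so a delimiter containing characters such as . ( ) [ ] ? * + | would need escaping; these two delimiters do not.

B4X:
'Hypothetical helper: splits Html on "> and returns the fragment at Index,
'cut at the first </a></h2>. Returns "" if the page has fewer fragments than expected.
Sub ExtractFragment(Html As String, Index As Int) As String
    Dim parts() As String = Regex.Split(QUOTE & ">", Html)
    If Index < 0 Or Index >= parts.Length Then Return ""
    Dim tail() As String = Regex.Split("</a></h2>", parts(Index))
    'Regex.Split drops trailing empty strings, so the result can be empty
    If tail.Length = 0 Then Return ""
    Return tail(0)
End Sub

Inside JobDone it could be called as Dim V As String = ExtractFragment(Job.GetString, 56). A fixed index stops working as soon as the site changes its markup, which is one reason the jTidy and XML parser route tends to be more robust.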
 
