Android Question Help reading html page

Michael Gasperi

Member
Licensed User
I'd like to pick some specific data out of a webpage that is normally intended to be just viewed with a browser. With Chrome I can load the page and "view source" to see that it should be about 200K worth of characters.

I tried using HttpUtils2 to download the URL, but the string returned by Job.GetString is only about 5K long. That is clearly not the whole webpage, and unfortunately it does not reach far enough down to include the part I'm trying to get.

Is it possible to get the whole HTML page as a string, and if so, how do you do it?
 

Michael Gasperi

Member
Licensed User
For some reason the string length is too short. I tried using www.amazon.com as the URL and got over 400K, so there must be something in the reply from the site I actually want that cuts the string off early.
 

mark35at

Well-Known Member
Licensed User
I tried downloading the page source and got the whole thing: "i" shows 9186 lines, but in the log I can only see the last 4588 lines. There are, however, a lot of empty lines in there. Here is my code:

B4X:
#Region Module Attributes
    #FullScreen: True
    #IncludeTitle: False
    #ApplicationLabel: URL Test
    #VersionCode: 1
    #VersionName: 1.0
    #SupportedOrientations: portrait
    #CanInstallToExternalStorage: True
#End Region

Sub Process_Globals
       
End Sub

Sub Globals
    'Nothing needed here: JobDone declares its own TextReader and line variables

End Sub

Sub Activity_Create(FirstTime As Boolean)
    'If FirstTime Then
    'End If
   
    Dim MyJob As HttpJob
    MyJob.Initialize("MyJob", Me)
    MyJob.Download("http://www.amazon.com")
   
End Sub

Sub Activity_Resume
   
End Sub

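'Event handler for Job Done
Sub JobDone(Job As HttpJob)
    'Bail out early if the download failed
    If Job.Success = False Then
        Log("Download failed: " & Job.ErrorMessage)
        Job.Release
        Return
    End If

    Dim TextReader1 As TextReader
    TextReader1.Initialize(Job.GetInputStream)

    'Read line by line until the stream ends (ReadLine returns Null)
    'or the closing tag is found
    Dim count As Int = 0
    Dim line As String = TextReader1.ReadLine
    Do While line <> Null
        count = count + 1
        Log(line)
        If line.Contains("</html>") Then Exit    'Marker for end of page
        line = TextReader1.ReadLine
    Loop
    TextReader1.Close

    Log("Lines in string: " & count)
    Job.Release
End Sub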
I also used HttpJob.bas and HttpUtils2Service.bas from HttpUtils2 without any changes.

Maybe you could try dumping the Input stream to a file to see. Sorry I have no time to continue. Good luck.
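The idea of dumping the input stream to a file could look roughly like the sketch below. It is untested against the site in question; the file name "page.html" and the choice of File.DirInternal are assumptions made only for illustration.

```b4x
'Sketch: save the raw response to a file so its size and content can be inspected
Sub JobDone(Job As HttpJob)
    If Job.Success Then
        '"page.html" is an arbitrary name chosen for this example
        Dim outStream As OutputStream
        outStream = File.OpenOutput(File.DirInternal, "page.html", False)
        'Copy2 copies everything from the input stream to the output stream
        File.Copy2(Job.GetInputStream, outStream)
        outStream.Close
        Log("Saved bytes: " & File.Size(File.DirInternal, "page.html"))
    Else
        Log("Download failed: " & Job.ErrorMessage)
    End If
    Job.Release
End Sub
```

Comparing the saved size with what the browser's "view source" shows should tell you whether the server sent a short page or whether the string was cut off later.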
 

Michael Gasperi

Member
Licensed User
Thanks for all your help. I figured out the problem: the website I was trying to grab the data from requires registration. Much like the B4A site, you can opt to "stay logged on" so that you never see the login page again. Because I was going into the site from HttpUtils2 and not Chrome, the site was returning a much smaller page asking for email and password, not the big one full of data I was expecting. I think I can solve the problem with something like WebViewExtras rather than HttpUtils2.
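For anyone who hits the same symptom, a quick check along these lines might reveal it sooner. This is only a sketch: the test for a password field is an assumption about what a login page typically contains, not something specific to the site in question.

```b4x
'Sketch: log the size and the start of the response to see which page came back
Sub JobDone(Job As HttpJob)
    If Job.Success Then
        Dim html As String = Job.GetString
        Log("Length: " & html.Length)
        'Assumption: a login page contains an input of type "password"
        If html.ToLowerCase.Contains("type=""password""") Then
            Log("This looks like a login page, not the data page")
        End If
        'Show the first 500 characters (or fewer if the page is shorter)
        Dim n As Int = Min(500, html.Length)
        Log(html.SubString2(0, n))
    Else
        Log("Download failed: " & Job.ErrorMessage)
    End If
    Job.Release
End Sub
```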
 