Android Question Help reading html page

Discussion in 'Android Questions' started by Michael Gasperi, Mar 4, 2015.

  1. Michael Gasperi

    Michael Gasperi Member Licensed User

    I'd like to pick some specific data out of a webpage that is normally intended to be just viewed with a browser. With Chrome I can load the page and "view source" to see that it should be about 200K worth of characters.

    I tried using HttpUtils2 to download the URL, but the string that Job.GetString returns is only about 5K long. That is clearly not the whole webpage, and unfortunately it does not reach far enough down to include the part I'm trying to get.

    Is it possible to get the whole html page as a string and how do you do it?
     
  2. Erel

    Erel Administrator Staff Member Licensed User

    Job.GetString returns the whole string.

    Log truncates the message after 4000 characters, so the logged output looks shorter than the actual string. Check this:
    Code:
    Dim s As String = Job.GetString
    Log(s.Length)
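
    If the goal is to see the whole page in the log rather than just its length, one option is to log the string in pieces. The following is only a minimal sketch: LogLongString is a hypothetical helper name, and the chunk size of 4000 is taken from the limit mentioned above rather than checked against any particular B4A version.

    ```b4x
    'Hypothetical helper: logs a long string in chunks so no part is cut off.
    Sub LogLongString(s As String)
        Dim chunkSize As Int = 4000   'assumed limit, taken from the post above
        Dim pos As Int = 0
        Do While pos < s.Length
            'Never read past the end of the string
            Dim last As Int = Min(pos + chunkSize, s.Length)
            Log(s.SubString2(pos, last))
            pos = last
        Loop
    End Sub
    ```

    It could be called from JobDone as LogLongString(Job.GetString). A chunk boundary can fall in the middle of a line, so the log output is for inspection only.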
     
  3. Michael Gasperi

    Michael Gasperi Member Licensed User

    For some reason the string length is still too short. I tried using www.amazon.com as the URL and got over 400K, so there must be something in the reply from the site I really want that cuts the string off early.
     
  4. mark35at

    mark35at Well-Known Member Licensed User

    I tried downloading the source and got the whole thing. "i" shows 9186 lines, but in the log I can only see the last 4588 lines. There are, however, a lot of empty lines in there. Here is my code:

    Code:
    #Region Module Attributes
        #FullScreen: True
        #IncludeTitle: False
        #ApplicationLabel: URL Test
        #VersionCode: 1
        #VersionName: 1.0
        #SupportedOrientations: portrait
        #CanInstallToExternalStorage: True
    #End Region

    Sub Process_Globals

    End Sub

    Sub Globals
        Dim line As String
        Dim TextReader1 As TextReader
    End Sub

    Sub Activity_Create(FirstTime As Boolean)
        'If FirstTime Then
        'End If

        Dim MyJob As HttpJob
        MyJob.Initialize("MyJob", Me)
        MyJob.Download("http://www.amazon.com")
    End Sub

    Sub Activity_Resume

    End Sub

    'Event handler for Job Done
    Sub JobDone(Job As HttpJob)
        Dim TextReader1 As TextReader
        Dim line As String
        TextReader1.Initialize(Job.GetInputStream)

        'Read the html page line by line
        Dim i As Int
        For i = 1 To 10000
            If line.IndexOf("</html>") > -1 Then    'Marker for end of page
                Exit
            End If

            line = TextReader1.ReadLine
            If line = Null Then Exit    'End of stream reached before the marker
            Log(line)
        Next

        Log("Lines in string: " & i)
        Job.Release
    End Sub
    I also used HttpJob.bas and HttpUtils2Service.bas from HttpUtils2 without any changes.

    Maybe you could try dumping the input stream to a file to see what the site actually returns. Sorry, I have no time to continue. Good luck.
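
    Dumping the response to a file could look roughly like the sketch below. It assumes the same HttpUtils2 JobDone handler as above; the file name "page.html" and the choice of File.DirInternal are arbitrary and only for illustration.

    ```b4x
    'Sketch: save the raw response to a file instead of logging it.
    Sub JobDone(Job As HttpJob)
        If Job.Success Then
            '"page.html" is an arbitrary file name chosen for this example
            Dim out As OutputStream = File.OpenOutput(File.DirInternal, "page.html", False)
            File.Copy2(Job.GetInputStream, out)   'copies everything the server sent
            out.Close
            Log("Bytes saved: " & File.Size(File.DirInternal, "page.html"))
        Else
            Log("Download failed: " & Job.ErrorMessage)
        End If
        Job.Release
    End Sub
    ```

    The logged file size can then be compared with what the browser's "view source" shows. To open the file on a PC, a folder on external storage would probably be more convenient than File.DirInternal, but that needs the matching storage permission.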
     
  5. Michael Gasperi

    Michael Gasperi Member Licensed User

    Thanks for all your help. I figured out the problem. The website I was trying to grab the data from requires registration. Much like the B4A site, you can opt to "stay logged on" and you never see the log-in page again. Because I was requesting the page from HttpUtils2 and not from Chrome, the site was returning a much smaller page asking for email and password, NOT the big one full of data I was expecting. I think I can solve the problem with something like WebViewExtras rather than HttpUtils2.
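
    A quick way to catch this kind of surprise might be to check whether the downloaded page contains a password field. This is only a rough heuristic sketch: LooksLikeLoginPage is a hypothetical helper name, and a site could mark up its login form differently.

    ```b4x
    'Hypothetical helper: guesses whether the response is a login form
    'by looking for a password input field in the html.
    Sub LooksLikeLoginPage(html As String) As Boolean
        Dim lower As String = html.ToLowerCase
        Return lower.Contains("type=""password""") Or lower.Contains("type='password'")
    End Sub
    ```

    In JobDone it could be used as If LooksLikeLoginPage(Job.GetString) Then Log("Got a login page, not the data page").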
     