B4J Question: jOkHttpUtils2 GetString stops download after 3 dots (spread operator)

gezueb

Hi all, I am trying to read a local HTML page with the said utils (jOkHttpUtils2) and parse some data. However, job.GetString truncates everything from the first occurrence of three dots (...) onward, including the dots themselves.
The page is some 250'000 characters long, but the string is only 1100 characters.
B4X:
Sub getLocalVartaData As ResumableSub

    Dim job As HttpJob
    job.Initialize("",Me)
    job.Download("http://192.168.1.18/home.html")
    Wait For (job) JobDone(job As HttpJob)
    If job.Success Then
        Dim result As String
        result = job.GetString
        Log("String-Length: "&result.Length)
        Log(result)
    End If
    
    job.Release
End Sub
 

DonManfred

Log does not log all data; it is limited to about 4000 characters.
Write the response to a file and you'll see all of it.
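If you would rather keep looking at the output in the log, one workaround is to print a long string in pieces that each stay under the limit. Below is a minimal plain-Java sketch (B4J runs on the JVM); the 4000-character limit is taken from the post above as an approximation, and the class and method names are made up for illustration. The same loop could be written in B4X with SubString2.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a long string into pieces of at most maxLen characters so each
// piece stays under a logger's per-message limit (~4000 assumed, see above).
// A split may fall between the two halves of a surrogate pair; harmless for logging.
final class LogChunks {
    static List<String> chunks(String s, int maxLen) {
        if (maxLen <= 0) throw new IllegalArgumentException("maxLen must be positive");
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < s.length(); start += maxLen) {
            parts.add(s.substring(start, Math.min(s.length(), start + maxLen)));
        }
        return parts;
    }

    public static void main(String[] args) {
        String page = "x".repeat(10_001); // stand-in for a long response
        List<String> parts = chunks(page, 4000);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("part " + (i + 1) + "/" + parts.size() + ": "
                    + parts.get(i).length() + " chars");
        }
    }
}
```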
 

gezueb

Thanks Don, I am aware of that. However, the returned string (not the log) should be unlimited, and its length should cover everything that is shown, e.g., in Notepad.
 

mcqueccu

DonManfred is right.

The returned string is unlimited; Log will show only about 4000 characters. If you want to see all of it, even in Notepad, use File.WriteString or put the text into a text box, and you should see everything.
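To check the "string is unlimited" point directly, you can write the text to a file, read it back and compare sizes. This is a plain-Java sketch of the same idea as File.WriteString (it is not the B4X implementation); SaveResponse and saveAndCount are hypothetical names, and UTF-8 is assumed as the file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: save the whole response text as UTF-8, read it back to confirm the
// round trip is lossless, and return the file size in bytes.
final class SaveResponse {
    static long saveAndCount(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (!back.equals(text)) {
            throw new IllegalStateException("round trip changed the text");
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("home", ".html");
        String page = "<html>" + "a".repeat(250_000) + "...\u00DC</html>"; // stand-in page
        long size = saveAndCount(tmp, page);
        System.out.println("chars: " + page.length() + ", bytes on disk: " + size);
        Files.delete(tmp);
    }
}
```

Comparing the numbers printed here with the page size shown by the browser tells you whether the text was really cut short before it reached your code.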
 

gezueb

Of course you are both right, but that is not my problem. I am looking for a specific marker in the response. I see the marker in the browser, and I see it when I copy the page into Notepad. But the string returned by GetString does not contain it, because the string itself (not the Log!) is far too short.
 

aeric

Hi gezueb,

Do you mean a piece of HTML markup? You can log just a specific section of the HTML content, for example:

B4X:
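Sub DownloadPage
    Dim job As HttpJob
    job.Initialize("", Me)
    job.Download("http://127.0.0.1/home.html")
    Wait For (job) JobDone(job As HttpJob)
    If job.Success Then
        Dim result As String = job.GetString
        'IndexOf returns -1 if the tag is not found, so check before calling SubString2
        Dim StartPos As Int = result.IndexOf("<h3>")
        Dim EndPos As Int = result.IndexOf2("</h3>", StartPos)
        If StartPos > -1 And EndPos > -1 Then
            Dim Header As String = result.SubString2(StartPos, EndPos + "</h3>".Length)
            Log(Header)
        Else
            Log("<h3> section not found, String-Length: " & result.Length)
        End If
    End If
    job.Release
End Sub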
 

aminoacid

Quote: "ASCII codes are 3 times 2E (hexadecimal, of course)"

Hmm... then I wonder if the character after the three 0x2E bytes is choking it. I assume the HTML file itself does not contain the three dots, right? Maybe you can edit the HTML file right around the area where it is getting truncated and delete a few characters. Perhaps there is some non-printable ASCII character embedded at that point that's causing it. I'm not sure how B4J processes strings, but if something like a null (0x00) is embedded in the string, it could truncate it.
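To look for such a character without editing the page, you can scan a saved copy of the raw bytes for control characters. The plain-Java sketch below flags bytes 0x00 to 0x1F (except tab, LF and CR) plus 0x7F; the class name and the choice of which bytes count as suspicious are my assumptions. As a side note, a B4J String is a java.lang.String, which can hold a 0x00 character without being cut off at that point.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: report offsets of control bytes in raw page data, e.g. the bytes of a
// saved copy read with java.nio.file.Files.readAllBytes(...).
final class ControlByteScan {
    static List<Integer> controlOffsets(byte[] data) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // Java bytes are signed; treat as 0..255
            boolean commonWhitespace = b == 0x09 || b == 0x0A || b == 0x0D; // TAB, LF, CR
            if ((b < 0x20 && !commonWhitespace) || b == 0x7F) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] sample = {'a', 'b', 0x00, 'c', '\n', 0x1B}; // NUL at 2, ESC at 5
        System.out.println(controlOffsets(sample)); // prints [2, 5]
    }
}
```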
 

gezueb

Active Member
Licensed User
Longtime User
Thank you, aminoacid and all. I am signing off for tonight but will come back with results soon. Have a nice weekend!
 

aminoacid


BTW, I tried your sub with job.Download("http://google.com") and got:

String-Length: 21441

So there has to be some character in your "home.html" file around position 1101 that's causing it. One way to verify that is to add some dummy text to the HTML file before that point and see whether String-Length increases when you run the sub.
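Since the page itself cannot be changed, another way to see what sits at that position is to hex-dump a few bytes around it in a saved copy (for example the one curl writes). A minimal Java sketch; the class name, the offset and the window size are illustrative assumptions. Keep in mind that 1101 is a character count, so in a file with multi-byte characters the matching byte offset may be somewhat higher.

```java
import java.nio.charset.StandardCharsets;

// Sketch: hex-dump the bytes within `radius` of `center`, clamped to the array.
final class HexWindow {
    static String window(byte[] data, int center, int radius) {
        int from = Math.max(0, center - radius);
        int to = Math.min(data.length, center + radius + 1); // exclusive end
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "ab...cd".getBytes(StandardCharsets.US_ASCII);
        System.out.println(window(data, 3, 2)); // prints 62 2E 2E 2E 63
        // For a saved page: window(Files.readAllBytes(Path.of("home.html")), 1100, 16)
    }
}
```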
 

gezueb

Sorry, I cannot change the original web page and thus the string. It's served by a web server embedded in my home backup battery. The data is usually picked up periodically by a server of the supplier (Varta), where I can look into the history. Now Varta is financially on the brink, and I fear I will lose all my stored data. That's why I am trying to simulate Varta's server (pick up the data and store it in a database).
 

gezueb

I have found the source of the problem. The string contains percent-encoded characters such as %20 (a space) or %C3%9C (the UTF-8 bytes of the German umlaut Ü). GetString goes astray after this. There is a GetString2 function in jOkHttpUtils2 which should deal with other encodings, but I have not found an example of how to use it.
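One point worth checking: sequences like %C3%9C are plain ASCII text in the page, so reading the response with a different charset (which, as far as I understand, is what GetString2 changes) leaves them untouched; turning them into Ü needs URL (percent) decoding. The plain-Java sketch below uses java.net.URLDecoder to show the difference; the class name and sample text are made up. On the B4X side, I believe the StringUtils library offers a DecodeUrl method for this, but please check its documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: percent-escapes survive any charset choice; URL-decoding resolves them.
final class PercentDecode {
    // Decodes %XX escapes as UTF-8. Note: URLDecoder also turns '+' into a space
    // and throws IllegalArgumentException on malformed escapes such as a lone '%'.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C3%9Cbersicht%20Akku"; // hypothetical sample text from the page
        // Reading the same ASCII bytes as windows-1252 changes nothing:
        String asCp1252 = new String(raw.getBytes(StandardCharsets.US_ASCII),
                Charset.forName("windows-1252"));
        System.out.println(asCp1252);        // prints %C3%9Cbersicht%20Akku
        System.out.println(decodeUtf8(raw)); // decoded value: "Übersicht Akku"
    }
}
```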
 

aminoacid


You can download a copy of the page using "curl" (or wget) and then do your investigation on it:

curl "http://192.168.1.18/home.html" --output home.html
 

aminoacid


Regarding the encoded characters: yes, it may not be a valid UTF-8 string. Download a copy of the page using curl, as mentioned in my previous post, then serve the page locally to your sub while trying different encodings. You can search this forum for the different encodings available. In fact, the HTML file itself may tell you which encoding to use: look for a "charset" attribute or a "Content-Type" meta tag in the file. You can also log what the server declares:

Log(job.Response.ContentType)
Log(job.Response.ContentEncoding)
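If the logged ContentType carries a charset parameter (for example text/html; charset=ISO-8859-1), that name is the natural first candidate to try with GetString2. A small Java sketch of pulling the parameter out of such a header value; the class name is hypothetical, and it only handles the common form without spaces around '='.

```java
import java.util.Locale;

// Sketch: extract the charset parameter from a Content-Type header value.
// Assumes the usual "name=value" form without spaces around '='.
final class ContentTypeCharset {
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Remove optional quotes, as in charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null; // no charset declared
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // prints ISO-8859-1
        System.out.println(charsetOf("text/html"));                     // prints null
    }
}
```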

Also try

result = job.GetString2("Windows-1252")

In fact, you may be able to figure it out using the above checks without having to download a local copy of the page first.
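To see what trying different encodings does, here is a plain-Java sketch that decodes the same bytes with two charsets. The byte values are the UTF-8 and Latin-1 encodings of Ü; the class name is made up. Note that in plain Java, new String(bytes, charset) replaces bytes that are invalid for the chosen charset with a replacement character instead of stopping, so a wrong charset normally shows up as garbled characters rather than as a shorter string.

```java
import java.nio.charset.Charset;

// Sketch: decode the same bytes with different charsets and compare the results.
final class CharsetCompare {
    static String decode(byte[] bytes, String charsetName) {
        // new String(byte[], Charset) replaces malformed or unmappable input with
        // the charset's replacement (U+FFFD for UTF-8); it decodes the whole array.
        return new String(bytes, Charset.forName(charsetName));
    }

    // Print code points so the output does not depend on the console encoding
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] utf8Ue = {(byte) 0xC3, (byte) 0x9C}; // "Ü" encoded as UTF-8
        byte[] latin1Ue = {(byte) 0xDC};             // "Ü" in ISO-8859-1 / windows-1252
        System.out.println(codePoints(decode(utf8Ue, "UTF-8")));        // U+00DC (Ü)
        System.out.println(codePoints(decode(utf8Ue, "windows-1252"))); // U+00C3 U+0153 (Ãœ)
        System.out.println(codePoints(decode(latin1Ue, "UTF-8")));      // U+FFFD
    }
}
```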
 

gezueb

curl also truncates, to 1186 bytes. While Firefox and Edge show the full page, curl does not. Probably there are some irregularities within the HTML code, but it will take me days or weeks to find them with curl and/or an HTML editor. I will come back to the thread if my search is successful or if I give up. Thanks to all of you!
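When comparing curl's byte count with the String-Length reported by the sub, note that the two measure different units: in UTF-8 an umlaut such as Ü takes two bytes but counts as one character (String.length() counts UTF-16 code units). A minimal Java sketch; the class name and sample text are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Sketch: character count vs. UTF-8 byte count for the same text.
final class CharVsByteCount {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "Gr\u00FC\u00DFe aus dem Akku"; // hypothetical sample with umlaut and sharp s
        System.out.println(s.length() + " chars, " + utf8Bytes(s) + " bytes (UTF-8)");
    }
}
```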
 