HTTPUtils problem

raistlin74

Member
Licensed User
Longtime User
Hi all,
I'm having trouble getting the HTML for this page:
http://it.wikipedia.org/wiki/Episodi_di_Big_Bang_Theory_(quinta_stagione)

Until a few days ago everything worked without problems, but since yesterday (or maybe a couple of days ago) the HTML string I receive contains a lot of strange characters.

I think it is related to a Unicode problem, but I cannot find a solution.

Can someone help?

The code I use is very simple:

HttpUtils.CallbackActivity = "view"
HttpUtils.CallbackJobDoneSub = "JobDone"
strSearch = "http://it.wikipedia.org/wiki/Episodi_di_Big_Bang_Theory_(quinta_stagione)"
HttpUtils.Download("Job3", strSearch)

And in the JobDone sub:

If HttpUtils.IsSuccess(strSearch) Then
    strHTML = HttpUtils.GetString(strSearch)
    strSplitted = Regex.Split(strSplit, strHTML)

At this point, if I look at strHTML, I find lots of strange characters instead of the HTML.

Thanks in advance,
Fabio
 

raistlin74

Hi Erel,
thanks for the answer.

When I open the page in the Android browser, it tells me that the page is Latin-1 and not UTF-8.

Also, if I open the page in a normal web browser, the page shows me an alert in the upper-right corner.

Maybe it is not UTF-8.

Thanks,
Fabio
 

raistlin74

Hi Erel,
thanks again.

Now the problem has become a mystery.

The page I linked in my first post has been working with UTF-8 since yesterday.
But another page that used to be read as UTF-8 is no longer recognized -.-
 

raistlin74

Ok.

I've modified the HttpUtils response handler to capture the ContentEncoding:

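Sub hc_ResponseSuccess (Response As HttpResponse, TaskId As Int)
    Response.GetAsynchronously("response", File.OpenOutput(TempFolder, TaskId, False), _
        True, TaskId)
    ' Remember the Content-Encoding header of this response (for example "gzip").
    ' strEncoding is a String global in HttpUtilsService, read later by the caller.
    Try
        strEncoding = Response.ContentEncoding
    Catch
        ' Ignore any error while reading the header.
    End Try
End Sub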

And for the page that gives me the error, it shows "gzip" as the ContentEncoding, lol.
I didn't know that Wikipedia could send gzip-compressed responses.

Now the question is: how do I handle gzip encoding?

thanks again,
Fabio
 

raistlin74

Great, solved with the previous addition to HttpUtilsService plus this:

If HttpUtilsService.strEncoding = "gzip" Then
    in = compStream.WrapInputStream(HttpUtils.GetInputStream(strSearch), "gzip")
End If

and everything works ;) ;)
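
For anyone reading later, here is a minimal sketch of how the whole read path might look once the stream is wrapped. It is only an illustration under some assumptions: the sub name ReadPage and the variable names are hypothetical, the CompressedStreams library must be referenced in the project, strEncoding is the global added to HttpUtilsService in the earlier post, and "UTF8" is a guess at the page charset rather than something read from the response.

```b4a
' Hypothetical helper: returns the page text, decompressing it first when
' the server answered with Content-Encoding: gzip.
Sub ReadPage (Url As String) As String
    Dim compStream As CompressedStreams
    Dim inStream As InputStream
    inStream = HttpUtils.GetInputStream(Url)
    If HttpUtilsService.strEncoding = "gzip" Then
        ' Wrap the raw stream so the bytes are decompressed while reading.
        inStream = compStream.WrapInputStream(inStream, "gzip")
    End If
    ' Decode the (now uncompressed) bytes as text. The charset is an assumption.
    Dim reader As TextReader
    reader.Initialize2(inStream, "UTF8")
    Dim html As String
    html = reader.ReadAll
    reader.Close
    Return html
End Sub
```

Note that strEncoding is a single global, so if several downloads run in the same batch it reflects only the most recent response. Storing the value per TaskId would be safer in that case.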

Erel, maybe you could change the HttpUtils library to expose the content encoding, so we can see which encoding the page actually uses ;)
 