Android Question Downloaded HTML page: accented characters are displayed as question mark characters

Discussion in 'Android Questions' started by Efo74, Jul 4, 2016.

  1. Efo74

    Efo74 New Member Licensed User


    Can someone help me?

    I'm a newbie in B4A. I made an app that downloads an HTML page from an internet link. The downloaded page is filtered and I display only some parts of it. I use j.GetString or j.GetString2("UTF8") to get the page, but if accented characters like à ù ò ì appear in the page, they are converted to ? question marks. Is there a way to fix this and read these characters correctly? Thank you
  2. Erel

    Erel Administrator Staff Member Licensed User

    GetString and GetString2("UTF8") are the same. They assume that the page is encoded in UTF8.

    Can you post a link to one of the pages?
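    To illustrate what "they assume UTF8" means, here is a minimal sketch, assuming for illustration a page encoded in ISO-8859-1 (the thread does not say which charset the page uses). In ISO-8859-1, "à" is the single byte 0xE0. As UTF-8, that byte starts a three-byte sequence that never completes, so the decoder substitutes a replacement character, which many fonts and logs show as ? or a diamond. The sub name ShowMisdecodedAccent is hypothetical.

```b4x
' Hypothetical demo sub: shows how decoding with the wrong charset loses accents
Sub ShowMisdecodedAccent
    Dim s As String = "à"
    ' Encode the letter the way an ISO-8859-1 page stores it: one byte, 0xE0
    Dim latin1Bytes() As Byte = s.GetBytes("ISO-8859-1")
    ' Wrong: decode those bytes as UTF-8 (what GetString / GetString2("UTF8") do)
    Dim broken As String = BytesToString(latin1Bytes, 0, latin1Bytes.Length, "UTF8")
    ' Right: decode with the charset the bytes were actually written in
    Dim correct As String = BytesToString(latin1Bytes, 0, latin1Bytes.Length, "ISO-8859-1")
    Log(broken)  ' the Unicode replacement character, not "à"
    Log(correct) ' "à"
End Sub
```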
  3. Efo74

    Efo74 New Member Licensed User

  4. DonManfred

    DonManfred Expert Licensed User

    The site is NOT using utf8
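    When the page's charset is known in advance, the simplest fix is to pass it to GetString2 instead of relying on the UTF-8 default. This is a minimal sketch. The charset name "ISO-8859-1" is an assumption, because the thread does not say which charset the site declares. Use the name from the page's own `<meta ... charset=...>` tag.

```b4x
Sub JobDone(j As HttpJob)
    If j.Success Then
        ' Assumption: the page declares ISO-8859-1. Replace this with the
        ' charset name found in the page's <meta> charset declaration.
        Dim page As String = j.GetString2("ISO-8859-1")
        Log(page)
    Else
        Log("Download failed: " & j.ErrorMessage)
    End If
    ' Free the temporary file used by the job
    j.Release
End Sub
```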

  5. Efo74

    Efo74 New Member Licensed User

    Thank you, you are right.
    I'm sorry, I hadn't noticed the "charset" declaration in the page :oops:
  6. Erel

    Erel Administrator Staff Member Licensed User

    You can use this code to parse the charset:
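    Sub Process_Globals
    End Sub

    Sub Globals
    End Sub

    Sub Activity_Create(FirstTime As Boolean)
        ' HttpJob comes from the HttpUtils2 library
        Dim j As HttpJob
        j.Initialize("j", Me)
        ' Hypothetical placeholder URL: the link to the page is not preserved in this thread
        j.Download("https://www.example.com/")
    End Sub

    Sub JobDone(j As HttpJob)
        If j.Success Then
            ' The <meta> tag is plain ASCII, so it can be found even when the page is read as UTF-8
            Dim m As Matcher = Regex.Matcher2("<meta [^>]+charset=([^""]+)", Regex.CASE_INSENSITIVE, j.GetString)
            Dim charset As String = "utf8"
            If m.Find Then
                ' Group 1 holds the charset name, e.g. iso-8859-1
                charset = m.Group(1)
                Log("Found charset: " & charset)
            End If
        End If
        ' Free the temporary file used by the job
        j.Release
    End Sub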
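    Once the charset is known, the page still has to be read again with it. This is a minimal sketch of that second step. It wraps the detection in a hypothetical helper, GetStringWithDeclaredCharset, which is not part of HttpUtils2. It uses a slightly broadened variant of the pattern above that also matches the short form `<meta charset="utf-8">` and stops at quotes, spaces, semicolons or `>`. It falls back to UTF-8 when no declaration is found or when the declared name is not a charset Java recognizes.

```b4x
' Hypothetical helper (not part of HttpUtils2): returns the page text decoded
' with the charset declared in its <meta> tag, or as UTF-8 if none is found.
Sub GetStringWithDeclaredCharset(j As HttpJob) As String
    ' Matches both <meta charset="x"> and
    ' <meta http-equiv="Content-Type" content="text/html; charset=x">
    Dim m As Matcher = Regex.Matcher2("<meta[^>]+charset=[""']?([^""'\s;>]+)", Regex.CASE_INSENSITIVE, j.GetString)
    Dim charset As String = "UTF8"
    If m.Find Then charset = m.Group(1)
    Dim page As String
    Try
        ' Re-read the downloaded file, this time with the declared charset
        page = j.GetString2(charset)
    Catch
        ' Unknown charset name: fall back to UTF-8
        Log("Unsupported charset: " & charset & ", using UTF-8")
        page = j.GetString
    End Try
    Return page
End Sub
```

    Inside JobDone, `Dim page As String = GetStringWithDeclaredCharset(j)` can then take the place of the plain `j.GetString` call, before `j.Release`. This sketch only looks at the HTML `<meta>` tag. A server can also declare the charset in the HTTP Content-Type header, which this helper does not check.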
  7. Efo74

    Efo74 New Member Licensed User

    Thank you Erel, your Regex.Matcher2 technique is very useful :)
