Decode UTF-8

CapReed

Member
Licensed User
Longtime User
Hi!

Can you please tell me how I can convert the string "GestiÃ³n", which is in UTF-8 encoding, so that it reads correctly as "Gestión"?

I tried this, but it doesn't work:

B4X:
Dim charset As String
Dim data As String
charset = "UTF-8"
data = "as hdajksdh kasdh kashd sda" ' the UTF-8 string goes here
Dim bytes As List
bytes.Initialize
Dim i As Int
Do While i < data.Length
    Dim c As String
    c = data.CharAt(i)
    bytes.AddAll(c.GetBytes(charset))
    i = i + 1
Loop
Dim b(bytes.Size) As Byte
For i = 0 To bytes.Size - 1
    b(i) = bytes.Get(i)
Next
Log(BytesToString(b, 0, b.Length, charset))

Thank you.
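
A note on why the loop above logs its input unchanged: `GetBytes("UTF-8")` encodes each character and `BytesToString(..., "UTF-8")` decodes the same bytes again, so the round trip does nothing. If the text arrived as UTF-8 bytes that were read with a single-byte charset (an assumption about how the string was produced), one possible repair is to turn the characters back into bytes with that charset and only then decode as UTF-8. A minimal sketch, where `RedecodeAsUtf8` is a hypothetical helper name and ISO-8859-1 is assumed to be the charset that was applied by mistake:

```b4x
' Hypothetical helper: undoes one round of "UTF-8 bytes read as ISO-8859-1".
' Assumes every character of garbled has a code point in the range 0..255.
Sub RedecodeAsUtf8(garbled As String) As String
    Dim raw() As Byte
    ' ISO-8859-1 maps each character 0..255 back to the single byte it came from
    raw = garbled.GetBytes("ISO-8859-1")
    ' Interpret the recovered bytes as UTF-8
    Return BytesToString(raw, 0, raw.Length, "UTF-8")
End Sub
```

Under those assumptions, `Log(RedecodeAsUtf8("GestiÃ³n"))` should log "Gestión", because "Ã" and "³" map back to the bytes C3 B3, which is "ó" in UTF-8.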
 

CapReed

Member
Licensed User
Longtime User
I retrieve the text/html sections of the emails, remove the HTML tags, and then compose the text to display in a label on a panel inside a ScrollView. A WebView is not worthwhile because I do not want to display the email; I want to work with the extracted and analyzed text.

Thanks.
 

CapReed

Member
Licensed User
Longtime User
Hi Erel,

I have this:

HTML:
Return-Path: <[email protected]>
X-Original-To: [email protected]
Received: from smtp2e40.ip-zone.com (smtp2e40.ip-zone.com [93.159.212.240])
   by vl540.dinaserver.com (Postfix) with ESMTP id 3A1E5FCDE1
   for <[email protected]>; Thu,  6 Sep 2012 09:43:06 +0200 (CEST)
Received: by smtp2e40.ip-zone.com id h91a1e16r3gj for <[email protected]>; Thu, 6 Sep 2012 09:43:10 +0200 (envelope-from <[email protected]>)
To: =?UTF-8?B?U0VSVklDSU9TIFRFTEVNQVRJQ09TIEVYVFJFTUXDkU9TIFkgQVNFU09SRVMgSU5GT1JNQVRJQ09TIFMuTC4=?= <[email protected]>
From: "=?UTF-8?B?SW5mb3Jtw6F0aWNhIE1lZ2FzdXI=?=" <[email protected]>
Reply-To: "=?UTF-8?B?SW5mb3Jtw6F0aWNhIE1lZ2FzdXI=?=" <[email protected]>
Date: Thu, 06 Sep 2012 09:43:10 +0200
Message-ID: <5048540e0d1af@megasur_ip-zone_com-6>
X-CcmId: 08045256404c426c59164c48595c043a0109585a0308035200050a020d04535153
List-Unsubscribe: <http://correo.sendnewsletter.es/ccm/unsubscribe/index/email/peppinto%40lolo.es>, <mailto:[email protected]>
Subject: Enermax, un paso adelante en Cajas Gaming
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="-------5048540e0d1fa"
Content-Transfer-Encoding: 7bit
X-DinaScanner-Information: DinaScanner. Filtro anti-Spam y anti-Virus
X-MailScanner-ID: 3A1E5FCDE1.86844
X-DinaScanner: Libre de Virus
X-DinaScanner-SpamCheck: no es spam, SpamAssassin (no almacenado,
   puntaje=-2.597, requerido 6, autolearn=not spam, BAYES_00 -2.60,
   HTML_IMAGE_RATIO_06 0.00, HTML_MESSAGE 0.00)
X-DinaScanner-From: [email protected]
X-Spam-Status: No

This is a message in multipart MIME format.  Your mail client should not
be displaying this. Consider upgrading your mail client to view this
message correctly.

---------5048540e0d1fa
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit




Si tiene problemas para visualizar este mensaje, pulse aquÃ







 

ENERMAX, UN PASO ADELANTE EN CAJAS GAMING

Â

This is an example email with charset="UTF-8".

I think that "Si tiene problemas para visualizar este mensaje, pulse aquÃ" is UTF-8, but I'm not sure. The correct text is "Si tiene problemas para visualizar este mensaje, pulse aquí".
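
One way to check this guess is to log the code point of every character in the suspicious text. The sketch below is only a diagnostic; `LogCodePoints` is a hypothetical name, and it relies on the standard `Asc` keyword and `Bit.ToHexString`:

```b4x
' Hypothetical diagnostic: logs each character with its code point in hex.
Sub LogCodePoints(text As String)
    Dim i As Int
    For i = 0 To text.Length - 1
        Dim c As Char
        c = text.CharAt(i)
        ' Asc returns the code point of the character
        Log(i & ": " & c & " = 0x" & Bit.ToHexString(Asc(c)))
    Next
End Sub
```

"í" is encoded in UTF-8 as the two bytes C3 AD. If the log shows code point C3 ("Ã") followed by AD (a soft hyphen, which is usually invisible), the text is most likely UTF-8 that was decoded with a single-byte charset, which would also explain why "aquÃ" appears to be missing a character.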

Maybe this topic is outside the scope of your support. If so, just let me know.
Thank you.
 

mc73

Well-Known Member
Licensed User
Longtime User
Here's what I observe: when I load this page as UTF-8, both characters you mention turn into 'Chinese'. When I load it as ISO-8859-1, the two letters are displayed correctly (at least I think so). I suspect the email client is not truly using UTF-8.
 

CapReed

Member
Licensed User
Longtime User
Thank you very much for answering.

These characters appear in the Log window when I download the mail using the Net library 1.30. I download all the emails, apply the appropriate decoding (either UTF-8 or ISO-8859-1), and get all the text out with the correct encoding using the code from the post above. But these characters always come out wrong ...

I'm sure I'm doing something silly.

Again, thank you very much. :confused:
 

CapReed

Member
Licensed User
Longtime User
This is most likely a sloppy solution, but so far I have not found anything better. I do not know what you have in mind ...

Thanks for your interest.

B4X:
Sub ReplaceRaros(p_strText As String) As String
    Dim strTemp As String
    strTemp = p_strText

    ' Left: UTF-8 bytes shown as Windows-1252 characters. Right: the intended character.
    strTemp = strTemp.Replace("Ã¡", "á")
    strTemp = strTemp.Replace("Ã©", "é")
    strTemp = strTemp.Replace("Ã*", "í") ' í is C3 AD in UTF-8; 0xAD is a soft hyphen
    strTemp = strTemp.Replace("Ã³", "ó")
    strTemp = strTemp.Replace("Ãº", "ú")
    strTemp = strTemp.Replace("Ã", "Á") ' Á is C3 81; 0x81 has no printable Windows-1252 character
    strTemp = strTemp.Replace("Ã‰", "É")
    strTemp = strTemp.Replace("Ã", "Í") ' Í is C3 8D; 0x8D has no printable Windows-1252 character
    strTemp = strTemp.Replace("Ã“", "Ó")
    strTemp = strTemp.Replace("Ãš", "Ú")
    strTemp = strTemp.Replace("Ã±", "ñ")
    strTemp = strTemp.Replace("Ã§", "ç")
    strTemp = strTemp.Replace("Ã‘", "Ñ")
    strTemp = strTemp.Replace("Ã‡", "Ç")
    strTemp = strTemp.Replace("Â©", "©")
    strTemp = strTemp.Replace("Â®", "®")
    strTemp = strTemp.Replace("â„¢", "™")
    strTemp = strTemp.Replace("Ã˜", "Ø")
    strTemp = strTemp.Replace("Âª", "ª")
    strTemp = strTemp.Replace("Ã¤", "ä")
    strTemp = strTemp.Replace("Ã«", "ë")
    strTemp = strTemp.Replace("Ã¯", "ï")
    strTemp = strTemp.Replace("Ã¶", "ö")
    strTemp = strTemp.Replace("Ã¼", "ü")
    strTemp = strTemp.Replace("Ã„", "Ä")
    strTemp = strTemp.Replace("Ã‹", "Ë")
    strTemp = strTemp.Replace("Ã", "Ï") ' Ï is C3 8F; 0x8F has no printable Windows-1252 character
    strTemp = strTemp.Replace("Ã–", "Ö")
    strTemp = strTemp.Replace("Ãœ", "Ü")

    Return strTemp
End Sub
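
A replacement table like this only covers the characters someone has listed. As a possible alternative, the whole text can be turned back into bytes and decoded again, with checks that leave the text alone when the repair looks unsafe. This is a sketch under assumptions, not a tested drop-in: `TryRedecode` is a hypothetical name, and the charset that was applied by mistake has to be supplied by the caller. Entries such as "â„¢" and "Ãœ" point to Windows-1252 rather than ISO-8859-1, because "„" and "œ" do not exist in ISO-8859-1.

```b4x
' Hypothetical helper: tries to undo "UTF-8 bytes read as assumedCharset".
' Returns the input unchanged when the repair does not look safe.
Sub TryRedecode(text As String, assumedCharset As String) As String
    Dim raw() As Byte
    raw = text.GetBytes(assumedCharset)
    ' Guard 1: characters that assumedCharset cannot represent are not encoded
    ' faithfully, so the round trip must reproduce the input exactly.
    If BytesToString(raw, 0, raw.Length, assumedCharset) <> text Then Return text
    Dim fixed As String
    fixed = BytesToString(raw, 0, raw.Length, "UTF-8")
    ' Guard 2: U+FFFD (65533) marks byte sequences that are not valid UTF-8.
    Dim marker As String
    marker = Chr(65533)
    If fixed.Contains(marker) Then Return text
    Return fixed
End Sub
```

A call would look like `strTemp = TryRedecode(p_strText, "windows-1252")`, assuming that charset name is available on the device. The bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 character, so letters such as Á, Í and Ï may already have been lost when the text was first decoded; for those the function returns the text unchanged and a table may still be needed.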
 