Android Question Spotty results decoding quoted-printable html email

William Hunter

Active Member
Licensed User
Longtime User
I have been getting spotty results decoding quoted-printable HTML email with the forum's recommended sub. It gives a good result in some cases, but in others the displayed message is malformed or, at worst, will not display at all.

I am attaching two HTML files for an email recently received from B4A. This email will not display as decoded by the sub below.

Example A was decoded by this sub, is unedited and will not display. I have made notations at those lines that are causing the problem. All the required code is there, but these lines are fragmented.

Example B has been edited manually. I have reformatted the fragmented lines into a single line contained by its tags. I have again made notations at these lines. This HTML will now display correctly in a browser or WebView.

Is there something in this sub that could be changed, so that the fragmentation does not occur?
B4X:
Sub DecodeQuotePrintable(q As String) As String
    Dim bytes As List
    bytes.Initialize
    Dim i As Int
    Do While i < q.Length
        Dim c As String
        c = q.CharAt(i)
        If c = "_" Then
            bytes.AddAll(" ".GetBytes("utf8"))
        Else If c = "=" And i < q.Length - 1 Then
            Dim hex As String
            hex = q.CharAt(i + 1) & q.CharAt(i + 2)
            i = i + 2
            Try
                bytes.Add(Bit.ParseInt(hex, 16))
            Catch
                bytes.AddAll(hex.GetBytes("utf-8"))
            End Try
        Else
            bytes.AddAll(c.GetBytes("utf-8"))
        End If
        i = i + 1
    Loop
    Dim b(bytes.Size) As Byte
    For i = 0 To bytes.Size - 1
        b(i) = bytes.Get(i)
    Next
    Return BytesToString(b, 0, b.Length, "utf-8")
End Sub
I have attached a third HTML file, as Example C. This file was decoded by the quoted-printable sub below, followed by sub Strip. This comes very close to decoding my emails 100%. Unfortunately, every so often the Strip sub strips a needed character from an inline image, and only the placeholder is displayed. This is not an elegant solution, but the sub below does not produce the line fragmentation created by the sub above. All emails are displayed, with only an occasional missing image. For illustration, and to save space, I have posted only a portion of sub Strip.
B4X:
Sub DecodeQuotePrintable(q As String) As String
    Dim m As Matcher
    m = Regex.Matcher("=\?([^?]*)\?Q\?(.*)\?=$", q)
    If m.Find Then
        Dim charset As String
        Dim data As String
        charset = m.Group(1)
        data = m.Group(2)
        Dim bytes As List
        bytes.Initialize
        Dim i As Int
        Do While i < data.Length
            Dim c As String
            c = data.CharAt(i)
            If c = "_" Then
                bytes.AddAll(" ".GetBytes(charset))
            Else If c = "=" Then
                Dim hex As String
                hex = data.CharAt(i + 1) & data.CharAt(i + 2)
                i = i + 2
                bytes.Add(Bit.ParseInt(hex, 16))
            Else
                bytes.AddAll(c.GetBytes(charset))
            End If
            i = i + 1
        Loop
        Dim b(bytes.Size) As Byte
        For i = 0 To bytes.Size - 1
            b(i) = bytes.Get(i)
        Next
        Return BytesToString(b, 0, b.Length, charset)
    Else
        Return q
    End If
End Sub

Sub Strip(value As String) As String
    value = value.Replace("=20","")
    value = value.Replace("=21","!")
    value = value.Replace("=22",$"""$)
    value = value.Replace("=23","#")
    value = value.Replace("=24","$")
    value = value.Replace("=25","%")
    value = value.Replace("=26","&")
    value = value.Replace("=27","'")
    value = value.Replace("=28","(")
    Return value
End Sub
Extensions on HTML files renamed to txt in order to permit uploading.
 

Attachments

  • Example A.txt
    6 KB · Views: 446
  • Example B.txt
    6 KB · Views: 395
  • Example C.txt
    5.8 KB · Views: 498

William Hunter

Erel said:
DecodeQuotePrintable was written to help developers decode a single-line header.

Where is the html itself?

Hello Erel. The first Sub DecodeQuotePrintable above was obtained from post #7 at the following web link. The implication in that discussion is that it is to be used to decode quoted-printable email source, prior to extracting the HTML. I have done that, with the result that in some cases certain lines are fragmented so that the HTML message will not display. In some cases the extracted HTML displays OK. At other times the decoding is incomplete. It's spotty.

https://www.b4x.com/android/forum/threads/using-pop3-and-mailparser.66978/#post-424042

You ask where the HTML itself is. If you are asking to see the original email source, I have attached it here. If I have misunderstood your question, please elaborate.

The post at the web link above suggests another web link, which leads to the second Sub DecodeQuotePrintable I have been testing with. It is found in post #8. In this instance the implication is that it is to be used to decode a single-line header. This is the link.

https://www.b4x.com/android/forum/t...nicate-with-android-devices.11310/#post-87366

There is another suggested Sub DecodeQuotePrintable, specifically for decoding headers, at the following web link. It can be found in post #11.

https://www.b4x.com/android/forum/threads/decoding-email-headers.81209/#post-515165

These are all the references to decoding quoted-printable that I have found on the forum. None gives me the result I need. I am looking for a means of decoding quoted-printable in an email source, prior to extracting the HTML. It has to decode the entire message, without fragmentation.

Best regards :)
 

Attachments

  • MsgSourceB4A.txt
    10.8 KB · Views: 477

William Hunter

Erel said:
If it is the end-of-line '=' signs that cause trouble, then you can get rid of them with:
B4X:
s = s.Replace("=" & Chr(13) & Chr(10), "")
Thank you for your response, Erel. No, the problem is not that the end-of-line '=' signs are left behind by DecodeQuotePrintable. They are successfully removed. I'll describe the problem using one troublesome line of the message source.

The line below is unprocessed source, having the end of line '=' sign.
B4X:
<body dir=3D"LTR" text=3D"#141414" bgcolor=3D"#F0F0F0" link=3D"#176093" ali=
nk=3D"#176093" vlink=3D"#176093" style=3D"padding: 10px">
The line below has been processed by DecodeQuotePrintable. It has been successfully decoded, except the line remains fragmented.
B4X:
<body dir="LTR" text="#141414" bgcolor="#F0F0F0" link="#176093" ali
nk="#176093" vlink="#176093" style="padding: 10px">
The line below has no fragmentation. This is what is needed in order for HTML to display properly.
B4X:
<body dir="LTR" text="#141414" bgcolor="#F0F0F0" link="#176093" alink="#176093" vlink="#176093" style="padding: 10px">
The problem is that when a line in the message source is fragmented, it remains fragmented after decoding. I would need some way of dealing with this in Sub DecodeQuotePrintable, but I can't think of one. I'm stymied on this.
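
A possible explanation, from reading the first sub: when '=' is the last character on a line, the two characters that follow it are the line break (CR and LF) rather than hex digits, so Bit.ParseInt fails and the Catch branch appends those two characters unchanged. The '=' disappears but the line break stays, which matches the fragmented output shown above. The sketch below is one way to handle this, not the forum's recommended sub. The name DecodeQuotedPrintableBody and the charset parameter are hypothetical, and it assumes the whole encoded body part is passed in as one string. It also leaves "_" untouched, since turning "_" into a space is a rule for encoded headers rather than for message bodies.

```b4x
' Hypothetical helper (not from the forum): decodes a quoted-printable body.
' Assumes q holds the whole encoded part and charset is known by the caller.
Sub DecodeQuotedPrintableBody(q As String, charset As String) As String
    ' Soft line breaks: "=" directly before a line break means the line continues.
    ' Remove the "=" together with the break (CRLF first, then bare LF).
    q = q.Replace("=" & Chr(13) & Chr(10), "")
    q = q.Replace("=" & Chr(10), "")
    Dim bytes As List
    bytes.Initialize
    Dim i As Int = 0
    Do While i < q.Length
        Dim c As String = q.CharAt(i)
        Dim decoded As Boolean = False
        ' Only read two more characters when they actually exist.
        If c = "=" And i + 2 < q.Length Then
            Dim hex As String = q.SubString2(i + 1, i + 3)
            ' Decode only a real hex pair; anything else is kept as literal text.
            If Regex.IsMatch("[0-9A-Fa-f]{2}", hex) Then
                bytes.Add(Bit.ParseInt(hex, 16))
                i = i + 2
                decoded = True
            End If
        End If
        If decoded = False Then bytes.AddAll(c.GetBytes(charset))
        i = i + 1
    Loop
    Dim b(bytes.Size) As Byte
    For n = 0 To bytes.Size - 1
        b(n) = bytes.Get(n)
    Next
    Return BytesToString(b, 0, b.Length, charset)
End Sub
```

Called as DecodeQuotedPrintableBody(encodedPart, "utf-8"), where encodedPart is a placeholder for the quoted-printable section of the message source, the two fragments of the body tag above should come back as a single line. The charset should be taken from the part's Content-Type header; "utf-8" here is only an assumption.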

Best regards
 

OliverA

Expert
Licensed User
Longtime User
Are you decoding line by line, or the whole message at once? If line by line, that could explain why the lines are still fragmented.
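
If the source is indeed read line by line, one option is to rebuild the logical lines before decoding. The sketch below is a minimal illustration under stated assumptions: the name JoinQuotedPrintableLines is hypothetical, lines is assumed to be a List of strings with the line endings already removed, and trailing whitespace after a final '=' is not handled.

```b4x
' Hypothetical helper: rebuilds one string from lines that were read separately,
' honoring quoted-printable soft line breaks before any decoding is done.
Sub JoinQuotedPrintableLines(lines As List) As String
    Dim sb As StringBuilder
    sb.Initialize
    For Each line As String In lines
        If line.EndsWith("=") Then
            ' Soft break: drop the "=" and continue on the same logical line.
            sb.Append(line.SubString2(0, line.Length - 1))
        Else
            ' Hard break: keep the line and restore its CRLF ending.
            sb.Append(line & Chr(13) & Chr(10))
        End If
    Next
    Return sb.ToString
End Sub
```

A trailing '=' can be treated as a soft break here because a literal '=' in quoted-printable text is itself encoded as =3D. The joined string can then be passed to whichever decoding sub is in use.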
 