Android Code Snippet: Unescape Unicode sequences for the Spanish language

Erel created a sub to unescape (decode) Unicode escape sequences of the form \uXXXX into real characters.
It is useful for processing text that comes from scraping web pages, for example.

But he wrote it to handle Hebrew characters, where several Unicode sequences can appear in a row.
I have adapted it for the Spanish language by adding the condition

And unicode.Length < 4

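That way it only processes the 6 characters that make up a Unicode sequence (\u plus 4 hexadecimal digits), which avoids mistakes when the sequence is followed by more characters between 0 and 9 or between a and f. For example, in p\u00fablico the b after the sequence is a valid hexadecimal digit and would otherwise be read as part of the code.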
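
To see why the limit matters, here is a minimal sketch (an illustration only, not part of Erel's sub). It uses the same Bit.ParseInt and Chr calls as the code in this post; the variable names are made up for the example. It compares what gets parsed from p\u00fablico with and without the 4-digit limit:

```b4x
'Without the limit, the parser would also consume the "b" of "blico"
Dim withoutLimit As Int = Bit.ParseInt("00fab", 16)
'With the limit, only the 4 hex digits after \u are read
Dim withLimit As Int = Bit.ParseInt("00fa", 16)
Log(withoutLimit)    '4011, not the intended character
Log(withLimit)       '250
Log(Chr(withLimit))  'ú
```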

With that change, his code looks like this:

Log(UnescapeUnicode("p\u00fablico.")) 'prints: público.    (Or pass a text with several words, in Unicode or not)

Sub UnescapeUnicode(s As String) As String
    Dim sb As StringBuilder
    sb.Initialize
    Dim i As Int
    Do While i < s.Length
        Dim c As Char = s.CharAt(i)
        If c = "\" And i < s.Length - 1 And s.CharAt(i + 1) = "u" Then
            Dim unicode As StringBuilder
            unicode.Initialize
            i = i + 2   'Skip the "\u" prefix
            Do While i < s.Length
                Dim cc As String = s.CharAt(i)
                Dim n As Int = Asc(cc.ToLowerCase)
                Dim isHex As Boolean = (n >= Asc("0") And n <= Asc("9")) Or (n >= Asc("a") And n <= Asc("f"))
                'Only up to 4 hexadecimal characters are accepted after \u
                If isHex And unicode.Length < 4 Then
                    unicode.Append(cc)
                Else
                    'Not part of the sequence: step back so the outer loop handles this character
                    i = i - 1
                    Exit
                End If
                i = i + 1
            Loop
            sb.Append(Chr(Bit.ParseInt(unicode.ToString, 16)))
        Else
            'Ordinary character: copy it unchanged
            sb.Append(c)
        End If
        i = i + 1
    Loop
    Return sb.ToString
End Sub


I have written a second way to do the same thing, this time with a regular expression:

Log(DecodeUnicode("canci\u00f3n p\u00fablica"))    'prints: canción pública

Sub DecodeUnicode(strOriginal As String) As String
    ' Pattern to find Unicode escape sequences like \uXXXX
    Dim m As Matcher
    m = Regex.Matcher("\\u[0-9a-fA-F]{4}", strOriginal)   'Double backslash to escape the '\' character in the regular expression

    Dim resultBuilder As StringBuilder
    resultBuilder.Initialize
    Dim currentIndex As Int   'To track the current position in the text
    Do While m.Find
        Dim match As String
        match = m.Match
'        LogColor(match, Colors.Green)
        If match <> "" Then
'            Log("Match found in position: " & m.GetStart(0))        'Match positions
            ' Append the unmatched text between the current position and the start of the match
            resultBuilder.Append(strOriginal.SubString2(currentIndex, m.GetStart(0)))
            ' Convert the four hex digits to an integer, omitting the leading '\u'
            Dim unicodeValue As Int
            unicodeValue = Bit.ParseInt(match.SubString(2), 16)
            Dim charValue As String
            charValue = Chr(unicodeValue)  'Convert the Unicode value to a normal character
            ' Add the substitute character to the StringBuilder
            resultBuilder.Append(charValue)
            ' Move the current position to the end of the match
            currentIndex = m.GetEnd(0)
        End If
    Loop
    ' Append any text that remains after the last match
    If currentIndex < strOriginal.Length Then
        resultBuilder.Append(strOriginal.SubString(currentIndex))
    End If
    ' resultBuilder now holds all the characters (matches and non-matches)
    Return resultBuilder.ToString
End Sub


I see that both subs also work for other Romance languages.
But I don't know whether they also work for other kinds of languages.

I would appreciate confirmations or comments.