Android Question: Problem deleting high Unicode character with custom keyboard

RB Smissaert

Well-Known Member
Licensed User
Longtime User
With a custom keyboard I use the following code to delete (Del key). This is in a simple EditText control.

B4X:
Sub Delete(strOld As String, iIndexStart As Int, iIndexEnd As Int) As String
    
    Dim strPre As String
    Dim strSuf As String

    If iIndexStart = iIndexEnd Then
        If iIndexStart > 0 Then
            strPre = strOld.SubString2(0, iIndexStart - 1)
        Else
            Return strOld
        End If
    Else
        strPre = strOld.SubString2(0, iIndexStart)
    End If
    
    If iIndexStart < strOld.Length Then
        strSuf = strOld.SubString(iIndexEnd)
    End If
    
    Return strPre & strSuf
    
End Sub

It all works fine, except when the text contains Unicode characters outside the 16-bit range (code points above 65535), e.g. 127754, which displays as a wave image.
In that case the above code doesn't clear the whole character and the well-known solid question mark is left behind. A second
Del key press will clear this question mark, but I would like the above code to recognize that two delete actions are needed to clear the character,
and I can't figure out how to detect this situation. The Asc function doesn't work, as the char it is applied to doesn't hold the full Unicode character.
Any suggestions on how to tackle this problem?

RBS
 

RB Smissaert

I forgot to say that the standard keyboard handles this fine (as expected), so it deletes the whole character.

RBS
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
BTW, why write a custom keyboard? It will be very complicated if you want to support emojis and other complex characters.

You can't use Asc or the standard string methods with high codepoints. You need to encode the string using UTF32.

Small example:
B4X:
Sub AppStart (Args() As String)
    Log("Hello world!!!")
    Dim s As String = "𐌀𐌀𐌀"
    Log(s.Length) 'wrong (6): Length counts UTF-16 code units, not code points
    Dim cp() As Int = ToCodePoints(s)
    Log(cp.Length) '3: one entry per code point
    For Each c As Int In cp
        Log(c)
    Next
End Sub

Private Sub ToCodePoints(s As String) As Int()
    Dim b() As Byte = s.GetBytes("UTF-32LE")
    Dim res(b.Length / 4) As Int
    For i = 0 To b.Length - 1 Step 4
        res(i / 4) = BytesToInt(b, i)
    Next
    Return res
End Sub


Private Sub BytesToInt (Bytes() As Byte, StartIndex As Int) As Int
    Dim cp As Int
    For i = 0 To 3
        cp = Bit.Or(cp, Bit.ShiftLeft(Bit.And(0xff, Bytes(i + StartIndex)), 8 * i))
    Next
    Return cp
End Sub
 

RB Smissaert

Asc(...) should work for that
Yes, thanks, it does, as the surrogates are in the range 55296 to 57343. It looks like I don't have to differentiate between the low and high surrogate codes.
I needed to add some other code to the Delete Sub to return whether or not a surrogate pair was deleted, as that affects the code that gets the selection start.

This seems to work fine now:

B4X:
Sub Delete(strOld As String, iIndexStart As Int, iIndexEnd As Int, arrIntCharsDeleted() As Int) As String
    
    Dim strPre As String
    Dim strSuf As String
    Dim iCode As Int

    Enums.bDelKey = True
    
    If iIndexStart = iIndexEnd Then
        If iIndexStart > 0 Then
            iCode = Asc(strOld.SubString(iIndexStart - 1))
            'UTF-16 surrogates span 55296 (0xD800) to 57343 (0xDFFF); a pair needs two chars before the cursor
            If iIndexStart > 1 And iCode > 55295 And iCode < 57344 Then
                strPre = strOld.SubString2(0, iIndexStart - 2)
                arrIntCharsDeleted(0) = 2
            Else
                strPre = strOld.SubString2(0, iIndexStart - 1)
                arrIntCharsDeleted(0) = 1
            End If
        Else
            Return strOld
        End If
    Else
        strPre = strOld.SubString2(0, iIndexStart)
    End If
    
    If iIndexStart < strOld.Length Then
        strSuf = strOld.SubString(iIndexEnd)
    End If
    
    Return strPre & strSuf
    
End Sub

RBS
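
A single range test treats any surrogate as one half of a complete pair, so a stray surrogate would also cause two chars to be removed. A minimal sketch of a stricter check, assuming the cursor index counts UTF-16 chars as it does in the Delete Sub above; the name DeleteCountBeforeCursor is hypothetical and not part of the original code. It returns 2 only when the char before the cursor is a low surrogate (56320 to 57343) that is preceded by a high surrogate (55296 to 56319):

```b4x
'Hypothetical helper: number of chars a single Del press should remove
'to the left of cursorIndex (0 if the cursor is at the start).
Private Sub DeleteCountBeforeCursor(s As String, cursorIndex As Int) As Int
    If cursorIndex <= 0 Then Return 0
    Dim lowUnit As Int = Asc(s.CharAt(cursorIndex - 1))
    'Low (trailing) surrogates: 0xDC00 to 0xDFFF
    If cursorIndex > 1 And lowUnit >= 0xDC00 And lowUnit <= 0xDFFF Then
        Dim highUnit As Int = Asc(s.CharAt(cursorIndex - 2))
        'High (leading) surrogates: 0xD800 to 0xDBFF
        If highUnit >= 0xD800 And highUnit <= 0xDBFF Then Return 2
    End If
    Return 1
End Sub
```

The result could be used both for the SubString2 end index (iIndexStart minus the returned count) and for the value stored in arrIntCharsDeleted(0).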
 

RB Smissaert

Erel said: BTW, why write a custom keyboard? It will be very complicated if you want to support emojis and other complex characters.

Yes, it wasn't simple to make this custom keyboard, but it gives some serious benefits. In this app the only place where it could cause problems
is in a SQL editor, but it will be a rare thing to use complex characters in there, although it can be nice in SQL comments.
I have the custom keyboard as an option, so I can switch between the two no problem.

RBS
 

agraham

Expert
Licensed User
Longtime User
Erel said: You can't use Asc or the standard string methods with high codepoints. You need to encode the string using UTF32.
Sorry to contradict you but yes you can. You can identify surrogate characters using Asc() and deal with the next character appropriately. I personally think that is easier than messing with bytes.
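
As a rough illustration of that approach, here is a minimal sketch that walks the string with Asc() and treats a high surrogate followed by a low surrogate as one character. The Sub name CountCodePoints is made up for this example; it assumes, as reported earlier in the thread, that Asc() returns the raw UTF-16 value of each char:

```b4x
'Hypothetical helper: counts code points instead of UTF-16 chars.
Private Sub CountCodePoints(s As String) As Int
    Dim count As Int = 0
    Dim i As Int = 0
    Do While i < s.Length
        Dim code As Int = Asc(s.CharAt(i))
        'High surrogate (0xD800 to 0xDBFF) with at least one char after it
        If code >= 0xD800 And code <= 0xDBFF And i + 1 < s.Length Then
            Dim nextCode As Int = Asc(s.CharAt(i + 1))
            'Skip the low surrogate (0xDC00 to 0xDFFF) so the pair counts once
            If nextCode >= 0xDC00 And nextCode <= 0xDFFF Then i = i + 1
        End If
        count = count + 1
        i = i + 1
    Loop
    Return count
End Sub
```

For a string of three characters above 65535, Length reports 6 while this count should come out as 3. An unpaired surrogate is counted as a single character.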
 

Erel

agraham said: Sorry to contradict you but yes you can. You can identify surrogate characters using Asc() and deal with the next character appropriately.
That's true. You can get the correct code point with some non-trivial code. From my experience it is much simpler to work with UTF-32, where every code point is 4 bytes. If you are interested in text encoding, check the code of BCTextEngine (CreateBCTextCharsFromString). Text and emojis are quite complicated to handle properly.
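
For reference, the surrogate route to the code point might look like the sketch below. It uses the standard UTF-16 formula, code point = 65536 + (high - 55296) * 1024 + (low - 56320). The Sub name CodePointAt is hypothetical, and this is not the code used in BCTextEngine:

```b4x
'Hypothetical helper: code point that starts at the given char index.
Private Sub CodePointAt(s As String, index As Int) As Int
    Dim highUnit As Int = Asc(s.CharAt(index))
    'Only a high surrogate (0xD800 to 0xDBFF) can start a pair
    If highUnit >= 0xD800 And highUnit <= 0xDBFF And index + 1 < s.Length Then
        Dim lowUnit As Int = Asc(s.CharAt(index + 1))
        If lowUnit >= 0xDC00 And lowUnit <= 0xDFFF Then
            'Combine the two 10-bit halves and add the 0x10000 offset
            Return 0x10000 + (highUnit - 0xD800) * 0x400 + (lowUnit - 0xDC00)
        End If
    End If
    'Ordinary char or unpaired surrogate: return the value as-is
    Return highUnit
End Sub
```

Worked through for the wave character from the first post: 127754 is stored as the chars 55356 and 57098, and 65536 + (55356 - 55296) * 1024 + (57098 - 56320) = 65536 + 61440 + 778 = 127754.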
 