Android Question: Safely reading text files that may not be UTF8?

Dave O

Is there a recommended way of detecting/reading non-UTF8 text files?

One of my apps lets users import CSV files. Recently a German user reported that the import wasn't working, and we think we've narrowed it down to the file encoding.

Encoding: He saves his CSV file in the default encoding for Windows in Germany, which is probably Windows-1252 (Western European), but the file doesn't import on his Android device.

(He emailed that file to me, and it imports fine on my phone, but I'm wondering if that's because I downloaded it to my PC, put it on Google Drive, imported it from there using ContentChooser, and perhaps somewhere in there it was converted to UTF8?)

I suppose I could just tell users that the file must be UTF8 and comma-delimited, regardless of their own locale settings, but that might be hard for non-technical users to deal with.

Is there a way that I can:
- detect/handle likely encodings that are not UTF8?
- detect/handle a delimiter that is not a comma? (e.g. semicolons are apparently common in Europe)

Thanks for any tips!
 

roumei

I've had the same problem with GPX files and solved it by trying to convert the data with BytesToString and different encodings. It's probably not the most elegant way, but I haven't had any complaints from my users since.
If you're sure that the real data in the CSV file doesn't contain semicolons, you can simply check whether semicolons are present and, if so, use them as the delimiter. (For files from Germany, you might have to replace decimal commas with periods before converting strings into numbers.)

B4X:
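    Dim sAllText As String = ""
    Dim arrByte() As Byte = File.ReadBytes(sDir, sFile)
    Dim bSuccess As Boolean = False

    ' 1) Try UTF-8 first. Malformed bytes typically do not raise an exception here;
    '    they are decoded as U+FFFD (the replacement character), so treat that as a failure too.
    Try
        sAllText = BytesToString(arrByte, 0, arrByte.Length, "UTF8")
        bSuccess = Not(sAllText.Contains(Chr(0xFFFD)))
    Catch
        Log(LastException)
    End Try

    ' 2) Fall back to Windows-1252 (Western European), common for files saved on Windows.
    If bSuccess = False Then
        Try
            sAllText = BytesToString(arrByte, 0, arrByte.Length, "Windows-1252")
            bSuccess = True
        Catch
            Log(LastException)
        End Try
    End If

    ' 3) Last resort: UTF-7.
    If bSuccess = False Then
        Try
            sAllText = BytesToString(arrByte, 0, arrByte.Length, "UTF7")
            bSuccess = True
        Catch
            Log(LastException)
        End Try
    End If

    ' None of the encodings could be applied.
    If bSuccess = False Then Return Null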
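
The semicolon check and the decimal-comma replacement mentioned above could look roughly like this. It is only a sketch: GuessDelimiter and ParseLocalNumber are hypothetical helper names (not part of any library), the first line of the file is assumed to be representative, and quoted fields that contain the delimiter are not handled.

```b4x
' Hypothetical helper: guesses the delimiter by counting candidates in the first line.
' Assumes the first line has no delimiter characters inside quoted fields.
Sub GuessDelimiter(sAllText As String) As String
    Dim sFirstLine As String = sAllText
    Dim iEnd As Int = sAllText.IndexOf(Chr(10))
    If iEnd > -1 Then sFirstLine = sAllText.SubString2(0, iEnd)
    ' Count occurrences by comparing the length before and after removal
    Dim iSemis As Int = sFirstLine.Length - sFirstLine.Replace(";", "").Length
    Dim iCommas As Int = sFirstLine.Length - sFirstLine.Replace(",", "").Length
    If iSemis > iCommas Then Return ";"
    Return ","
End Sub

' Hypothetical helper: converts a value such as "3,14" to a Double.
' Assumes there are no thousands separators; returns 0 if the text is not numeric.
Sub ParseLocalNumber(sValue As String) As Double
    Dim s As String = sValue.Trim.Replace(",", ".")
    If IsNumber(s) Then Return s
    Return 0
End Sub
```

Calling GuessDelimiter(sAllText) after decoding gives the separator to split on. ParseLocalNumber only makes sense for semicolon-delimited files, because in a comma-delimited file a decimal comma would already have been treated as a field separator.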
 

DonManfred

Dave O said: "because I downloaded it to my PC, put it on Google Drive, imported it from there using ContentChooser, and perhaps somewhere in there it was converted to UTF8?"
I can't believe that copying a file will change its encoding.
Start with the file in your email. Open it with Notepad++. What encoding does it show for the file?
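
A small part of that check can also be done in code by looking for a UTF-8 byte order mark at the start of the file. This is only a sketch, and HasUtf8Bom is a hypothetical helper name. A missing BOM proves nothing, since many UTF-8 files are saved without one, but a present BOM is a strong hint that the file is UTF-8.

```b4x
' Hypothetical helper: True if the data starts with the UTF-8 byte order mark EF BB BF.
' B4X bytes are signed, so each byte is masked with 0xFF before comparing.
Sub HasUtf8Bom(arrByte() As Byte) As Boolean
    If arrByte.Length < 3 Then Return False
    If Bit.And(arrByte(0), 0xFF) <> 0xEF Then Return False
    If Bit.And(arrByte(1), 0xFF) <> 0xBB Then Return False
    Return Bit.And(arrByte(2), 0xFF) = 0xBF
End Sub
```

For example, Log(HasUtf8Bom(File.ReadBytes(sDir, sFile))) would report the result for the emailed file once it has been copied to the device, with sDir and sFile standing in for its actual location.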
 