Android Question Diacritic signs

jccordier

Member
Licensed User
Hello.
In a list of words, I have these (among many others):
ἀγρός, γεωργός, θεός, ἰατρός, καιρός, καρπός (it's ancient Greek).
When I search for "ός" in my app (with IndexOf or EndsWith, for example), I find only 3 of them!
If I use GetBytes("UTF8"), I can see that there are 2 different sets of bytes for the string "ός" in these words:
(50,65,52,127,49,126) and (127,31,67,71,49,126)
This explains why some words are not found.
But when I search for the same string in these same words in Notepad++ (the words are in a csv file), it works and all the words are found.
So, my questions:
1. How is it possible to get two different sets of bytes for the same characters?
2. How can Notepad find all these words despite this?
3. How can I handle this in my app?
Thank you for your help.
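A possible explanation for question 1, offered as a sketch rather than a confirmed diagnosis: Unicode has more than one way to write an accented omicron, and each spelling has its own UTF-8 bytes even though the glyphs look the same. The Python snippet below (Python is used only for illustration, it is not B4A code, and `signed_utf8` is a made-up helper name) prints the signed bytes of three spellings of "ός". Sign aside, the first tuple quoted above matches the decomposed spelling, and the last five values of the second tuple match the oxia spelling.

```python
# Three visually identical spellings of the ending, written with escapes
# so that no editor can silently change the code points.
tonos = "\u03cc\u03c2"            # omicron with tonos + final sigma
oxia = "\u1f79\u03c2"             # omicron with oxia + final sigma
combining = "\u03bf\u0301\u03c2"  # omicron + combining acute accent + final sigma


def signed_utf8(text):
    """Return the UTF-8 bytes of text as signed values (-128..127),
    which is how a Java-style byte array displays them."""
    return [b - 256 if b > 127 else b for b in text.encode("utf-8")]


for label, s in (("tonos", tonos), ("oxia", oxia), ("combining", combining)):
    print(label, signed_utf8(s))
# tonos     [-49, -116, -49, -126]
# oxia      [-31, -67, -71, -49, -126]
# combining [-50, -65, -52, -127, -49, -126]
```

Polytonic (ancient) Greek text often uses the oxia or the decomposed spelling, while a modern Greek keyboard layout usually produces the tonos one, so a mix within one word list is plausible.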
 

jccordier

Member
Licensed User
I found an explanation (I think).
The string "ός" that I'm searching for in the words is typed in the B4A editor. If I copy it and use it for searching in Notepad, it doesn't work!
So that's why I got 2 different sets of bytes, as I said.
And now, a new question:
How can I get the same encoding for the characters in Notepad (where I edit my csv file) and in B4A (where I edit my program)?
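One way to avoid depending on which editor produced the text, assuming the difference really is a Unicode normalization issue, is to normalize both the words and the search string to the same form (NFC here) before comparing. The sketch below is in Python for illustration only; `nfc` and `ends_with_normalized` are hypothetical helper names, and the three sample words are deliberately spelled with three different code point sequences for the accented omicron. On Android the corresponding standard API would be `java.text.Normalizer`; how to call it from B4A is not shown here.

```python
import unicodedata


def nfc(text):
    """Normalize text to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def ends_with_normalized(word, suffix):
    """Compare after normalizing both sides, so equivalent spellings match."""
    return nfc(word).endswith(nfc(suffix))


words = [
    "\u1f00\u03b3\u03c1\u1f79\u03c2",              # agros, omicron with oxia
    "\u03b8\u03b5\u03cc\u03c2",                    # theos, omicron with tonos
    "\u03ba\u03b1\u03b9\u03c1\u03bf\u0301\u03c2",  # kairos, omicron + combining acute
]
suffix = "\u03cc\u03c2"  # the ending typed with omicron with tonos

plain = [w for w in words if w.endswith(suffix)]
normalized = [w for w in words if ends_with_normalized(w, suffix)]

print(len(plain), len(normalized))  # 1 3
```

Normalizing once when the csv file is loaded, and once for the search string, keeps the rest of the search code unchanged.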
 

DonManfred

Expert
Licensed User
Longtime User
I guess the file in Notepad is not UTF8 encoded.
Can you upload such a text file?
 

jccordier

Member
Licensed User
Thanks for your answers.
I found a workaround. Silly, but it works.
I copy the text of my csv file from Notepad++ (hi Erel! I thought it was the only "Notepad") into the B4A editor and save it. Then I load it in Notepad++, erase the first lines, save it again, and it works!
I understand that it is an encoding problem, but I couldn't find the right encoding in Notepad++. When I copy from B4A to Notepad++, it's always UTF8, but there is something different. Maybe an ISO or OEM problem? I don't know.
Now my file is OK, which is the most important thing for me. Later I'll look for a better solution.
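To see what the "something different" is when both files are UTF8, it can help to list the code points of a string instead of its bytes. A minimal Python sketch (illustration only; `describe` is a hypothetical helper name) that names each character:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Two endings that look the same on screen but differ in their first character.
for pair in describe("\u1f79\u03c2") + describe("\u03cc\u03c2"):
    print(pair)
```

If the two sources print different names for the accented letter (for example "WITH OXIA" versus "WITH TONOS"), the file encoding is the same and only the choice of code points differs.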
 

jccordier

Member
Licensed User
As I said, it was UTF8 in both cases.
But don't worry. I changed my mind.
Instead of having an array of strings defined in my code, I put them in a file.
It's a very small file (30 bytes!), but this way I'm sure all my strings are encoded in the same way.
Thanks for your help.
 

DonManfred

Expert
Licensed User
Longtime User
jccordier said: It's a very small file (30 bytes!), but this way I'm sure all my strings are encoded in the same way.
I suggest re-checking with Notepad++.
I'm pretty sure the file you are using in Notepad is not plain UTF8. Maybe UTF-8 with BOM.

Upload such a text file please.
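For reference, a UTF-8 byte order mark is the three bytes EF BB BF at the very start of the file. The Python sketch below (illustration only; `has_utf8_bom` and `decode_utf8` are hypothetical helper names) shows one way to detect it and to decode the file without keeping it:

```python
import codecs


def has_utf8_bom(raw):
    """True if the raw file bytes start with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)


def decode_utf8(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return raw.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "\u03b8\u03b5\u03cc\u03c2".encode("utf-8")
without_bom = "\u03b8\u03b5\u03cc\u03c2".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))    # True False
print(decode_utf8(with_bom) == decode_utf8(without_bom))    # True
print(with_bom.decode("utf-8")[0] == "\ufeff")              # True
```

A BOM only adds an invisible character in front of the first word of the file, so it would affect a match on that first entry but would not by itself produce two different byte sequences for "ός" inside other words.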
 