Android Question Special Characters

adriano.freitas

Active Member
I have a text box that may contain some special characters that cause problems when saved in the database and even in the queries themselves, even using UTF-8. As the content may be large text, I need to know the best way to encode/decode the string to keep the special characters without significantly compromising performance.

Can anyone help?
 

adriano.freitas

Active Member
Local SQLite db? There shouldn't be a problem with saving strings with special characters. Post an example if you aren't able to get it working.
Local and remote db. Sometimes one, sometimes the other.
Code to be saved as text:
Drop Function If Exists fn_remover_acentos;

Delimiter |

Create Function fn_remover_acentos( Texto VARCHAR(10000) ) RETURNS VARCHAR(10000)

Begin
    Set @Texto = Texto;

    -- Accents
    Set @ComAcentos       = 'ŠšŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝŸÞàáâãäåæçèéêëìíîïñòóôõöøùúûüýÿþƒ';
    Set @SemAcentos       = 'SsZzAAAAAAACEEEEIIIINOOOOOOUUUUYYBaaaaaaaceeeeiiiinoooooouuuuyybf';
    Set @Quantidade       = Char_Length(@ComAcentos);

    While @quantidade > 0 Do
          Set @Texto      = Replace(@Texto, Substring(@ComAcentos, @Quantidade, 1), Substring(@SemAcentos, @Quantidade, 1));
          Set @Quantidade = @Quantidade - 1;
    End While;

    -- Special characters
    Set @Especial         = '!@#$%¨&*(^~)_±=§¹²³£¢¬"`´{*Erro*<,>.:;?/°ºª±*|\\';
    Set @Quantidade       = Char_Length(@Especial);
    
    While @Quantidade > 0 Do
          Set @Texto      = Replace(@Texto, Substring(@Especial, @Quantidade, 1), '');
          SET @Quantidade = @Quantidade - 1;
    End While;

    Return @Texto;

End
|
Delimiter ;
As for how I write and read the SQLite text, I don't use anything special. The application is a code manager that helps developers store functions and code snippets and manipulate them with various tools. The problem occurred when I tried to paste the code above (code for MySQL) so that it would be stored. It contains characters specific to my language, as well as symbols, and the problem occurred with them.
 

adriano.freitas

Active Member
The problem is in your MySQL encoding settings. Set it to: utf8mb4
Also set the connection character encoding.

It will work.

I don't know if I explained it correctly. Also, my phone's autocorrect introduced errors into the text because of my language...

I'm not using a MySQL database, but SQLite. I need to store program code and queries from other databases in it for later reference and manipulation. The problem is receiving code like the above in a text box and trying to store it as text in an SQLite database.

Would it still be enough to change to utf8mb4?

If so, could you tell me how to do this in SQLite?
Thanks!!!
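
A minimal sketch of the SQLite side, written in Python only for illustration (the app in this thread is not Python, and the table name `snippets` is hypothetical). SQLite has no utf8mb4 setting: its TEXT values hold full Unicode, with UTF-8 as the default database encoding. Trouble with quotes and symbols often comes from building the SQL by string concatenation, so the sketch binds the text as a parameter instead. That cause is an assumption; the thread does not confirm it.

```python
import sqlite3

# Text containing quotes, symbols and accented characters, similar to the MySQL code above.
snippet = "Set @Especial = '!@#$%¨&*\"`´|';  -- Ação ŠšŽž €"

conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.execute("CREATE TABLE snippets (id INTEGER PRIMARY KEY, body TEXT)")

# The text is passed as a bound parameter (?), never pasted into the SQL string,
# so quotes and other symbols inside it cannot break the statement.
conn.execute("INSERT INTO snippets (body) VALUES (?)", (snippet,))

stored = conn.execute(
    "SELECT body FROM snippets WHERE body = ?", (snippet,)
).fetchone()[0]

# Text encoding of this database, as reported by SQLite itself.
encoding = conn.execute("PRAGMA encoding").fetchone()[0]

print(stored == snippet)
print(encoding)
```

In B4X the same idea would be a parameterized call such as ExecNonQuery2 rather than concatenating the text into the statement.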
 

zed

Active Member
Licensed User
Try using a smart string literal.
 

emexes

Expert
Licensed User
the best way to encode/decode the string to keep the special characters

You could encode the "problem" characters (and ampersands) using the HTML &#n; method where n is the Unicode character number (in decimal).

This has the bonus feature of being valid HTML that a web browser or view will happily decode for you (when wrapped in <HTML></HTML> tags).

For example:
- ASCII tab character would become &#9;
- ampersand would become &#38;
- Euro currency symbol would become &#8364;

Problem characters might be anything outside ASCII/Unicode 32 to 126, but often characters like double-quotes (34) and commas (44) and redirection/tag delimiters (60 and 62) cause problems too.

It's easy enough to decode.

First check is: if the encoded string doesn't contain "&" then there is nothing to decode, the string is already plain ASCII.

Otherwise, everything before the first "&" is plain ASCII, and then from there to the subsequent ";" is the &#n; character code. Handle those, then repeat.
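
The encode and decode steps described above can be sketched as follows. This is only an illustration in Python, not code from the thread; the function names `encode_entities` and `decode_entities` are made up for the example, and the "problem" set is assumed to be the ampersand plus anything outside 32 to 126.

```python
import re

def encode_entities(text: str) -> str:
    """Replace '&' and every character outside printable ASCII (32..126)
    with a decimal numeric character reference such as &#8364;."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "&" or code < 32 or code > 126:
            out.append(f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)

# Matches one reference of the form &#n; where n is a decimal number.
_ENTITY = re.compile(r"&#(\d+);")

def decode_entities(encoded: str) -> str:
    """Reverse encode_entities."""
    if "&" not in encoded:
        return encoded  # nothing to decode, the string is plain ASCII
    return _ENTITY.sub(lambda m: chr(int(m.group(1))), encoded)

sample = "Tab\there & 5 € or Ação"
encoded = encode_entities(sample)
print(encoded)  # Tab&#9;here &#38; 5 &#8364; or A&#231;&#227;o
assert decode_entities(encoded) == sample
```

Because the ampersand itself is always encoded, any "&" left in the stored string is guaranteed to start a reference, which is what makes the decode step unambiguous.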
 

aeric

Expert
Licensed User
Longtime User
Recently I had a user input this character , . It was saved correctly in SQLite but could not be read into Sybase. I'm not sure whether the cause is the Sybase encoding setting or the PHP side.
 

emexes

Expert
Licensed User

Lol, that looks like something that got encoded as UTF-8 and then decoded as something like Windows-1252.

Righto, for Windows-1252:

ï (i with two dots) is 239
¼ (one-quarter) is 188
Œ is 140

UTF-8 bytes 239 188 140 decimal are binary 11101111 10111100 10001100

which is looking good so far because the high-bit prefixes are correct for a three-byte encoding

1111 111100 001100 binary = 65292 decimal

Unicode character 65292 is a full-width comma

https://www.codetable.net/decimal/65292
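
The same check can be reproduced in a few lines. This is a sketch in Python, used here only for illustration; `cp1252` is Python's codec name for Windows-1252.

```python
original = "\uff0c"  # FULLWIDTH COMMA, code point 65292

# Step 1: the character as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")
print(list(utf8_bytes))  # [239, 188, 140]

# Step 2: those bytes misread as Windows-1252 give the garbled text.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # ï¼Œ

# Step 3: strip the UTF-8 prefix bits (1110xxxx 10xxxxxx 10xxxxxx)
# and join the remaining 4 + 6 + 6 payload bits.
b0, b1, b2 = utf8_bytes
code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
print(code_point)  # 65292

# Reversing the misreading recovers the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")
assert repaired == original
```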
 

aeric

Expert
Licensed User
Longtime User
Yes, it is an uncommon "comma with a space" character. I guess the user edited the field from a web application using an unknown language or keyboard layout, and that produced the invalid character.
 