HTTP POST fails if umlauts are sent and UTF-8 is used

TWELVE

Active Member
Licensed User
Hello,

I've got another one:

The code below works fine, but as soon as "mystring" contains Umlauts, an error message complaining about length mismatch occurs.


Request.ContentLength = StrLength("mystring")
...
stream.New1(Request.GetStream,false)
...
stream.WriteBytes (stream.StringToBytes("mystring" ))

Error description:
Bytes to be written to the stream exceed the Content-Length size specified.



stream is a binary file object.
Request is a webrequest.


According to the help file, the second argument of binaryfile.New1( , false or true) is set
to true for ASCII and to false for UTF-8:

ASCII - If false, strings will be encoded using UTF-8 (Unicode) format, otherwise strings will be encoded using ASCII format.


If I set this to true, no error message occurs (because there is no longer a mismatch between the stream length and the Content-Length), but the umlauts are converted to something like "?". So ASCII does not help here.



regards

TWELVE
 

agraham

Expert
Licensed User
Longtime User
The code below works fine, but as soon as "mystring" contains Umlauts, an error message complaining about length mismatch occurs.
This is because UTF-8 encodes each umlaut as two bytes, so the byte count is larger than the character count. Try converting the string to a byte buffer first and then using the buffer size, like this -

B4X:
Sub Globals
      Dim buffer(0) As Byte
End Sub


Sub App_Start

  buffer() = stream.StringToBytes("mystring")
  Request.ContentLength = ArrayLen(buffer()) 
  ...
  stream.New1(Request.GetStream,false)
  ...
  stream.WriteBytes(buffer()) 

End Sub
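
To make the mismatch concrete, here is a minimal sketch in Python (used only as a neutral stand-in for the Basic4ppc calls; the variable names are illustrative and not part of any library in this thread). It compares the character count with the UTF-8 byte count for a sentence containing one umlaut.

```python
# Illustration only: Python stands in for the Basic4ppc calls discussed here.
text = "Ich fahre nach Österreich"

char_count = len(text)          # what a character-based length function reports
body = text.encode("utf-8")     # the bytes that would actually be written
byte_count = len(body)          # the value Content-Length has to match

# The single "Ö" takes two bytes in UTF-8, so the body is one byte longer.
print(char_count, byte_count)
```

Setting the content length from the character count would therefore be one byte short for this sentence, which matches the error described in the first post.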
 

TWELVE

Active Member
Licensed User
Can you post a simple text you are trying to send?

Sure... Umlauts are Ä, Ö and Ü (lower case ä, ö, ü).

"Ich fahre nach Österreich"

kind regards

TWELVE
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
As agraham wrote, you should use the buffer size and not the string size.
You will need to add a Bitwise object:
B4X:
    request.New1(...)
    request.Method = "POST"
    request.ContentType = "application/x-www-form-urlencoded"
    bitwise.New1 ' New1 uses the default UTF-8 encoding
    s = "Ich fahre nach Österreich"
    ' Convert to bytes first so the length is a byte count, not a character count
    buffer() = bitwise.StringToBytes(s,0,StrLength(s))
    request.ContentLength = ArrayLen(buffer()) ' must be set before GetStream
    stream.New1(request.GetStream,true)
    stream.WriteBytes(buffer())
    response.New1
    response.Value = request.GetResponse
    textbox1.Text = response.GetString
 

agraham

Expert
Licensed User
Longtime User
Silly me, I used the stream before it was opened. But out of interest, do you actually need the Bitwise object? Would this work?
B4X:
    request.New1(...)
    request.Method = "POST"
    request.ContentType = "application/x-www-form-urlencoded"
    s = "Ich fahre nach Österreich"
    stream.New1(request.GetStream,false) ' UTF8 encoding
    buffer() = stream.StringToBytes(s) ' use the stream to convert
    request.ContentLength = ArrayLen(buffer())
    stream.WriteBytes(buffer())
    response.New1
    response.Value = request.GetResponse
    textbox1.Text = response.GetString
There are some subtleties here that I don't understand. Why does Erel's example open the stream as ASCII? And does the Bitwise StringToBytes method return UTF8 formatted bytes/characters?
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
does the Bitwise StringToBytes method return UTF8 formatted bytes/characters
Bitwise.New1 uses the default UTF8 encoding.
Bitwise.New2 allows you to choose other encodings.

Why does Erel's example open the stream as ASCII?
As we are using the stream just to write bytes and not strings it doesn't matter whether we open it as ASCII or UTF8. The encoding only matters when reading or writing strings.

request.ContentLength must be set before request.GetStream. Otherwise you will get an error.
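
The ordering described above (encode first, set the length from the bytes, only then write the body) can be sketched as follows. This is an illustration in Python's standard `urllib.request`, not the Basic4ppc API; the URL is a hypothetical placeholder and nothing is sent over the network.

```python
import urllib.request

# 1. Encode the text once; these exact bytes become the request body.
body = "Ich fahre nach Österreich".encode("utf-8")

# 2. Build the request around the bytes (hypothetical URL, never contacted here).
req = urllib.request.Request("http://example.invalid/post", data=body, method="POST")

# 3. Derive Content-Length from the byte buffer, not from the original string.
req.add_header("Content-Length", str(len(body)))

print(req.get_method(), req.get_header("Content-length"))
```

Because the body is already a byte buffer, no further encoding happens when it is written, which is why the stream's ASCII/UTF-8 flag does not affect it.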
 

TWELVE

Active Member
Licensed User
Hello,

I have now solved this issue by using:

stream.New2(Request.GetStream,1252)

instead of using:

stream.New1(Request.GetStream,false|true)

After some research and a few trials, I found code page 1252 to be appropriate for the German umlauts.
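
As a rough illustration of why code page 1252 avoids the length error, the sketch below (Python as a stand-in; the `errors="replace"` behaviour is Python's and is only assumed to resemble what the ASCII stream did to the umlauts) encodes the same sentence three ways.

```python
text = "Ich fahre nach Österreich"

ascii_bytes = text.encode("ascii", errors="replace")  # unsupported characters become "?"
cp1252_bytes = text.encode("cp1252")                  # one byte per character, umlaut kept
utf8_bytes = text.encode("utf-8")                     # umlaut takes two bytes

print(ascii_bytes)
print(len(text), len(cp1252_bytes), len(utf8_bytes))
```

With code page 1252 the byte count happens to equal the character count for this text, so a character-based length matches. That only holds for characters the code page can represent.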


This is because UTF-8 encodes each umlaut as two bytes, so the byte count is larger than the character count. Try converting the string to a byte buffer first and then using the buffer size, like this -

I do understand the difference between the number of characters and the number of bytes they occupy. I just didn't know how to calculate the latter properly and cleanly. Thanks to agraham and Erel for your code examples.

Usually Unicode or UTF-8 is the better choice for supporting different languages and characters. For the moment I will stay with the code page, since this is the quicker fix for me (the web server would also need an additional module for UTF-8 support).

@Erel: Is it possible to implement something like "STRByteLength(string)" that counts the bytes of the string instead of the characters? Or maybe an additional parameter, as in STRLength(string, mode)?


kind regards

TWELVE
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
@Erel: Is it possible to implement something like "STRByteLength(string)" that counts the bytes of the string instead of the characters?

The length depends on the encoding used.
As agraham and I wrote in previous posts, you can measure it by first converting the string to bytes (using a specific encoder/code page).
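
A minimal sketch of that idea, again in Python rather than Basic4ppc: `byte_length` is a hypothetical helper name, not an existing function in either language. It makes the encoding an explicit argument, because the same string has a different byte length under each one.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

sample = "Österreich"  # 10 characters, one of them an umlaut

# The same string measured under three different encodings.
for enc in ("cp1252", "utf-8", "utf-16-le"):
    print(enc, byte_length(sample, enc))
```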
 

agraham

Expert
Licensed User
Longtime User
Hmm! Thanks Erel. It is at times like this that you realise how little you really know about some things, like character encodings.

@Erel: Is it possible to implement something like "STRByteLength(string)" that counts the bytes of the string instead of the characters? Or maybe an additional parameter, as in STRLength(string, mode)?
A case for the Door library, Erel?
 

TWELVE

Active Member
Licensed User
Thanks again, gentlemen, for your rapid response on this matter. I really appreciate it.

kind regards

TWELVE
 