HTTP Post fails if Umlauts sent and UTF-8 is used

TWELVE · Apr 28, 2008

Hello,

I've got another one:

The code below works fine, but as soon as "mystring" contains Umlauts, an error message complaining about length mismatch occurs.

Request.ContentLength = StrLength("mystring")
...
stream.New1(Request.GetStream,false)
...
stream.WriteBytes (stream.StringToBytes("mystring" ))

Error description:
Bytes to be written to the stream exceed the Content-Length size specified.

stream is a binary file object.
Request is a webrequest.

According to the help file, the second argument of binaryfile.New1 selects ASCII when true and UTF-8 when false:

ASCII - If false, strings will be encoded using UTF-8 (Unicode) format, otherwise strings will be encoded using ASCII format.

If I set this to true, no error message occurs (because there is no longer a mismatch between the stream length and the Content-Length), but the umlauts are converted to something like "?". So ASCII does not help here.

regards

TWELVE

Erel · Apr 28, 2008

Can you post a simple text you are trying to send?

agraham · Apr 28, 2008

TWELVE said:
The code below works fine, but as soon as "mystring" contains Umlauts, an error message complaining about length mismatch occurs.

This is because the umlauts are converted to 2 byte characters in UTF8 so lengthening the string. Try converting to a buffer then using the buffer size like this -

B4X:

Sub Globals
      Dim buffer(0) As Byte
End Sub


Sub App_Start

  buffer() = stream.StringToBytes("mystring")
  Request.ContentLength = ArrayLen(buffer()) 
  ...
  stream.New1(Request.GetStream,false)
  ...
  stream.WriteBytes(buffer()) 

End Sub
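As a language-neutral illustration of the point above (this is Python, not Basic4ppc, and the sample text is an arbitrary assumption rather than anything from this thread), the following sketch compares the character count of a string with the number of bytes it occupies in UTF-8:

```python
# Compare character count with UTF-8 byte count.
sample = "Käse für Öl"  # arbitrary sample text, not taken from the thread


def utf8_width(ch: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Width of a plain ASCII letter versus a few German special characters.
widths = {ch: utf8_width(ch) for ch in "aäÖüß"}

char_count = len(sample)                   # what a string-length function reports
byte_count = len(sample.encode("utf-8"))   # what actually goes over the wire

print(widths)
print(char_count, byte_count)
```

Each umlaut adds one extra byte, so a Content-Length taken from the character count falls short by exactly the number of umlauts in the text.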

TWELVE · Apr 28, 2008

Erel said:
Can you post a simple text you are trying to send?

Sure. The umlauts are Ä, Ö and Ü (lower case ä, ö, ü).

"Ich fahre nach Österreich"

kind regards

TWELVE

Erel · Apr 28, 2008

As Agraham wrote you should use the buffer size and not the string size.
You will need to add a Bitwise object:

B4X:

    request.New1(...)
    request.Method = "POST"
    request.ContentType = "application/x-www-form-urlencoded"
    bitwise.New1
    s = "Ich fahre nach Österreich"
    buffer() = bitwise.StringToBytes(s,0,StrLength(s))
    request.ContentLength = ArrayLen(buffer())
    stream.New1(request.GetStream,true)
    stream.WriteBytes(buffer())
    response.New1
    response.Value = request.GetResponse
    textbox1.Text = response.GetString
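The order of operations in the snippet above (encode, measure the bytes, declare the length, then write the body) can be sketched outside Basic4ppc as well. The Python below only assembles the request as raw bytes and sends nothing. The host name, the path and the helper name build_post are hypothetical placeholders, not part of any library discussed here.

```python
# Assemble an HTTP POST as raw bytes; nothing is sent over the network.
def build_post(host: str, path: str, text: str, encoding: str = "utf-8") -> bytes:
    # 1. Encode first, so the real byte count is known.
    body = text.encode(encoding)
    # 2. Declare the byte count (not len(text)) in the header.
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # 3. The body follows the header block.
    return head + body


message = "Ich fahre nach Österreich"
# "example.invalid" and "/submit" are made-up values for illustration.
request = build_post("example.invalid", "/submit", message)

print(len(message), len(message.encode("utf-8")))
print(request.decode("utf-8"))
```

The sample sentence has 25 characters but 26 UTF-8 bytes, which is the one-byte mismatch behind the original error message.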

agraham · Apr 28, 2008

Silly me, I used the stream before it was opened. But out of interest, do you actually need the Bitwise object? Would this work?

B4X:

    request.New1(...)
    request.Method = "POST"
    request.ContentType = "application/x-www-form-urlencoded"
    s = "Ich fahre nach Österreich"
    stream.New1(request.GetStream,false) ' UTF8 encoding
    buffer() = stream.StringToBytes(s) ' use the stream to convert
    request.ContentLength = ArrayLen(buffer())
    stream.WriteBytes(buffer())
    response.New1
    response.Value = request.GetResponse
    textbox1.Text = response.GetString

There are some subtleties here that I don't understand

Why does Erel's example open the stream as ASCII? And does the Bitwise StringToBytes method return UTF8 formatted bytes/characters

Erel · Apr 28, 2008

agraham said:
does the Bitwise StringToBytes method return UTF8 formatted bytes/characters

Bitwise.New1 uses the default UTF8 encoding.
Bitwise.New2 allows you to choose other encodings.

agraham said:
Why does Erel's example open the stream as ASCII?

As we are using the stream just to write bytes and not strings it doesn't matter whether we open it as ASCII or UTF8. The encoding only matters when reading or writing strings.
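The same distinction can be reproduced with Python's standard io module, used here only as an analogy for the binary file object. This is a sketch under that assumption, and the helper names write_bytes and write_text are made up for the example.

```python
import io


def write_bytes(data: bytes, text_encoding: str) -> bytes:
    """Write ready-made bytes through a stream opened with some text encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=text_encoding, errors="replace")
    stream.buffer.write(data)  # raw bytes bypass the text encoder entirely
    stream.flush()
    result = raw.getvalue()
    stream.detach()            # leave the underlying buffer alone
    return result


def write_text(text: str, text_encoding: str) -> bytes:
    """Write a string through the same kind of stream; now the encoding matters."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=text_encoding, errors="replace")
    stream.write(text)         # the string is encoded on the way out
    stream.flush()
    result = raw.getvalue()
    stream.detach()
    return result


word = "Österreich"
payload = word.encode("utf-8")

print(write_bytes(payload, "ascii") == write_bytes(payload, "utf-8"))
print(write_text(word, "ascii"), write_text(word, "utf-8"))
```

Bytes pass through unchanged whichever encoding the stream was opened with. Only the string write differs, and in ASCII mode it replaces the umlaut with "?", as described in the first post.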

request.ContentLength must be set before request.GetStream. Otherwise you will get an error.

TWELVE · Apr 28, 2008

Hello,

I have now solved this issue by using:

stream.New2(Request.GetStream,1252)

instead of using:

stream.New1(Request.GetStream,false|true)

After some research and trials I found code page 1252 to be appropriate for the German umlauts.

agraham said:
This is because the umlauts are converted to 2 byte characters in UTF8 so lengthening the string. Try converting to a buffer then using the buffer size like this -

I do understand the difference between the number of characters and the number of bytes they occupy. I didn't know how to calculate that properly and cleanly. Thanks to agraham and Erel for your code examples.

Usually Unicode or UTF-8 is the better choice for supporting different languages and characters. For the moment I will stay with the code page, since this is the quicker fix for me (the web server would also need an additional module for UTF-8 support).
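To illustrate why the code page makes the length error disappear, and what it costs, here is a small Python sketch (Python rather than Basic4ppc, and the helper name fits_cp1252 is an assumption for the example). In code page 1252 every German umlaut is a single byte, so character count and byte count agree, but the receiver must decode with the same code page, and characters outside it cannot be sent at all.

```python
text = "Ich fahre nach Österreich"

cp1252_bytes = text.encode("cp1252")  # one byte per character, umlauts included
utf8_bytes = text.encode("utf-8")     # the umlaut takes two bytes here

# If sender and receiver disagree about the encoding, the text is garbled.
misread = utf8_bytes.decode("cp1252")


def fits_cp1252(s: str) -> bool:
    """Return True if every character of s exists in code page 1252."""
    try:
        s.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(len(text), len(cp1252_bytes), len(utf8_bytes))
print(misread)
print(fits_cp1252("Österreich"), fits_cp1252("Łódź"))
```

This supports the trade-off described above: the code page is a quick fix for German text, while UTF-8 remains the safer choice once other languages are involved.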

@Erel: is it possible to implement something like "STRByteLength(string)" to have the bytes of the string counted instead of the characters..? Or maybe as additional parameter in STRLength(string, mode)...?

kind regards

TWELVE

Erel · Apr 28, 2008

TWELVE said:
@Erel: is it possible to implement something like "STRByteLength(string)" to have the bytes of the string counted instead of the characters..?

The length depends on the encoding used.
As agraham and I wrote in previous posts you can measure it by first converting the string to bytes (using a specific encoder/code page).

agraham · Apr 28, 2008

Hmm! Thanks Erel, it is at times like this that you realise how little you really know about some things, like character encodings.

TWELVE said:
@Erel: is it possible to implement something like "STRByteLength(string)" to have the bytes of the string counted instead of the characters..? Or maybe as additional parameter in STRLength(string, mode)...?

A case for the Door library Erel?

Erel · Apr 28, 2008

agraham said:
A case for the Door library Erel?

I don't think that it will be simpler than using a Bitwise object:

B4X:

Sub App_Start
    bitwise.New2(RequiredCodePage)
    Msgbox(STRByteLength("Some string"))
End Sub

Sub STRByteLength(str)
    Return ArrayLen(bitwise.StringToBytes(str,0,StrLength(str)))
End Sub
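For comparison, the same idea as the STRByteLength Sub above, written as a rough Python equivalent (an illustration only; the function name byte_length is made up). It makes the earlier point concrete: there is no single byte length for a string, only a byte length per encoding.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Count the bytes text occupies once converted with the given encoding."""
    return len(text.encode(encoding))


word = "Österreich"
sizes = {enc: byte_length(word, enc) for enc in ("utf-8", "cp1252", "utf-16-le")}

print(len(word), sizes)
```

The ten-character word occupies 11 bytes in UTF-8, 10 in code page 1252 and 20 in UTF-16, so a byte-counting function always needs to know which encoding the stream will use.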

TWELVE · Apr 29, 2008

Thanks again, gentlemen, for your rapid response in this matter. I really appreciate it.

kind regards

TWELVE
