Difference between FileGet and FileGetByte

BjornF · Oct 28, 2008

I am trying to read a shortcut file to see what file it points to. If I use FileGet then (after parsing) I get the following:

C:\Documents and Settings\bjf\Desktop\Dyrevelferd, et vitenskapelig perspektiv (Bj?rn Forkman).ppt

If on the other hand I use FileGetByte I get the following:

C:\Documents and Settings\bjf\Desktop\Dyrevelferd, et vitenskapelig perspektiv (Björn Forkman).ppt

which is the correct filename. But FileGet is of course much faster, so I would like to use it. (It is not possible to use FileReadToEnd etc.; shortcut files are very strange to work with.)

Any neat solution to this?

all the best / Björn

agraham · Oct 28, 2008

BjornF said:
Any neat solution to this?

As you haven't shown us the parsing code, so that we can see what leads to these results, then no!

BjornF · Oct 28, 2008

Sorry, here it is. (Although I don't think that is the problem, and possibly "parsing" is an overly ambitious word.)

B4X:

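   ' Open the shortcut for random access and read it all as one string
   FileOpen(c1,Ftxt,cRandom)
   txt=FileGet(c1,0,FileSize(Ftxt))
   FileClose(c1)
   ' StrIndexOf returns -1 when the text is not found
   StartPos=StrIndexOf(txt,"C:\",0)
   If StartPos>-1 Then
      txt=SubString(txt,StartPos,StrLength(txt)-StartPos)
   Else
      txt="can't find the c:\"
   End If
   textbox1.Text=txt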

(In the byte-by-byte version, "1000" is just used to be certain that I have read the whole file.)

B4X:

   FileOpen(c1,Ftxt,cRandom)
   ' Read one byte at a time and append it to the text as a character
   Do Until i=1000
      i=i+1
      x=FileGetByte(c1,i)
      textbox2.Text=textbox2.Text&Chr(x)
   Loop
   FileClose(c1)

Björn

agraham · Oct 28, 2008

It looks like a character encoding problem and is probably by design.

FileGet reads the string with ASCII encoding, which only comprises the characters Chr(0) to Chr(127). It therefore doesn't recognise "ö" as being in the ASCII character set and replaces it with a question mark. FilePut also only writes ASCII encoding, so at least the two are symmetrical.

Reading it a byte at a time preserves the "ö" because each byte of the file is read as a number and added to the text as a character, so all the values from 0 to 255 come through. This would probably fail to return the correct characters if the string were UTF-8 encoded and contained any multi-byte characters.
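
For anyone who wants to see the effect outside Basic4ppc, here is a minimal Python sketch of the same three cases. It is only an analogy, not what FileGet does internally: it assumes the name is stored in a single-byte ANSI code page (Windows-1252 here), and the sample bytes are made up rather than taken from a real shortcut.

```python
# Made-up sample: "Björn" as a single-byte ANSI (Windows-1252) file might store it.
raw = "Björn".encode("cp1252")          # one byte per character, "ö" is 0xF6

# Case 1: decode as 7-bit ASCII. 0xF6 is outside 0..127, so it gets replaced.
as_ascii = raw.decode("ascii", errors="replace").replace("\ufffd", "?")

# Case 2: take each byte value and use it directly as a character code.
byte_by_byte = "".join(chr(b) for b in raw)

# Case 3: the same text stored as UTF-8 needs two bytes for "ö",
# so the byte-by-byte approach turns it into two wrong characters.
utf8_raw = "Björn".encode("utf-8")
utf8_byte_by_byte = "".join(chr(b) for b in utf8_raw)

print(as_ascii)
print(byte_by_byte)
print(utf8_byte_by_byte)
```

The byte-by-byte route works here only because byte values 128 to 255 are used directly as character codes, which happens to give the right letter for "ö".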

BjornF · Oct 28, 2008

Hmm, so if I understand you right there is no way around this then...

all the best / Björn

agraham · Oct 28, 2008

BjornF said:
Hmm, so if I understand you right there is no way around this then...

You should be OK reading the string as bytes; the worst case is that the filename you get back isn't valid. It might be worth reading the filename as bytes and testing each byte for a value of zero, since zero is usually used as the string terminator in native Windows strings. I wouldn't worry about the speed of reading it a byte at a time. The difference won't matter for your application.
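
The zero-terminator idea can be sketched as follows, again in Python rather than Basic4ppc and purely as an illustration. The helper name `read_c_string` and the sample bytes are hypothetical, and the sketch assumes the path is stored one byte per character.

```python
def read_c_string(data: bytes, start: int) -> str:
    """Return the text from `start` up to (not including) the next zero byte."""
    end = data.find(b"\x00", start)
    if end == -1:                      # no terminator found: take the rest
        end = len(data)
    # latin-1 maps each byte 0..255 to the character with the same code,
    # which is the same thing as Chr(x) on every byte.
    return data[start:end].decode("latin-1")

# Made-up stand-in for the contents of a shortcut file.
sample = (b"\x4c\x00\x00\x00junk"
          + "C:\\Desktop\\Björn.ppt".encode("cp1252")
          + b"\x00more\x00\x00\x00\x00")

start = sample.find(b"C:\\")           # where the path begins
path = read_c_string(sample, start)
print(path)
```

If a name were stored as two-byte Unicode instead, every other byte would be zero and this scan would stop after the first character, so it is worth checking what the bytes after "C:\" actually look like.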

BjornF · Oct 29, 2008

Thank you for the answer Graham.

I do get valid filenames when reading them with FileGetByte, so no problem there. It does take longer: unfortunately the shortcut files contain a lot of other information, and there are also a number of them (around 250 in my Recent Files folder), but it isn't critical in any way.

The idea of looking for a zero byte is good; I'll use that for the end of the filename. The file always ends with four zero bytes in a row, so that should give me the end of the file instead of the "1000" I used in the example.

Thank you for the help / Björn
