Difference between FileGet and FileGetByte

BjornF

Active Member
Licensed User
Longtime User
I am trying to read a shortcut file to see what file it points to. If I use FileGet then (after parsing) I get the following:

C:\Documents and Settings\bjf\Desktop\Dyrevelferd, et vitenskapelig perspektiv (Bj?rn Forkman).ppt

If on the other hand I use FileGetByte I get the following:

C:\Documents and Settings\bjf\Desktop\Dyrevelferd, et vitenskapelig perspektiv (Björn Forkman).ppt

which is the correct filename. But FileGet is of course much faster, so I would like to use it (it is not possible to use FileReadToEnd etc., shortcut-files are very strange to work with :()

Any neat solution to this?

all the best / Björn
 

BjornF

Sorry, here it is. (Although I don't think that is the problem - and possibly parsing is an overly ambitious word :))

B4X:
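   FileOpen(c1,Ftxt,cRandom)
   txt=FileGet(c1,0,FileSize(Ftxt))   ' read the whole file as one string
   FileClose(c1)
   StartPos=StrIndexOf(txt,"C:\",0)   ' -1 if "C:\" is not found
   If StartPos>=0 Then
      ' keep everything from "C:\" to the end of the string
      txt=SubString(txt,StartPos,StrLength(txt)-StartPos)
   Else
      txt="can't find the c:\"
   End If
   textbox1.Text=txt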


("1000" is just used to be certain that I have read the whole file)

B4X:
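   FileOpen(c1,Ftxt,cRandom)
   i=0
   s=""
   Do Until i=1000
      x=FileGetByte(c1,i)   ' read one byte as a number (0-255)
      s=s&Chr(x)            ' collect in a variable instead of the textbox
      i=i+1                 ' increment after reading so byte 0 is not skipped
   Loop
   FileClose(c1)
   textbox2.Text=s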


Björn
 

agraham

Expert
Licensed User
Longtime User
It looks like a character encoding problem and is probably by design.

FileGet reads the string with ASCII encoding, which only comprises the characters Chr(0) to Chr(127). It doesn't recognise "ö" as part of the ASCII character set, so it replaces it with a question mark. FilePut also writes only ASCII encoding, so at least the two are symmetrical.

Reading it a byte at a time preserves the "ö" because each byte of the file is read as a number and added to the text as a character, so all the characters from 0 to 255 come through. This would probably fail to return the correct characters if the string were UTF-8 encoded and contained any multi-byte characters.
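To make the difference concrete, here is a small sketch in Python (not Basic4ppc, and not how FileGet is actually implemented). The helper names `ascii_style` and `byte_style` are made up for illustration; they only mimic the two behaviours described above, assuming the file stores the name with one byte per character.

```python
# Illustration only: Python stand-ins for the two Basic4ppc read styles.
# ascii_style and byte_style are hypothetical helpers, not real Basic4ppc calls.

raw = "Björn".encode("cp1252")  # single-byte file content, "ö" is byte 0xF6


def ascii_style(data: bytes) -> str:
    """Mimic an ASCII-only read: any byte above 127 becomes '?'."""
    return "".join(chr(b) if b < 128 else "?" for b in data)


def byte_style(data: bytes) -> str:
    """Mimic the FileGetByte + Chr loop: one character per byte (0-255)."""
    return "".join(chr(b) for b in data)


ascii_view = ascii_style(raw)                     # "ö" is lost
byte_view = byte_style(raw)                       # "ö" survives
utf8_view = byte_style("Björn".encode("utf-8"))   # two bytes for "ö", so two wrong characters

print(ascii_view)
print(byte_view)
print(utf8_view)
```

The last line shows the UTF-8 caveat: the "ö" is stored as two bytes, so a one-character-per-byte loop turns it into two unrelated characters.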
 

BjornF

Hmm, so if I understand you right there is no way around this then... :(

all the best / Björn
 

agraham

Hmm, so if I understand you right there is no way around this then... :(
You should be OK reading the string as bytes; the worst case is that the filename you get back isn't valid. It might be worth reading the filename as bytes and testing each byte for a value of zero, since zero is usually used as the string terminator in native Windows strings. I wouldn't worry about the speed of reading it a byte at a time. The difference won't matter for your application.
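As a rough sketch of the zero-terminator idea, again in Python rather than Basic4ppc: the function `extract_path` and the sample bytes are invented for illustration, and this is not a parser for the real shortcut format. It assumes the path is stored with one byte per character and ends at the first zero byte; if the name were stored with two bytes per character, the first zero would turn up right after the first letter.

```python
from typing import Optional


def extract_path(data: bytes, marker: bytes = b"C:\\") -> Optional[str]:
    """Return the text from the first marker up to the next zero byte.

    Hypothetical helper: assumes a single-byte, zero-terminated string.
    """
    start = data.find(marker)
    if start < 0:
        return None                     # marker not present at all
    end = data.find(b"\x00", start)     # first zero byte after the marker
    if end < 0:
        end = len(data)                 # no terminator: take the rest
    # latin-1 maps each byte 0-255 to the character with the same number,
    # which is what appending Chr(x) for every byte does.
    return data[start:end].decode("latin-1")


# Made-up bytes standing in for a shortcut file: junk, a path, four zeros.
sample = b"L\x00\x00\x00junk" + b"C:\\Docs\\Bj\xf6rn.ppt" + b"\x00\x00\x00\x00"
print(extract_path(sample))
```

Stopping at the first zero byte also removes the need for a fixed upper limit on how many bytes to read.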
 

BjornF

Thank you for the answer Graham.

I do get valid filenames when reading them with FileGetByte, so no problem there. It does take longer: unfortunately the shortcut files contain a lot of other information, and there are quite a few of them (around 250 in my Recent Files folder), but it isn't critical in any way.

The idea of looking for a zero byte is good, I'll use that to find the end of the filename. The file itself always ends with four zero bytes in a row, so that should give me the end of the file instead of the "1000" I used in the example.

Thank you for the help / Björn
 