Speed improvements for reading binary file?

corwin42

Expert
Licensed User
Longtime User
Hi!

I'm just wondering about the speed in reading a binary file.

I read a binary file with the following code (br is a binaryfile object):
B4X:
 FileOpen(c1, pFileName, cRandom)
 br.New1(c1, True)
 br.Position = 0

 br.ReadString ' Skip header
 fileVersion = br.ReadInt32 ' Read fileversion
 br.Offset(32) ' Skip not needed data
 
 Do While br.Position < br.Length
  br.ReadInt32 'entryType
      
  latitude = br.ReadDouble 'latitude
  longitude = br.ReadDouble 'longitude

  ' Skip rest of data block
  If fileVersion >= 230 Then
   br.Offset(72)
  Else If fileVersion >= 229
   br.Offset(56)
  Else
   br.Offset(40)
  End If
 Loop

The code reads a proprietary file format that contains geo coordinates (and some other data). I have removed the part that stores the data in a database, but the remaining part is still quite slow. The source looks minimal to me, yet reading the data takes a long time. For example, reading an 800 KB file with nearly 11000 coordinate pairs took one minute including storing part of the coordinates in a database; the file reading alone takes 25 seconds. If I convert the data to a GPX file, the resulting file is nearly 1.9 MB in size, but loading it with the XMLReader and storing the same number of coordinates in a database takes only 42 seconds. Since the XML library has to do much more checking etc., I expected that loading the binary file directly would be much faster than reading a much larger and more complex GPX file.

Is this because the binary file has to be opened in random access mode? Is there a way to speed up the sequential reading of a binary file? Is there a completely different way to read the file?
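For readers following along, the record layout implied by the loop above can be sketched in Python (used purely for illustration, it is not Basic4ppc). The sketch assumes little-endian byte order and that each record is an Int32 entry type, two Doubles, and a version-dependent tail that is skipped; the names `HEAD`, `tail_size`, `record_size` and `parse_record` are hypothetical helpers, not part of any library mentioned in this thread.

```python
import struct

# Assumed layout: int32 entry type, double latitude, double longitude.
# "<" means little-endian without padding (an assumption of this sketch).
HEAD = struct.Struct("<idd")  # 4 + 8 + 8 = 20 bytes


def tail_size(file_version):
    """Bytes skipped after the coordinates, mirroring the If/Else If chain."""
    if file_version >= 230:
        return 72
    if file_version >= 229:
        return 56
    return 40


def record_size(file_version):
    """Total size of one record for the given file version."""
    return HEAD.size + tail_size(file_version)


def parse_record(buf, offset=0):
    """Return (entry_type, latitude, longitude) for the record at offset."""
    return HEAD.unpack_from(buf, offset)
```

Because every record of a given file version has the same size, the position of any record can be computed without reading the ones before it, which is what makes reading in large blocks possible.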

Hints are very welcome.
Markus
 

corwin42

After optimizing the reading loop by using double array variables for counting, and recompiling a complicated math function for distance calculation with Andrew's MathRecompiler, I managed to speed up the loading of the file amazingly (before optimization: 32 seconds, after: 15 seconds).

I extracted the binary file reading into a small test program. After switching this test program to double array variables as well, loading a 480 kB file took only about 4 seconds on my ASUS A696. I think that's OK.

So in my application I have 4 seconds for reading of the file and 11 seconds for processing the data.

If you are interested, here is my complete Reading function now:

B4X:
Sub Globals
 Dim minMaxLatLon(4) As Double
 Dim latLon(2) As Double
 Dim oldLatLon(2) As Double
 Dim numPts(2) As Double
 Dim distpoints(2) As Double
End Sub

Public Sub ReadTrack(pFileName)
 distPoints(0) = Settings.prefs.OptimizeGPXMinDistance
 distPoints(1)=0
 oldLatLon(0)=0
 oldLatLon(1)=0

 numPts(0) = 0
 numPts(1) = 0
 DataSource.numWayPts = 0
 minMaxLatLon(0) = 99999999
 minMaxLatLon(1) = -99999999
 minMaxLatLon(2) = 99999999
 minMaxLatLon(3) = -99999999

 WaitCursor(True)
 DoEvents
   
 FileOpen(c1, pFileName, cRandom)
 br.New1(c1, True)
 br.Position = 0
 Utils.InitProgress("Main.fMain", "Loading Training", 0, Round(br.Length/1024))

 br.ReadString
 fileVersion = br.ReadInt32
 br.Offset(32)
 
 DataSource.sqlcon.BeginTransaction
 Do While br.Position < br.Length
  br.ReadInt32 'entryType
      
  latLon(0) = br.ReadDouble 'latitude
  latLon(1) = br.ReadDouble 'longitude

  If fileVersion >= 230 Then
   br.Offset(72)
  Else If fileVersion >= 229
   br.Offset(56)
  Else
   br.Offset(40)
  End If

  If Settings.prefs.OptimizeGPX = 1 Then
   If oldLatLon(0) <> 0 Then
    distPoints(1) = GPXUtil.Distance_AsDouble(oldLatLon(0), oldLatLon(1), latLon(0), latLon(1))
   Else
    numPts(1) = numPts(1) + DataSource.AddLatLon(latLon(0), latLon(1), "trkpt")
    oldLatLon(0)=latLon(0)
    oldLatLon(1)=latLon(1)
    SetMinMax
   End If
      
   If distPoints(1) > distPoints(0) Then    
    numPts(1) = numPts(1) + DataSource.AddLatLon(latLon(0), latLon(1), "trkpt")
    oldLatLon(0)=latLon(0)
    oldLatLon(1)=latLon(1)
    SetMinMax
   End If
  Else
   numPts(1) = numPts(1) + DataSource.AddLatLon(latLon(0), latLon(1), "trkpt")
   SetMinMax
  End If
      
  numPts(0) = numPts(0) + 1
    
  If numPts(0) Mod 100 = 0 Then
   Utils.SetProgress(Round(br.Position/1024), "Koords: " & numPts(0))
   Main.hw.KeepAlive
  End If
 Loop 
 DataSource.sqlcon.EndTransaction

 DataSource.minLat = minMaxLatLon(0)
 DataSource.maxLat = minMaxLatLon(1)
 DataSource.minLon = minMaxLatLon(2)
 DataSource.maxLon = minMaxLatLon(3)
 DataSource.numTrkPts = numPts(0)
 DataSource.numUnique = numPts(1)

 WaitCursor(False)

 FileClose(c1)
 Utils.ClearProgress

 Return True
End Sub

Sub SetMinMax
 ' Test min and max independently: with Else If, a point that lowers
 ' the minimum is never compared against the maximum.
 If minMaxLatLon(0) > latLon(0) Then
  minMaxLatLon(0) = latLon(0)
 End If
 If minMaxLatLon(1) < latLon(0) Then
  minMaxLatLon(1) = latLon(0)
 End If
 If minMaxLatLon(2) > latLon(1) Then
  minMaxLatLon(2) = latLon(1)
 End If
 If minMaxLatLon(3) < latLon(1) Then
  minMaxLatLon(3) = latLon(1)
 End If
End Sub
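
The distance filter inside ReadTrack (keep a point only if it is farther than a minimum distance from the last kept point) can be sketched in Python for illustration. This is only a sketch under assumptions: the thread does not say which formula GPXUtil.Distance_AsDouble uses or in which unit the minimum distance is given, so the haversine formula, metres, and the names `haversine_m` and `thin_track` are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def thin_track(points, min_distance_m):
    """Keep the first point, then only points farther than min_distance_m
    from the most recently kept point."""
    kept = []
    for lat, lon in points:
        if not kept or haversine_m(kept[-1][0], kept[-1][1], lat, lon) > min_distance_m:
            kept.append((lat, lon))
    return kept
```

Unlike the posted code, which treats a latitude of 0 as "no previous point yet", the sketch uses an empty list for that, so a track that really starts at latitude 0 is handled the same as any other.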

If you find something that I can do better, please let me know. I still need to have a look at the DataSource.AddLatLon() function, which adds the coordinates to a SQL table. Perhaps there is more potential for optimization there.

I have also attached my test program, so if you have any ideas for improving the reading speed, please share them. Every second counts in my program.

Thanks,
Markus
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
Using agraham's ByteConverter library, I read the data in larger chunks and then convert it to doubles. I've also declared the inner loop's variables as doubles.
On my device it took about 3 seconds to read the data from your file. After this change it takes about 0.1-0.2 seconds.
Code is attached.
B4X:
Sub Globals
 'Declare the global variables here.

 Dim numTrkPts(1) As Double
 Dim latLon(2) As Double
 Dim buffer(0) As Byte
 Dim chunk(1) As Double, pos(1) As Double, count(1) As Double
End Sub

Sub App_Start
 Label1.Text=""
 Label2.Text=""
 Label3.Text=""
 dzhw.New1
 TimeFormat("HH:mm:ss")
 Form1.Show
 bytesConverter.New1
End Sub

Public Sub ReadTrack(pFileName)
 ErrorLabel(Error)

 numTrkPts(0) = 0

 WaitCursor(True)
 DoEvents
 
 startTicks=dzhw.GetTickCount
 FileOpen(c1, pFileName, cRandom)
 br.New1(c1, True)
 br.Position = 0

 br.ReadString
 fileVersion = br.ReadInt32
 br.Offset(32)
 chunk(0) = 4 + 8 + 8
 If fileVersion >= 230 Then
  chunk(0) = chunk(0) + 72
 Else If fileVersion >= 229
  chunk(0) = chunk(0) + 56
 Else
  chunk(0) = chunk(0) + 40
 End If
 Dim buffer(chunk(0) * 500) As byte
 Do While br.Position < br.Length
  count(0) = br.ReadBytes(buffer(), ArrayLen(buffer()))
  pos(0) = 0
  Do While pos(0) < count(0)
   latLon(0) = bytesConverter.DoubleFromBytes(buffer(), pos(0) + 4)
   latLon(1) = bytesConverter.DoubleFromBytes(buffer(), pos(0) + 12)
   pos(0) = pos(0) + chunk(0)
   numTrkPts(0) = numTrkPts(0) + 1
  Loop
  If pos(0) > count(0) Then 'this may happen if the last chunk was not fully read.
   br.Offset(count(0) - pos(0))
  End If
 Loop 

 WaitCursor(False)

 FileClose(c1)
 endTicks=dzhw.GetTickCount
 Label1.Text = "Koords: " & numTrkPts(0)
 Label2.Text= "Time: " & Format(Round((endTicks-startTicks)/1000, 1), "f1")
 Label3.Text= Round(latLon(0), 3) & "/" & Round(latLon(1), 3)
 Return True

 Error:
 
 WaitCursor(False)
 Label1.Text="Error Reading file"
 Return False
End Sub


Sub Button1_Click
 Label1.Text="Reading file..."
 Label2.Text=""
 Label3.Text=""
 ReadTrack(AppPath & "\data.gpb")
End Sub
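
For comparison, the same chunked-read idea (one large read per 500 records, then pulling the two doubles out of the buffer at fixed offsets) can be sketched in Python. This is an illustrative sketch, not a port of the attached code: little-endian byte order is assumed, `read_coords` is a hypothetical name, and the caller is expected to have already skipped the file header and to pass the record size for the file version.

```python
import io
import struct

COORDS = struct.Struct("<dd")  # latitude, longitude; little-endian is an assumption


def read_coords(stream, record_size, records_per_chunk=500):
    """Yield (lat, lon) pairs, reading many fixed-size records per read call."""
    chunk_bytes = record_size * records_per_chunk
    leftover = b""
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            break
        buf = leftover + data
        # Only decode whole records; carry an incomplete tail into the next read.
        usable = len(buf) - len(buf) % record_size
        for pos in range(0, usable, record_size):
            yield COORDS.unpack_from(buf, pos + 4)  # skip the int32 entry type
        leftover = buf[usable:]
    # Any bytes still in leftover form an incomplete record and are ignored.
```

A call such as `list(read_coords(io.BytesIO(payload), 76))` would return all coordinate pairs from an in-memory payload of 76-byte records. The speed-up comes from replacing three small reads and one seek per record with one large read per 500 records.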
 

corwin42

Hi Erel,

thanks very much!

I adopted your method in my program and the speed increase is amazing again. The 480 kB test file now loads in 8 seconds in my program. Before any optimizations it took 32 seconds. The largest file I have, with over 18000 coordinates, now loads in 26 seconds. Before, it took nearly 2 minutes.

Another big thanks goes to agraham who creates such great libraries!

Greetings,
Markus
 