B4J Question How to read file and data elements when file layout is provided with data file at the time of the run?

Kris.D

Member
Hi,
I'm looking for idea how to solve issue of reading file and data elements not knowing how this file looks prior to the run.
I will always get 2 files, data file and another file with the layout of the data.
The data file is fixed length file. For example if Last Name is 30 bytes it will always be 30 bytes. (last name + trailing spaces)
But I have no idea where to start with this?
If I know my data prior to compile I think I could create Custom Type object or something like that but what do I do in this case?

Thank you.

This is simple example of data and layout.
DATA
First NameLast NameMember DateMember NumberMember StatusMember Fee
JohnSmith
5/10/2019​
120501​
A
3.00​
AnnMiller
8/2/2020​
120502​
A
3.00​
RichJohnson
4/27/2020​
120507​
I
0.00​
MarkWilson
9/14/2021​
120504​
A
3.00​
JoeGarcia
5/5/2022​
120505​
A
4.00​
DebieJones
7/19/2022​
120509​
A
4.00​
LAYOUT
Field NameField TypeField Length
1​
First NameCharacter
35​
2​
Last NameCharacter
35​
3​
Member DateDate
4​
Member NumberDouble
9​
5​
Member StatusCharacter
1​
6​
Member FeeDecimal
3.2​
 

Andrew (Digitwell)

Well-Known Member
Licensed User
Longtime User
This actually look very similar to Erels formbuilder.
You should be able tyo use this as a basis to do what you would like to do here


Or you could use B4xTable for this.


Depending on how you want your GUI to appear
 
Upvote 0

emexes

Expert
Licensed User
Hi,
I'm looking for idea how to solve issue of reading file and data elements not knowing how this file looks prior to the run.
If I know my data prior to compile I think I could create Custom Type object or something like that but what do I do in this case?
I would have two lists or arrays that mirror your two files. Arrays have the advantage that the index is known, eg if you have an array of five descriptors for five fields, then the first field is index 0, the second field is index 1, ... the last field is index 4 (= the five fields - 1).

Or you could use a map, with the field number as the key, eg:

B4X:
Type FieldDescriptorType( _
    FNumber As Int, _
    FName As String, _
    FType As String, _
    FLength As String, _
    DecimalPlaces As Int _
)

and then read the LAYOUT file line by line (skipping the header line, or using it to confirm that the file has the expected columns), and fill in the descriptor fields accordingly, including implied lengths eg:

- date is 8 or 10 or 11 characters (yyyymmdd, dd/mm/yy, dd-mmm-yyyy ?)
- 3.2 decimal is 6 characters, or maybe 7 characters if the +/- sign is not included in the 3.

Once you've filled out the FieldDescriptor Map, then (i) you know how many fields are in each record (FieldDescriptor.Length) and (ii) you can calculate the fixed record length of the DATA file, by summing the field lengths:

B4X:
Dim RecordLength As Int = 0
For Each fd As FieldDescriptorType In FieldDescriptorMap    'doesn't matter which order they're summed in
    RecordLength = RecordLength + fd.FLength
Next

And then "all you have to do is" stroll through the DATA file reading RecordLength-byte blocks and splitting them up according to the FLengths, interpreting each field according to its FType and adding them to an array, and then adding the completed array to a List of records.

On a related note: old DBF files have a structure similar to having your LAYOUT + DATA files but as a single file, and can be read by most spreadsheet programs.
 
Upvote 0

Kris.D

Member
I would have two lists or arrays that mirror your two files. Arrays have the advantage that the index is known, eg if you have an array of five descriptors for five fields, then the first field is index 0, the second field is index 1, ... the last field is index 4 (= the five fields - 1).

Or you could use a map, with the field number as the key, eg:

B4X:
Type FieldDescriptorType( _
    FNumber As Int, _
    FName As String, _
    FType As String, _
    FLength As String, _
    DecimalPlaces As Int _
)

and then read the LAYOUT file line by line (skipping the header line, or using it to confirm that the file has the expected columns), and fill in the descriptor fields accordingly, including implied lengths eg:

- date is 8 or 10 or 11 characters (yyyymmdd, dd/mm/yy, dd-mmm-yyyy ?)
- 3.2 decimal is 6 characters, or maybe 7 characters if the +/- sign is not included in the 3.

Once you've filled out the FieldDescriptor Map, then (i) you know how many fields are in each record (FieldDescriptor.Length) and (ii) you can calculate the fixed record length of the DATA file, by summing the field lengths:

B4X:
Dim RecordLength As Int = 0
For Each fd As FieldDescriptorType In FieldDescriptorMap    'doesn't matter which order they're summed in
    RecordLength = RecordLength + fd.FLength
Next

And then "all you have to do is" stroll through the DATA file reading RecordLength-byte blocks and splitting them up according to the FLengths, interpreting each field according to its FType and adding them to an array, and then adding the completed array to a List of records.

On a related note: old DBF files have a structure similar to having your LAYOUT + DATA files but as a single file, and can be read by most spreadsheet programs.

Thank you, this is what I was looking for.
 
Upvote 0

emexes

Expert
Licensed User
stroll through the DATA file reading RecordLength-byte blocks and splitting them up according to the FLengths

For this bit, you do have to go through the fields in their position-in-record order eg Field 1, Field 2, Field 3, ...

unlike:

B4X:
For Each fd As FieldDescriptorType In FieldDescriptorMap    'doesn't matter which order they're summed in

So now I am inclined to say that all the groups-of-fields stuff should be arrays rather than maps or lists, with the index being the field number minus 1, or perhaps the arrays being one element longer than necessary and leaving element 0 unused, so that the array index and the field number are the same.

Probably best to still read the field descriptors into a Map, because you don't know the number of fields beforehand. Once you've read all the field descriptors and know how many there are, then just copy them into an array eg:

B4X:
Dim NumFields As Int = FieldDescriptorMap.Length
Dim FieldInfo(NumFields) as FieldDescriptorType
For FieldNumber = 1 to NumFields
    FieldInfo(FieldNumber - 1) = FieldDescriptorMap.Get(FieldNumber)
Next

hmm or perhaps you could do the following, which doesn't rely on a Map key and so you could just use a List instead:

B4X:
Dim NumFields As Int = FieldDescriptorMap.Length
Dim FieldInfo(NumFields) as FieldDescriptorType
For Each fd As FieldDescriptorType In FieldDescriptorMap    'doesn't matter which order they're copied in
    FieldInfo(fd.FNumber - 1) = fd    'array index = field number - 1
Next
 
Upvote 0

emexes

Expert
Licensed User
Something like this will split a fixed-length record up into fields:

lol I started writing just the RecordToFields function, but one thing led to another and before I knew what was happening... :

B4X:
Sub Process_Globals
    
Type FieldDescriptorType( _
    FNumber As Int, _
    FName As String, _
    FType As String, _
    FLength As String, _
    DecimalPlaces As Int _
)
    
End Sub


Sub RecordToFields(RecordString As String, FieldInfo() As FieldDescriptorType) As String()

    Dim NumFields As Int =  FieldInfo.Length
  
    Dim FieldStrings(NumFields) As String
    
    Dim StartPos As Int = 0
    For FieldIndex = 0 To NumFields - 1
        Dim EndPos As Int = StartPos + FieldInfo(FieldIndex).FLength
        If EndPos > RecordString.Length Then    'ideally would have check RecordString length is correct before trying to split into fields
            EndPos = RecordString.Length
        End If
        FieldStrings(FieldIndex) = RecordString.SubString2(StartPos, EndPos)    '.Trim
        StartPos = EndPos
    Next
    
    Return FieldStrings
    
End Sub


Sub StringArrayToSV(S() As String, Delimiter As String, Separator As String) As String
    
    Dim sb As StringBuilder
    sb.Initialize
    
    For I = 0 To S.Length - 1
        If I <> 0 Then
            sb.Append(Separator)
        End If
        sb.Append(Delimiter & S(I) & Delimiter)
    Next
    
    Return sb.ToString
    
End Sub


Sub StringArrayToCSV(S() As String) As String
    
    Return StringArrayToSV(S, """", ", ")
    
End Sub


Sub AppStart (Args() As String)
    
    Log("Hello world!!!")
    
    'test sample data:
    
    Dim FieldInfo(3) As FieldDescriptorType
    
    FieldInfo(0).FLength = 5
    FieldInfo(1).FLength = 10
    FieldInfo(2).FLength = 20
    
    'three different ways of doing the same thing:
    
    Dim F() As String = RecordToFields("abcdefghijklmnopqrstuvwxyz123456789", FieldInfo)
    Log(StringArrayToCSV(F))
    
    Dim RS As String = "[F1=][FIELD 2=][FIELD THREE=======]junk"
    Dim F() As String = RecordToFields(RS, FieldInfo)
    Dim CSV As String = StringArrayToCSV(F)
    Log(CSV)
    
    Dim RS As String = "FRED FLINTSTONEBEDROCK             junk"
    Log(StringArrayToCSV( RecordToFields(RS, FieldInfo) ))
    
End Sub

Log output:
Waiting for debugger to connect...
Program started.
Hello world!!!
"abcde", "fghijklmno", "pqrstuvwxyz123456789"
"[F1=]", "[FIELD 2=]", "[FIELD THREE=======]"
"FRED ", "FLINTSTONE", "BEDROCK             "
 
Last edited:
Upvote 0

Kris.D

Member
Something like this will split a fixed-length record up into fields:

lol I started writing just the RecordToFields function, but one thing led to another and before I knew what was happening... :

B4X:
Sub Process_Globals
   
Type FieldDescriptorType( _
    FNumber As Int, _
    FName As String, _
    FType As String, _
    FLength As String, _
    DecimalPlaces As Int _
)
   
End Sub


Sub RecordToFields(RecordString As String, FieldInfo() As FieldDescriptorType) As String()

    Dim NumFields As Int =  FieldInfo.Length
 
    Dim FieldStrings(NumFields) As String
   
    Dim StartPos As Int = 0
    For FieldIndex = 0 To NumFields - 1
        Dim EndPos As Int = StartPos + FieldInfo(FieldIndex).FLength
        If EndPos > RecordString.Length Then    'ideally would have check RecordString length is correct before trying to split into fields
            EndPos = RecordString.Length
        End If
        FieldStrings(FieldIndex) = RecordString.SubString2(StartPos, EndPos)    '.Trim
        StartPos = EndPos
    Next
   
    Return FieldStrings
   
End Sub


Sub StringArrayToSV(S() As String, Delimiter As String, Separator As String) As String
   
    Dim sb As StringBuilder
    sb.Initialize
   
    For I = 0 To S.Length - 1
        If I <> 0 Then
            sb.Append(Separator)
        End If
        sb.Append(Delimiter & S(I) & Delimiter)
    Next
   
    Return sb.ToString
   
End Sub


Sub StringArrayToCSV(S() As String) As String
   
    Return StringArrayToSV(S, """", ", ")
   
End Sub


Sub AppStart (Args() As String)
   
    Log("Hello world!!!")
   
    'test sample data:
   
    Dim FieldInfo(3) As FieldDescriptorType
   
    FieldInfo(0).FLength = 5
    FieldInfo(1).FLength = 10
    FieldInfo(2).FLength = 20
   
    'three different was of doing the same thing:
   
    Dim F() As String = RecordToFields("abcdefghijklmnopqrstuvwxyz123456789", FieldInfo)
    Log(StringArrayToCSV(F))
   
    Dim RS As String = "[F1=][FIELD 2=][FIELD THREE=======]junk"
    Dim F() As String = RecordToFields(RS, FieldInfo)
    Dim CSV As String = StringArrayToCSV(F)
    Log(CSV)
   
    Dim RS As String = "FRED FLINTSTONEBEDROCK             junk"
    Log(StringArrayToCSV( RecordToFields(RS, FieldInfo) ))
   
End Sub

Log output:
Waiting for debugger to connect...
Program started.
Hello world!!!
"abcde", "fghijklmno", "pqrstuvwxyz123456789"
"[F1=]", "[FIELD 2=]", "[FIELD THREE=======]"
"FRED ", "FLINTSTONE", "BEDROCK             "

Emexes, first you give me idea which got me very excited because it trigger my brain cells. Now you give me solution that is clever and cleaner what I would put together. Thank you.
 
Upvote 0
Top