Has anyone a Gedcom Parser?

enonod

Well-Known Member
Licensed User
Longtime User
I need a Gedcom 5.5 parser (genealogy). A Gedcom file is basically a text file. Has anybody produced a parser and is able to make it available, please? Failing that, does anybody know where to find the basics of writing one?

There is a C library available that does the job, but I suspect I will get into hot water trying to use it with B4PPC.
 

enonod

A typical file is around 100K, but 200K is not unusual.
The data is either entered by hand or exported from family tree programs.

Most programs will import and export this format because it is the world standard for exchanging genealogical data.

In my case (and I can see others following suit), the aim is to use B4PPC to write utilities and also a full family tree program.
The reason, I am sure most genealogy people would agree, is that whichever program one uses there is always something that is not big enough, clear enough, not displayed at the same time as other vitals, too many buttons to press to get to see all the info but never see what you want when you want it... and so on ad infinitum.

If the underlying Gedcom file is exported from whatever program one is using, it can then be fed into a program or utility of one's own. To import it a parser is needed, preferably one that feeds back results as it runs (say, after every record), because producing yet another file would achieve nothing.
So the parser needs routines that can be called from B4PPC (a DLL?) and/or that deliver intermediate results while one's own program is running.
The running program (mine) would be responsible for linking the data and displaying it in various windows, forms etc.
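
A minimal sketch of the "feed back results after every record" idea, written in Python only for brevity (it would need porting to B4PPC). It relies on one fact about the format: a line whose level number is 0 starts a new record. The function name `iter_records` is made up for this illustration.

```python
import io

def iter_records(stream):
    """Yield one record (a list of raw lines) each time the next level-0 line is seen."""
    current = []
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # A level-0 line closes the record collected so far and opens a new one.
        if line.split(" ", 1)[0] == "0" and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current  # the last record in the file

# The caller's own code runs once per record while the file is still being read.
text = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME John /Smith/\n0 TRLR\n"
for record in iter_records(io.StringIO(text)):
    print(record[0], "with", len(record) - 1, "sub-lines")
```

Because the records are handed over one at a time, the calling program can link and display data as it arrives instead of waiting for a second file to be written.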

That is how I see getting the program that is perfect for me and when I change my mind I can reprogram.
 

enonod

Here are two: one is small (20k) and the other pretty large (292k). Comparing them should show the common factors. The file forms a hierarchy, with a number at the beginning of each line giving its level; '0' denotes a new record.
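
To make the line layout concrete, here is a hedged Python sketch (not B4PPC, and not taken from any attachment in this thread) that splits one line into its level, optional `@...@` identifier, tag and value, and then groups lines into records at every level 0. The names `parse_line` and `split_records` and the sample lines are invented for the example.

```python
import re

# One GEDCOM line: level number, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")

def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        raise ValueError("not a GEDCOM line: %r" % line)
    level, xref, tag, value = m.groups()
    return int(level), xref, tag, value or ""

def split_records(lines):
    """Group parsed lines into records; every level-0 line starts a new one."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        parsed = parse_line(line)
        if parsed[0] == 0:
            records.append([])
        if not records:
            raise ValueError("file does not start with a level-0 line")
        records[-1].append(parsed)
    return records

sample = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "0 TRLR",
]
records = split_records(sample)
print(len(records))    # number of level-0 records found
print(records[1][0])   # header line of the individual's record
```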
 

enonod

Thank you for your interest, Erel.
 

derez

Expert
Licensed User
Longtime User
Well, I did the opposite of what you need: I made my home-made family tree program export its data to a Gedcom file, so that it can be imported by other tree programs.

That means I know something about Gedcom and will be able to help, but I don't have a parser as you describe it.

I'll look at it tomorrow to remind myself.
 

derez

Here is a link to version 5.5 of the Gedcom standard:
http://www.math.clemson.edu/~simms/genealogy/ll/gedcom55.pdf

If there is an existing library that can be converted or adapted to Basic4ppc, that would be great, but building one from scratch is (in my humble opinion) a waste of time. I would stick to a specific family tree program's database definitions and write routines that read the Gedcom file line by line and allocate the data to the matching database cells (or discard it if there is no database cell to fit it).
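
The line-by-line approach described above could look roughly like this. It is a Python sketch rather than Basic4ppc, and the mapping table `FIELD_FOR_PATH`, the function name and the sample record are assumptions made up for illustration, not part of any existing program.

```python
# Hypothetical mapping from GEDCOM tag paths to the fields of a fixed database layout.
FIELD_FOR_PATH = {
    ("NAME",): "Name",
    ("SEX",): "Sex",
    ("BIRT", "DATE"): "Birth",
    ("DEAT", "DATE"): "Death",
}

def import_individual(lines):
    """Fill a dict of fields from one INDI record; also return the lines that had no home."""
    person, discarded, path = {}, [], []
    for line in lines[1:]:                 # lines[0] is the "0 @I..@ INDI" header
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        path = path[:level - 1] + [tag]    # tag path from level 1 down to this line
        field = FIELD_FOR_PATH.get(tuple(path))
        if field is not None:
            person[field] = value
        elif value:
            discarded.append(line)         # data with no matching database cell
    return person, discarded

record = [
    "0 @I7@ INDI",
    "1 NAME John /Smith/",
    "1 SEX M",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "1 OCCU Farmer",
]
person, discarded = import_individual(record)
print(person)
print(discarded)
```

Returning the unmatched lines separately at least makes visible what such an import would throw away.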

The following code exports my complete database to a Gedcom file; if it helps, be my guest.
The database is an array of structures with the following fields:

B4X:
Dim Type(No,Surname,Name,Nick,Father,Mother,Spouse,Marriage,Sex,Fcod,Mcod,Scod,Gen,Pic,Birth,Death,Prev_N,Town,Country,Misc) DB(2500)

Each person has a unique ID number (No).
DB.Fcod, DB.Mcod and DB.Scod hold the ID numbers of the person's father, mother and spouse respectively.
Gen is the generation (starting somewhere and increasing from father to son). Pic is 1 if there is a picture, in which case the picture filename is "ID.jpg".
Sex is 1 for male, 2 for female.
Birth, Marriage and Death are dates.
Town and Country are general places; I do not keep a specific birth place etc.
Prev_N is a previous family name, for a married woman or for other reasons.
Free text is stored in files named ID.txt, and the program checks whether the file exists before reading it.

Attached you can find one person's family cell from the export file.

B4X:
#Region Gedcom
Sub GedExport_Click
   WaitCursor(True)
   ' c4 is the file connection used by every FileWrite below.
   FileOpen(c4,"Gedcom_Export_file_" & source & ".ged",cWrite)
   write_header
   export
   FileClose(c4)
   Msgbox("Export Completed to" & CRLF & "Gedcom_Export_file_" & source & ".ged")
   WaitCursor(False)
End Sub

Sub write_header
   FileWrite(c4,"0 HEAD")
   FileWrite(c4,"1 SOUR David Erez Family Tree")
   FileWrite(c4,"2 VERS 10")
   FileWrite(c4,"2 NAME D Erez")
   FileWrite(c4,"3 ADDR 18 AYA st Ramat Hasharon ISRAEL")
   FileWrite(c4,"1 DEST")
   FileWrite(c4,"1 DATE " &  DateD & "." & DateM & "." & DateY)   ' tags are upper case
   FileWrite(c4,"1 SUBM @S0@")
   FileWrite(c4,"1 FILE Gedcom_Export_file_" & source & ".ged")   ' same name as the file opened above
   FileWrite(c4,"1 GEDC")
   FileWrite(c4,"2 VERS 5.5")   ' version of the Gedcom standard being written
   FileWrite(c4,"2 FORM LINEAGE_LINKED")
   FileWrite(c4,"1 CHAR UTF-8")
   ' Submitter record referenced by "1 SUBM @S0@" in the header.
   FileWrite(c4,"0 @S0@ SUBM")
   FileWrite(c4,"1 NAME D. Erez")
   FileWrite(c4,"1 ADDR 18 AYA st Ramat Hasharon ISRAEL")
   FileWrite(c4,"2 CONT")
End Sub

Sub export
For i = 2 To dbsize
  If db(i).Surname <> "" Then         'individual
   FileWrite(c4,"0 @I" & db(i).No & "@ INDI")   ' name
   FileWrite(c4,"1 NAME " & db(i).Name & " " & db(i).Nick & " /" & db(i).Surname & "/")
   FileWrite(c4,"2 GIVN " & db(i).Name)
   FileWrite(c4,"2 SURN " & db(i).Surname)
   If db(i).Nick <> "" Then FileWrite(c4,"2 NICK " & db(i).Nick)
   If db(i).Prev_N <> "" Then
      If db(i).Sex = 2 Then 
         FileWrite(c4,"2 _MARNM " & db(i).Prev_N)
      Else
         FileWrite(c4,"2 _AKA " & db(i).Prev_N)
      End If
   End If
                                 ' sex, birth and death, place   
   If db(i).Sex = 1 Then FileWrite(c4,"1 SEX M") Else FileWrite(c4,"1 SEX F")
   If db(i).Birth <> "" Then 
      FileWrite(c4,"1 BIRT")
      FileWrite(c4,"2 DATE " & db(i).Birth)
   End If
   If db(i).Death <> "" Then 
      FileWrite(c4,"1 DEAT")
      FileWrite(c4,"2 DATE " & db(i).Death)
   End If
                                 ' family connections - children of 
   If IsNumber(db(i).Fcod)  Then 
        FileWrite(c4,"1 FAMC @F" & db(i).Fcod & "@")
   Else
        If  IsNumber(db(i).Mcod) Then FileWrite(c4,"1 FAMC @F" & db(i).Mcod & "@")
   End If
                                 ' family connections - spouses  
   If  IsNumber(db(i).Scod)  Then 
       If db(i).Sex = 2 Then 
          FileWrite(c4,"1 FAMS @F" & db(i).scod & "@")
      Else
          FileWrite(c4,"1 FAMS @F" & db(i).no & "@")
      End If
   End If 
                                 ' town and country
   If db(i).Country <> "" Then 
        st = db(i).Country
       FileWrite(c4,"1 ADDR")
       If db(i).Town <> "" Then st = st & " " & db(i).Town
       FileWrite(c4,"2 CONT " & st)
       FileWrite(c4,"2 _NAME n/a")
       If db(i).Country <> "" Then FileWrite(c4,"2 CTRY " & db(i).Country)
       If db(i).Town <> "" Then FileWrite(c4,"2 CITY " & db(i).Town )
    End If
                                 ' photo attachment   
   If db(i).Pic = 1 Then 
      FileWrite(c4,"1 OBJE")
      FileWrite(c4,"2 FORM JPG")
      FileWrite(c4,"2 FILE " & AppPath & "\JPG\" & db(i).No & ".jpg")  
      FileWrite(c4,"2 _SCBK Y")
      FileWrite(c4,"2 _PRIM Y")
      FileWrite(c4,"2 _TYPE PHOTO")
   End If
                                 ' text file attachment
   fname = AppPath & "\text\" & db(i).no & letter & ".txt"
   If  FileExist(fname) Then   
      FileWrite(c4,"1 NOTE @NI" & db(i).No & "@ ")
      FileWrite(c4,"0 @NI" & db(i).No & "@ NOTE")
      FileOpen(c6,fname,cRead)
      st = FileRead(c6)
      hdr = "1 CONC "
      Do While st <> EOF
         Do While StrLength(st) > 60
            data = SubString(st,0,60)
            FileWrite(c4,hdr & data)
            hdr = "1 CONC "
            st = SubString(st,60,StrLength(st)-60)
         Loop
         FileWrite(c4,hdr & st)
         st = FileRead(c6)
         hdr = "1 CONT "
      Loop
      FileClose(c6)
   End If
                                 ' Family creation
   If db(i).Sex = 1  AND (IsNumber(db(i).Scod) OR children(i) <> "") Then 
      FileWrite(c4,"0 @F" & db(i).no & "@ FAM") 
      FileWrite(c4,"1 HUSB @I"  & db(i).no & "@") 
      
      If IsNumber(db(i).Scod) Then
         FileWrite(c4,"1 WIFE @I" & db(i).Scod & "@") 
         If db(i).Marriage <> "" Then 
            FileWrite(c4,"1 MARR")
            FileWrite(c4,"2 DATE " & db(i).Marriage) 
         End If
      End If
   
      If children(i) <> "" Then
         record() = StrSplit(children(i),",")
         For j = 0 To ArrayLen(record()) -1
            FileWrite(c4,"1 CHIL @I" & record(j) & "@") 
         Next
      End If
   End If
                                 ' if only mother with no father
   If db(i).Sex = 2 AND IsNumber(db(i).Scod)= False  AND children(i) <> "" Then 
      FileWrite(c4,"0 @F" & db(i).no & "@ FAM") 
      FileWrite(c4,"1 WIFE @I"  & db(i).no & "@")
      record() = StrSplit(children(i),",")
      For j = 0 To ArrayLen(record()) -1
         FileWrite(c4,"1 CHIL @I" & record(j) & "@") 
      Next
   End If
  End If
Next
FileWrite(c4,"0 TRLR")
End Sub
#End Region
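
The note-writing loop in the export above cuts each line of the text file into 60-character pieces, writing the first piece of every new text line with CONT and the remaining pieces with CONC. The same logic as a small Python sketch, for readers who want to check the output without running the export; `note_lines` is a name invented here, and like the original it starts the very first line with CONC.

```python
def note_lines(text, width=60):
    """Split free text into level-1 CONC/CONT lines of at most `width` characters."""
    out = []
    for n, line in enumerate(text.split("\n")):
        # A new line of the source text starts with CONT (CONC for the very first line).
        tag = "CONC" if n == 0 else "CONT"
        chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        for chunk in chunks:
            out.append("1 %s %s" % (tag, chunk))
            tag = "CONC"   # the rest of the same source line is concatenated
    return out

for gedcom_line in note_lines("abcdefgh\nxy", width=5):
    print(gedcom_line)
```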
 

enonod

Thank you very much for your response and code derez, I will look closely at it.

The issue of using an existing program is that none of them account for everything. You used the word 'discard'. That is exactly what I am trying to avoid. If program A is used and then exported to program B which has a feature not in A (and of course A has a feature not in B) then some data is discarded in the transfer. When transferring back to A again yet more is discarded.
Writing a program incorporates 'everything' required rather than 'tailoring' or 'making do with' or 'I'll use this for that and hope it exports'.

They all say they are compatible with 5.5 and they are but that does not mean that they incorporate all of it. Also 5.5 incorporates the ability for 'custom' items, even the user can customise. Not all programs customise the same information and this is what will get discarded.

A bespoke program would permit me to customise to my heart's content, learn a lot, and get enjoyment on the way. WOW, to have a family tree program built exactly as I require. Hopefully a library for parsing would permit custom items to be added to it, but if not, that part of the parsing could be done outside the library; hence NO discarding.

Parts of the tree may come from other Gedcom files from other people or the internet or other programs. One of the most difficult things is to correct what has been discarded or ignored or worse, corrupted.

As for writing from scratch, I think a database such as SQLite is superb for the job; incidentally it is used by RootsMagic (one of the best). Most others use a folder full of different database files (usually dBase) and indexes in a relational manner (often 20 to 25 files). SQLite needs one file.
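
As a rough illustration of the one-file idea, the sketch below stores every Gedcom line, custom tags included, in a single SQLite table with a link to its parent line, so nothing has to be discarded at import time. It is Python with the standard `sqlite3` module; the table name `gedcom_line`, its columns and the function name are assumptions for this example, not the schema of any existing program.

```python
import sqlite3

def store_lines(conn, lines):
    """Insert every GEDCOM line as a row that points at its parent row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gedcom_line ("
        " id INTEGER PRIMARY KEY, parent_id INTEGER, level INTEGER,"
        " xref TEXT, tag TEXT, value TEXT)"
    )
    parents = {}  # level -> row id of the most recent line at that level
    for line in lines:
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref = None
        if level == 0 and parts[1].startswith("@"):
            # Record header such as "0 @I1@ INDI": identifier first, then the tag.
            xref, rest = parts[1], parts[2].split(" ", 1)
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
        else:
            tag, value = parts[1], (parts[2] if len(parts) > 2 else "")
        cur = conn.execute(
            "INSERT INTO gedcom_line (parent_id, level, xref, tag, value)"
            " VALUES (?, ?, ?, ?, ?)",
            (parents.get(level - 1), level, xref, tag, value),
        )
        parents[level] = cur.lastrowid

conn = sqlite3.connect(":memory:")   # use a file name instead to get the single database file
store_lines(conn, [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "1 _CUSTOM kept as it is",
])
print(conn.execute("SELECT COUNT(*) FROM gedcom_line").fetchone()[0])
```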

This is not to discard what you say, I will look with interest at your code.
Thanks
 

derez

Like I said before:
"If there is an existing library that can be converted or adapted to basic4ppc - that would be great, but to build it from scratch..."

I looked at the Gedcom parser library; it's huge. Unless Erel, agraham or anybody else who has mastered the adaptation process can wrap it for B4PPC, I don't know how it can be done.

At the end of the standard there are charts. Even the Gedcom writers didn't define "the structure" but they show "commonly used structures".

Anyway, if this library materializes - I promise to use it and add an import part to my program.:icon_clap:
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
derez said: "but to build it from scratch is (to my humble opinion) - a waste of time"
:BangHead:

I've attached a general Gedcom parser module which parses Gedcom files and stores the data as a tree.
It also collects all the keys in a Hashtable making it possible to quickly jump between records as required.
The data is stored in an array of structures named Records.
'Records' is declared as:
B4X:
Public Type (Tag, Value, FirstSon, NextSibling) Records(0)
FirstSon and NextSibling will equal "" if no such record exists. Otherwise their value will be the record's number.

GED.sbp is an example which uses the GEDParser module to parse a file and then show the contents of each key in a TreeView.
Choose a key from the ComboBox and the tree will be populated with the data from this key.
Please feel free to ask any question on this code as it might be a little bit tricky.

Note that the parser only builds the tree (with the keys). It doesn't give any special meaning to different tags or values.
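
For readers without the attachment, here is a hedged Python sketch of how a tree of this shape can be built. It is not the code of the attached module; only the field names Tag, Value, FirstSon and NextSibling and the idea of a key table come from the description above, and the names `parse_tree`, `children` and the sample lines are invented.

```python
def parse_tree(lines):
    """Build a first-son / next-sibling tree plus a table of record keys."""
    records = []          # each entry: Tag, Value, FirstSon, NextSibling ("" if none)
    keys = {}             # "@I1@" -> index of that level-0 record
    last_at_level = {}    # level -> index of the most recent record at that level
    for line in lines:
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0 and parts[1].startswith("@"):
            rest = parts[2].split(" ", 1)
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
            keys[parts[1]] = len(records)
        else:
            tag, value = parts[1], (parts[2] if len(parts) > 2 else "")
        index = len(records)
        records.append({"Tag": tag, "Value": value, "FirstSon": "", "NextSibling": ""})
        if level > 0:
            parent = last_at_level[level - 1]
            if records[parent]["FirstSon"] == "":
                records[parent]["FirstSon"] = index
            else:
                records[last_at_level[level]]["NextSibling"] = index
        last_at_level[level] = index
        # Deeper levels belong to an earlier branch and must not be linked to again.
        for deeper in [l for l in last_at_level if l > level]:
            del last_at_level[deeper]
    return records, keys

def children(records, index):
    """Walk FirstSon, then the NextSibling chain."""
    child = records[index]["FirstSon"]
    while child != "":
        yield child
        child = records[child]["NextSibling"]

sample = ["0 HEAD", "1 CHAR UTF-8", "0 @I1@ INDI", "1 NAME John /Smith/",
          "1 BIRT", "2 DATE 1 JAN 1900", "0 TRLR"]
records, keys = parse_tree(sample)
print([records[c]["Tag"] for c in children(records, keys["@I1@"])])
```

Storing the links as indexes into one flat array keeps the structure usable in languages without object references, which may be why this layout suits B4PPC.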
 

Attachments

  • GED.zip (7.5 KB)

derez

Erel
You are amazing.

I am still trying to understand the machine you created in no time.

I think that to reach the original need there are two more steps:

1. Add the "dictionary" that translates the mnemonics of the standard into the full names of the data items.
2. Enable a jump from inside a record to the referred record (for example, in a person's tree you can see a sibling; you should be able to select the record reference and jump to the tree of that reference).

The hard work is building the dictionary, unless it already exists in code.
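
Both steps can be sketched briefly. The Python below (illustrative only, to be ported) holds a small part of such a dictionary and a helper that recognises a value of the form `@...@` as a reference to another record. The tags listed are standard Gedcom 5.5 tags; the wording of the descriptions and the names `TAG_NAMES`, `describe` and `pointer_target` are my own.

```python
import re

# A small excerpt; a full dictionary would cover every tag in the standard.
TAG_NAMES = {
    "INDI": "Individual",
    "FAM": "Family",
    "NAME": "Name",
    "SEX": "Sex",
    "BIRT": "Birth",
    "DEAT": "Death",
    "MARR": "Marriage",
    "HUSB": "Husband",
    "WIFE": "Wife",
    "CHIL": "Child",
    "FAMC": "Family in which the person is a child",
    "FAMS": "Family in which the person is a spouse",
    "DATE": "Date",
    "NOTE": "Note",
}

def describe(tag):
    """Return the full name of a tag, or the tag itself if it is unknown or custom."""
    return TAG_NAMES.get(tag, tag)

def pointer_target(value):
    """Return the key a value points to (e.g. "@F12@"), or None for ordinary text."""
    value = value.strip()
    return value if re.fullmatch(r"@[^@]+@", value) else None

print(describe("FAMC"), "->", pointer_target("@F12@"))
print(describe("_MARNM"), "->", pointer_target("John /Smith/"))
```

Falling back to the raw tag means custom tags (those starting with an underscore) are still displayed rather than hidden.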

Edit: see the file yomtov above; it should all appear in one tree of a single person. It is a txt file because it is not a complete ged file, but the program reads it all the same.
 

enonod

So quickly Erel, that is truly amazing. I shall try this out with relish, I just need a bit of time. Oh and thanks for the effort and time.
 

Erel

derez said: "2. Enable the jump from inside a record to the referred record (like in the tree of a person you can see his sibling, you should be able to select the record reference and jump to this reference tree)."
Version 0.6 is attached with this feature and also a 'Back' button which can navigate to previously visited keys.
This version also fixes a bug related to the reuse of node objects.
The example file can be downloaded from my previous post.
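
A 'Back' feature of this kind is usually a simple stack of visited keys. The Python sketch below shows the idea only; it is not the code in the attached version, and the class name `KeyHistory` is hypothetical.

```python
class KeyHistory:
    """Remember the keys visited so far so that 'Back' can return to them."""

    def __init__(self):
        self._visited = []    # keys left behind, most recent last
        self.current = None   # key whose tree is shown now

    def visit(self, key):
        """Jump to a key, remembering where we came from."""
        if self.current is not None:
            self._visited.append(self.current)
        self.current = key

    def back(self):
        """Return to the previous key; stay put if there is none."""
        if self._visited:
            self.current = self._visited.pop()
        return self.current

history = KeyHistory()
for key in ("@I1@", "@F1@", "@I2@"):
    history.visit(key)
print(history.back())
print(history.back())
```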

About the dictionary, I think that the application that uses the parser should add the deeper knowledge about the semantics.
 

Attachments

  • GED.sbp (2.8 KB)
  • GEDParser.bas (2.4 KB)

derez

Great !

I added an Openfile button and an OpenDialog filtered to *.GED ; *.txt to enable file selection, and the program now starts with the first item shown (so you know that loading has finished).
The parser is a great help when analysing tree files. Thank you :)
B4X:
Sub App_Start
   form1.Show
   Node1.New1
   ParentNode.New1
   Tree.New1("form1", ComboBox1.Left, ComboBox1.Height + ComboBox1.Top + 5, form1.Width - 20, 180)
End Sub

Sub Openfile_Click
   ' Parse only if the user actually picked a file.
   If opendialog1.Show <> cCancel Then
      GEDParser.ParseFile(opendialog1.File)
      combobox1.Clear
      For i = 0 To GEDParser.htKeys.Count-1 'Add all keys to the ComboBox.
         ComboBox1.Add(GEDParser.htKeys.GetKey(i))
      Next
      ' Show the first key so it is clear that loading has finished.
      If GEDParser.htKeys.Count > 0 Then ShowKeyTree(ComboBox1.Item(0))
   End If
End Sub
 

enonod

Thank you again for this excellent addition Erel. I agree regarding the dictionary, it may not always be necessary and may also differ in requirement from person to person.

@derez. Thanks for posting your additional code which helps to illustrate. Very useful. I may find a couple of hours today to play.
 

derez

Erel

I tried to add this module to my Family program.
Since the data for one record in my program is spread over several records of the parser, this gets very complicated.

Can you please build the trees so that all records with the same number appear in one tree (e.g. @I123@, @F123@, @O123@, @S123@ etc.)?
In other words, all the level-0 lines whose key contains the same number should be the main nodes in the tree of that person.
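
The grouping asked for here can also be done on top of the existing key table. A small Python sketch of the idea (to be ported; `group_keys_by_number` is an invented name): it assumes the file reuses one number across record types, as my export does, which Gedcom files from other programs will not necessarily do.

```python
import re
from collections import defaultdict

def group_keys_by_number(keys):
    """Group record keys such as @I123@ and @F123@ by the number they share."""
    groups = defaultdict(list)
    for key in keys:
        m = re.fullmatch(r"@([A-Za-z]+)(\d+)@", key)
        if m:                        # keys without a letters+digits form are skipped
            groups[int(m.group(2))].append(key)
    return dict(groups)

keys = ["@I123@", "@F123@", "@I7@", "@S0@", "@NI123@"]
for number, members in sorted(group_keys_by_number(keys).items()):
    print(number, members)
```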

Thanks in advance.
 