Your recommendations please?

GomerFackworth

New member
Local time
Today, 01:46
Joined
Sep 18, 2006
Messages
5
First of all, I had some experience many years ago with setting up a db. Nothing since, so if I am in the wrong place, please feel free to say so.

I am in charge of a very large list of people, mostly dead, for a historical society in my town, and much of the data is on paper or in Excel files. In order to get a handle on how to organize it in a db, I have been entering data from paper to Excel for some time. Gradually, I have been able to combine different things, and sort out data to fit in different tables. A person has personal info LastName, First etc, birth/death data, cemetary records, historical artifacts catalog numbers, military records (I know the height & hair/eye color of civil war vets!) of battles fought, medals awarded, ranks, etc, etc. It is obvious to me that a db would be a much better place to keep this data than in Excel, but that would require a db to be designed. I understand enough about dbs to start the process, but I still have a bit more parsing to do. (You wouldn't believe how many different kinds of information can be shoved into the Remarks field!) I have spoken with some people who work with dbs for corporations, FoxPro etc. and they agree that a db would be better.

The problem that they see (all claim to know little or nothing about Access) is something to do with the Primary Key. Let me give you an example:
John Doe is a person who at present is assigned a personal ID. That ID goes everywhere with him because he is not the only John Doe. In fact I have 7 John Does, not all born in the same century, not necessarily related, hopefully with different birthdates, usually in a different grave, but sometimes the grave is unknown as with many veterans who died in prison, etc. and sometimes there is only an approximate birthdate or something like that, if I'm lucky. Also sometimes gravestones list children's names and it is unknown if the child is buried there or with their spouse. In the case of one woman, there are 4 listings as she was a daughter and married three times with a name on the stone of each husband. Of course, one can assume that she was not buried with the husbands who died earlier than the last one, BUT that is not always true, because sometimes women remarried other men for convenience but asked to be buried with their true love. See, it gets complicated sometimes and this was the reason I assigned each a PERSONAL ID. I had to find a way to reference each one to make sure that they were with the right persons.

For geneological purposes, it is important to have as many relationships as possible connected to each individual. One John Doe might have 4 children by two wives. Each wife has a husband and his ID as well as the children and their IDs listed, under the husband and son and daughter fields which include the husband, son, and daughter ID fields. My DB professionals say that that is a real problem. How do I set up a db so that I keep the correct husband with the correct wife without their respective IDs?

These IDs are a convenience to me as I enter data. They could be dispensed with once they have a Primary Key, AND are related to those other people. The trouble is, I haven't the foggiest idea of how I am going to get that accomplished when I transfer the data from Excel to Access.

And so I turn to you good people for some suggestions, even if it is to go to a different URL and bother them. Thanks in advance.
 
Hi GomerFackworth
Now I am by no means a DB specialist either, but I love to try and find solutions to problems like this.

First off - I find thaty getting the data from Excel to Access is a synch......
IF (BIG IF) .....it is
a. Formatted correctly. (Especially dates)
b. Data is split (normalised) one sheet for each table that you wish to create.
NOTE: In some instances, you will have to fill in the data for the join fields. (The thing that links the tables together - creating the relationships.) But if a field has been set as an AutoNumber in Access, then leave it blank in Excel.

As far as connecting everybody together....
I think that I would keep the unique IDs that you are assigning.
One possibility may be to create a "ReportsTo" Table

For example: (Numbers in brackets represent unique ID)
John Doe (1) marries Jane Joe (14) [Nee - Smith]
John dies and Jane marries Harold Stringentfink (23)
Jane has two children with Harold - Julie (28) and Mark (42)

The question here is - Does it matter who do the kids report to?

The "ReportsTo" Table looks something like this.
RepToID=RecordID (AutoNumber)
HeadHoncho=PersonID
Subordinate=PersonID

So in this instance, the complete table will look this
RepToID/HeadHoncho/Subordinate
1 1 1 (Joe reports to Joe. Default when adding new person)
2 14 14 (Jane)
3 23 23 (Harold)
4 1 14 (Joe and Jane)
5 23 14 (Harols and Jane)
6 28 28 (Julie is born)
7 14 28 (Julie reports to Jane)
8 42 42 (Mark is born)
9 14 42 (Mark reports to Julie)

There are one area where I see a problem here.
I. If Jane marries two men with the same lastname (Or the same as her birth name) - By looking at their lastnames, you wont be able to tell who it the father.

If you REALLY wanted to, you could add another field - call it say "Sibling" but then things start getting confusing. How far down do you got etc. etc.

Now this may not be the absolute solution, but I hope that it triggers some though juices for you.
 
More worms!

Thanks liddlem for your reply. As I lack the experience, I will run your idea by my friends and see what they think.

"Formatted correctly". Yes one of the things I have been doing is cleaning up the format problems. Data entry by inexperienced people who might add a space after a name for example, or Excel which adds an apostrophe sometimes when "paste special"-ing a formula into values, etc. Very tedious with thousands of records because one have to first find the anomaly.

However, you did open up another can of worms which I have been aware of but was going to wait to ask about. That is dates, and now you have added "Data is split (normalised) one sheet for each table that you wish to create" which I was unaware of.

First, MS seems to think that the world began in 1900, and as a result, many of the Excel sheets that I was given were formatted this way by people who did not have an extension to deal with older dates. In one cell, they formatted as a "date" 9 Sep 2001, with the column width shrunk to show only 9 Sep, and the next cell in the row formatted "general" was 1789, thus to the eye 9 Sep 1789. I chose to parse that into three columns date, month and year formatted "general", thus eliminating completely the date issue of MS programs not knowing the world existed before 1900, and most people not having the extension. (Not to mention my ignorance of how Access will handle dates prior to 1900.)

However, there are many instances of incomplete dates, variously: 1842-3, Sep 1798, probably 1856, 1823?, etc. My GUESS is that all of those incomplete dates would have to be parsed into another field, perhaps "memo" type, with db forms that have date data ultimately set up having four boxes Day, Month, Year, Dirty Data. Does that seem reasonable?

As for the "Data is split (normalised)" are you saying that I will have to split up my worksheet? From my experiments with the Access Wizard, it looks as though I can choose which Excel columns I wish to use for any table. There is the possibility that I will have to put the Names table Primary Key into an adjacent column for each group of Excel columns that I want to use to create a table, but I thought I could use just one spreadsheet. Can you clarify that for me?

Once again, thank you for your input and time, and apologies for my ignorance. I have done a lot of work over the years writing spreadsheet macros etc using Lotus in the '80s long before the days of MS Office, so I do know something about "programming", and I have been looking for a adult ed course on VB and Access, or a good tutorial book on either for this winter. Originally, I was just going to write some macros for Excel, since most of the people I deal with have NO experience with either DBs or spreadsheets, however the more I got into what was required the more it seemed that a db was the better vehicle. In either case, VB seems to be an important thing to learn. Just what I was planning to do in my retirement.
 
You are going to run into SERIOUS date problems because you have legit dates preceding 1-Jan-1900. You CANNOT use standard MS dates for this, which will result in you needing to generate some specialized date functions for something so simple as computing a person's age.

As to whether converting to a DB is good, bad, or indifferent, it is hard to say. Issues that will help you decide are the size of the data liists and what you want to do with the data. Excel sheets drop dead after a certain size. Access is more flexible but also has size limit. On the other hand, it is not uncommon to have Excel sheets be more flexible precisely because the notes you enter are somewhat unpredictable. With Access, you must conform to much stricter formatting rules.

As to the geneology, search the forum for that keyword. Some articles have appeared on that topic with the last few years.

In general, the way to do a geneology is to have two tables - a people table and a relationships table. You put a unique ID (perhaps in Access, an autonumber is adequate) on each person's record. As to what you put in the rest of that record, it depends on how ambitious you get. But here are a couple of thoughts

Pers table: PersID, name parts (First, Last, Middle, Honorific), birth, death, etc.

PersRel table: PersID1, PersID2, RelCode, StartDate, EndDate

RelNames: RelCode, ForwardName, ReverseName

OK, in PersRel you have TWO relationships to the Pers table, one for ID1 and one for ID2. This is permitted in the Relationships window because ID1 and ID2 are two different fields.

In RelNames, the code shown between person 1 and person 2 lets you select the nature of their relationship assuming a DIRECTED relationship. Or you can simplify this by choosing non-directed relationship names.

Directed examples:

RelCode = 1, FWD = "Father", RVS = "Son"
RelCode = 2, FWD = "Father", RVS = "Daughter"
RelCode = 3, FWD = "Husband", RVS = "Wife"

Non-directed examples:

RelCode = 1, FWD = "Parent", RVS = "Child"
RelCode = 2, FWD = "Spouse", RVS = "Spouse"

So a relation record would show the I.D. numbers for Johnathan and Martha Kent, relationships Spouse. Then Clark Kent would have a relationship with each of his parents (one each) that says "Child" (or, technically, "Adopted Child"). Got the idea?

A bit about dates: Since you are dealing with real dates that predate the MS reference date, you need to do some web searching OR you need to decide other ways of how you want to compute ages. The way most systems do this is:

1. Select a reference date.
2. Make midnight of that date = time zero.
3. Make all other dates some integer number of days from that reference. In effect, "map" days to the calendar.
4. Make your algorithm sensitive to leap-year issues. Remember that year 1900 WAS NOT a leap year because of the "divisible by 100" rule; ditto, 1800. BUT year 2000 WAS a leap year because of the "divisible by 400" rule.

You might be able to web-search for some articles on easy ways to compute date differences. Your problem isn't necessarily that the dates can or can't be mapped. It is that they require you to go off the MS map to map them. This means you absolutely cannot use ANY of the MS date/time routines.
 
Thanks Doc Man for your reply.

Yes, I realized early on that dates were going to be a real problem with MS, but it is the standard, however dysfunctional. There is an addin from this site <http://www.j-walk.com/ss/excel/files/xdate.htm> that helped me with parsing the Excel files. When I later checked my Access I found that it also accepts dates prior to 1900, and so the simpliest solution is for me to give a copy of the "xdate" with each copy of the db so that anyone would have it to install on their computer. Perhaps that is looking for trouble, but I could also include a readme.

BTW, age is not really a necessary issue. After all, most people dealing with geneology are perfectly capable of subtracting one year from another.

My only alternative to the xdate seems to be entering each date as three numerical entries. Sadly, that requires 3 fields instead of one for each date. One can only wonder what possessed MS from doing away with Julian dates.

In "A bit about dates" you say "web-search for some articles on easy ways to compute date differences". One of the functions of the XDate addin is "XDATEYEARDIF(xdate1,xdate2): Returns the number of full years between two dates (useful for calculating ages)". It seems that this would fulfill "require you to go off the MS map to map them", or do I misunderstand you?

As to the "size of data lists", I am up to 50 fields at present and climbing especially if dates have to be handled as 3 fields each, and, or course, the problem with a spreadsheet is all the empty cells. DBs, in my understanding, are much more efficient at use of memory.

Thanks, I will search for "geneology". If I'm really lucky, someone will have solved this issue already.

As for the suggestions on relationships. Thanks muchly. I was pursuing a similar idea, and you have given me an alternative approach to it. I will try your suggestion as I think it is a bit simpler, more elegant.
 
I was actually looking for other information when I came across your discussion. This site has some interesting history into the calendar generally and the Microsoft problem in particular. It might give you some ideas on how to cope with your dates issues.

http://www.fourmilab.ch/documents/calendar/
 
Great Webpage, but

Thanks RexesOperator for you reply.
This is a great page, but at the bottom are the Excel and Mac paragraphs, and that analysis shows why trying to deal with dates in any format other than as numbers makes it difficult to exchange data with others.

My problem is that the db has to be set up in such a way that people unknown to me and perhaps 50 years from now will be able to have the data in a form that is without need of some type of special programming, add-ins, or any type of manipulation that is not custom to Access. Sure, I could try to couple some freewear with each file, but that is always problematic. People tend to lose things, especially people who are either computer novices, or at least Access novices. (I was just given a scanner by someone, but they lost the AC adapter to power it! Fortunately, Radio Shack had a generic one, but we had no way of knowing the polarity of the DC input plug, i.e. as the salesman said, "Well, ya got a 50-50 chance of blowing it up.")

After much thought and conversations with programmers and archivists, we decided that 3 separate numbers would be safest and least problematic, and as of this day, Microsoft has not, to my knowledge anyway, been able to screw up positive numbers. If people can't convert 3 17 1864 into 17 Mar 1864, they have a problem, but not caused by the incorrect format of data.

Anyway, thanks again for your post and the page.
 

Users who are viewing this thread

Back
Top Bottom