Question Where do I go from here

ihchealth

New member
Local time
Today, 00:58
Joined
Dec 30, 2012
Messages
4
As I said in my introduction to this site ([FONT=&quot]Wandering in the dark)[/FONT], I have not used Access since the mid 1990’s. I understand the principles of a relational data bases design and the importance of data normalization.​


Here is my dilemma:

I have a large number of excel spreadsheets ( appx. 125) that have been used to record data from clients that have been treated in a diabetes prevention program. These spreadsheets have been copied, renamed and used by others to add their data. There are some spreadsheets that have 128 columns. This is the result of adding new data to an original spreadsheet creating an enormous flat file. To compound the problem the multiple copies of the original spreadsheets do not all contain the same data.


What is the best way to sort all this information into clean non duplicated data that can be used in an Access database? Should I try to clean up the data in the spreadsheets first then import it into Access or would it be easier to import the raw spreadsheets and manipulate the data in Access?


I have very little experience with Visual Basic. I have seen on this site that it is often recommended as a solution for manipulating the data. What is the learning curve for me to be able to use that programming language?

Prior to dealing with the raw data I started to design my Access database. I have attached a copy of my first pass on a normalized database that will meet our needs. I am not sure if I have carried the normalization too fare, making my model too complicated. You will notice that all personal information about a client is separated from the metrics by a junction table which allows the sharing of metrics without revealing any personal information. This is a federal law (HIPAA) that requires that medical record privacy be guaranteed.


I have attached a copy of my Access design with only the empty tables and no forms or queries.

Any help or advice would be appreciated as I am not sure how to proceed.​
 
What is the best way to sort all this information into clean non duplicated data that can be used in an Access database? Should I try to clean up the data in the spreadsheets first then import it into Access or would it be easier to import the raw spreadsheets and manipulate the data in Access?
There are no rules in this. It depends on how crappy your data is, and where your skills are. I'm pretty good at code in Access so I like to pull data into temp tables, and clean it up there. Maybe you are better in Excel. No matter how you slice it though, this is tedious work.

I have very little experience with Visual Basic. I have seen on this site that it is often recommended as a solution for manipulating the data. What is the learning curve for me to be able to use that programming language?
The learning curve depends on what you already know. If you have a ton of experience writing code in a different language, there is some syntax for you to learn and the curve is shallow. If you've never written any kind of code before, the curve will be steep.
Prior to dealing with the raw data I started to design my Access database. I have attached a copy of my first pass on a normalized database ...
... I have attached a copy of my Access design with only the empty tables and no forms or queries.
There is no attachment to your post.

Hope this helps,
 
Agree with Mark and all his points. Have one of my own:

What is the best way to sort all this information into clean non duplicated data that can be used in an Access database?

Define 'used'. I've seen a lot of people come here with an Excel background hoping to move their data to Access for no other reason than to move their data to Access. To them Access is Excel, But Better, but they don't know why.

You used the word 'normalized' and other database concepts which leads me to believe you are not one of them. But you never described an overall goal of working with this data. Whats the end game? Reports? Ongoing data capture?
 
Thank You to MarkK and plog for responding to my first post.
Your comments and advice were well received. I know I have a lot to learn but with your advice I hope to succeed.

I am not quite sure how you insert my questions as quotes when you respond. I have copied and pasted all the FAQ's into a word document which will serve as a reference to get me through the early phase of learning how things are done on this forum. For this post I will just cut and past.

Quote:
[FONT=&quot]There are no rules in this. It depends on how crappy your data is, and where your skills are. I'm pretty good at code in Access so I like to pull data into temp tables, and clean it up there. Maybe you are better in Excel. No matter how you slice it though, this is tedious work.[/FONT]
I knew that it wouldn't be an easy job to cleanup the data. My main reason for the question was, should I take the time to learn VBA because it is more powerful and would permit me to sort out the data more effectively or does it make more sense to figure out how to use the Access import and append functions to break apart the excel files into to individual tables?

Quote:
[FONT=&quot]There is no attachment to your post.
[/FONT]
[FONT=&quot]
I attempted to attach [/FONT][FONT=&quot]my initial design but I must have done something wrong. I will make another attempt to attach the Access.
[/FONT]
Quote:
Define 'used'. I've seen a lot of people come here with an Excel background hoping to move their data to Access for no other reason than to move their data to Access. To them Access is Excel, But Better, but they don't know why.
[FONT=&quot]
You used the word 'normalized' and other database concepts which leads me to believe you are not one of them. But you never described an overall goal of working with this data. Whats the end game? Reports? Ongoing data capture?
[/FONT]



[FONT=&quot]The data is so fragmented into many Excel table that it is impossible to bring together this historical data to evaluate changes over time to support the effectiveness of the program. Also as I write this we are generating new data which should be captured in an Access database instead of more Excel files. We need to use the data as it is being generated and be able to compare it against the historical data.[/FONT]


Thanks again for help and advice.

[FONT=&quot]
[/FONT]
 

Attachments

Overall, good table structure. Here's somethings I see though:


Customer Side:

1. You overnormalized. This is actually a good thing, it means you understand it and can apply it, you just went overboard. You shouldn't have tbl_StreetSfx, tbl_Cities, tbl_ZipCodes. Just put the actual data into tbl_Addresses and don't use foreign keys to these unnecessary tables.

2. I'd consolodate tbl_Phone and tbl_EmailAddresses and their respective _Type tables. They have the basic same structure, just make the Email data fit into the tbl_Phone table.

Client Side:

3. I think a lot of the Client data should be in the Contact table. I mean, aren't birthdate, gender and ShoeSize part of the Contact's record? Seems like an arbitrary spliiting of data (maybe more overnormalization)

4. Don't store the birthdate in 3 seperate fields. There's a Date/Time field type for a reason, use it instead of storing that data seperately.

5. You shouldn't have multiple paths between tables. I can trace 2 paths between tbl_Clients and tbl_TimePoint. It shouldn't be like that. This usually means you have a field in a table that shouldn't be, possibly need a new table to sit between 2 existing tables, or are using a lookup table (see below #6).

6. More overnormalization. Another general rule is that you shouldn't have tables with only 1 real field of data in them. Autonumbers aren't real data (they exist only to help the database), so all those tables on with only 1 real field should go (tbl_MetricType, tbl_TimePoint, tbl_Coaches, etc)


Survey Side

7. Duplicate paths. I can trace 2 paths between tbl_SurveysType and jtbl_Answer & tbl_QuestionType.

8. Not really sure how this entire section is to work together.
 
Plog, as usual, offers good advice.

My thoughts are as follows. . .
should I take the time to learn VBA because it is more powerful and would permit me to sort out the ...
Yoda would say this...
Yoda said:
Do or do not. There is no 'should.'
VBA is handy, but programming is not a trivial thing to learn. So if you are doing a kitchen reno, should you hire a guy to tile the backsplash, or should you learn how to do tile? The answer to 'should' always comes down to what YOU want, not what someone else can recommend.

Your ERD, IMO, has too many M:M relationships, like clients to labs, clients to coaching, clients to classes, etc... I would find it more likely to see a tblLabs that looks like . . .
tLab
LabID (PK)
ClientID (FK to client)
TypeID (FK to LabType)
DateTime
Result
. . . and get rid of tables TimePoint, and jtbl_ClientLabs

Also, a cool trick if you are going to have tons of types that don't change much, put them all in one table, and then write a bunch of queries, like imagine a table called tNameValue . . .
tNameValue
NameValueID (PK)
Class
Name
Value
Description

. . . and then in the Class field have values like Metric, Lab, Coaching, Question, and then you can define all your type tables in ONE TABLE, using the Class as the "TypeOfType" specifier, if you see what I mean. Then can write a qnv_Metric query that selects ONLY the class rows from tNameValue that have the Metric class. This makes it super simple to make a UI (just a single form) to edit ALL your types, system-wide. :)
 
Thanks for the advice:)

I am anxious to implement the design changes and see what you think.

Just so you know I only work two days a week and and often have other issues to deal with. I am not permitted to take any of the data home to work on it because of HIPPA regulations. The only thing I can work on at home is basic DB design. I will create some dummy data to test my design changes.

Is it too early to start designing forms? If not, I am thinking about converting some of the paper forms to Access as they are familiar and require less learning to be used by the staff. With that in mind is it possible to use the MS Word documents and insert data fields where the hand written information is entered?

I once used FileMaker Pro and it had this capability.

ihchealth
 
Is it too early to start designing forms?

Yes. Forms are the last piece to work on. Database development should work like so:

1. Tables
2. Reports
3. Forms

You build your structure, make sure it outputs the data you need in the manner you need and then you work on ways to get the data into the system via forms.
 
I wouldn't use Word for data entry. Forms in Access gives you a far superior way to error check and generally control the user's options.
Is it too early to start designing forms?
Not sure how far along you are. I would likely refine the tables a little more first, but also, in some cases you'll be using the UI (forms) before you realize that you have a table design problem. I find the overall development process to be fairly chaotic, so design some forms if you want. See if you feel like it's an advantage. Experiment, learn.
 

Users who are viewing this thread

Back
Top Bottom