Ideas for cross reference table structure

Isaac · Apr 14, 2021

I have a small 'reconciliation' project where I am going to need to routinely compare two datasets. I have zero authority, influence or control over the data structure of these two data sets. Worse yet, they don't have any sort of Key values, and my task will be to compare them on a text field which is common to both. I'll call it TextField, from Table1 and Table2.

To complicate things to the point of absurdity, almost, in some cases a related record has one particular value in Table1.Textfield, and a slightly different value in Table2.Textfield. (Of course all this implicates a major redesign of these systems, or a new one to replace them, but I do not have that option right now - my job is only to create a widget for monthly ingestion of the two datasets and some reconciliation reports, pointing out discrepancies on other columns, and using Textfield to link them).

I'm able to create a process now (at the beginning of this adventure!), for the business to tell me "whenever you see Value1 in Table1, it should be considered the same as Value2 in Table2". Approved pairings, so to speak.

I'll need to incorporate this new intelligence to perform joins of some kind in the future. For example, Table1.Textfield might say "Western US Division" and Table2.Textfield might say "US Division of the West". The business will "start me out" by letting me know that those 2 should be paired as if they are the same.

We're talking about pretty small data here. A few hundred records at a time in any one of the 2 tables.

I'm trying to think of the best way to structure this new 'intelligence' on 'pairing' from the business and unsure which would be theoretically best or most scalable/efficient.

Should I maintain a table with just 2 columns.....showing Table1.Textfield and Table2.Textfield values? If so, how in the world would I join anything to this?

Or should I create a table with 3 columns....the approved pair, and a third meaningless key value.....and then create a process whereby, during my once-a-month ingestion of these tables' data, each of the two incoming tables gets represented with their source data, plus a new column: TextFieldNew--and TextFieldNew contains the meaningless key value - thus paving the way for a normal join thereafter?

This task is working with very small data, but my mind wants me to come up with a textbook-sound process anyway.

Feel free to provide any ideas you might do in this scenario?

plog · Apr 14, 2021

Should I maintain a table with just 2 columns.....showing Table1.Textfield and Table2.Textfield values? If so, how in the world would I join anything to this?

This is the way. Let's call it JoinTable and it has the fields Key1 and Key2 which represent Table1.Textfield and Table2.Textfield respectively. It will hold every match between those 2 imported tables--even if they are the exact same value.

Then to JOIN them in a query you bring all 3 tables into a query, JOIN Table1.Textfield to JoinTable.Key1 and then Table2.Textfield to JoinTable.Key2

Now, every month when you get new data you will need to run 2 queries before doing whatever it is you are going to do. 1 query will find values in Table1.Textfield that are not in JoinTable and 1 query will find values in Table2.Textfield that are not in JoinTable. That will identify new values. You will need to populate JoinTable with those new values to ensure the new data finds its way into your results.

Isaac · Apr 14, 2021

That makes sense plog. Thank you for the input.

Minty · Apr 14, 2021

+1 for @plog suggestion - the only way to go.
I think the technical term is a mapping table?

Isaac · Apr 14, 2021

Ok sounds good. I guess I've been lucky...either designed my own tables or worked at a sql shop where it was all pretty much by the book, so not sure I've used one before

Ideas for cross reference table structure

Isaac

Lifelong Learner

plog

Banishment Pending

Isaac

Lifelong Learner

Minty

AWF VIP

Isaac

Lifelong Learner

Similar threads

Users who are viewing this thread