Name maze (1 Viewer)

Kundan

Registered User.
Local time
Yesterday, 18:46
Joined
Mar 23, 2019
Messages
118
I have a table with the following fields:
CR NO
DT
FNAME
LNAME
ADD
AMOUNT
In this CR NO is Pimary key.
I want to find those people whose amount totals 50000 and above.
The prob is that the names are typed haphazardly, For example Jack Jill Thompson
Jack J. Thompson
J.J. Thompson & Family
How should I write a query to achieve the result?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 01:46
Joined
Feb 19, 2013
Messages
16,553
with difficulty. Even if you match two records with similar names, how do you know they are the same person? What you should have is a separate table with names and other personal details (address/phone/email etc) then this table would have just had a foreign key rather than your two name fields.

So I would make the object of your exercise to create this table and modify your design so it is not a problem in the future. It is basically the same exercise as you need to now.

How many records are you talking about? I suspect you will need to do multiple queries and visual inspection to resolve.?

Suggest start by creating your person table and adding a personFK to your existing table. The first thing to do is identify the easy matches which you do by linking your table to itself on fname and lname. You can then add those names to your new table, and populate your new FK field with the PK from the new table.

Moving forward you can then ignore these matched records because you only want records where the FK is null

You may have other data which can help you with this task
 

Uncle Gizmo

Nifty Access Guy
Staff member
Local time
Today, 01:46
Joined
Jul 9, 2003
Messages
16,245
If you have a significant number of Alternate spellings, then a "Manually constructed" query is not the way to go. It will drive you mad eventually! I think the approach I would use would be to to create a query which selects the first name last name only, and then restrict this query to unique values, export those unique values into a new table. Add to the New table, a unique index and add another field for the "True" name of the person of "Long" Data type. Now look at your newly created list of names which will contain your alternative spellings:-

1) Jack Jill Thompson
2) Jack J. Thompson
3) J.J. Thompson & Family

add another name which represents the actual name you want for this person, in this case that might well be:- "JJ Thompson"

So now you have a list of names thus:-

1) Jack Jill Thompson
2) Jack J. Thompson
3) J.J. Thompson & Family
4) *****
5) *****
6) *****
Etc) ******

1000) JJ Thompson

And in the extra field you have added against the names 1,2,3 place the new new name number 1000)

1) Jack Jill Thompson 1000)
2) Jack J. Thompson 1000)
3) J.J. Thompson & Family 1000)

Now you can create an update query which will change the each instance of the alternative spelling to the correct spelling you have specified.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 01:46
Joined
Feb 19, 2013
Messages
16,553
further to my last post, it might be an idea to focus on the last name as this is likely to have the most consistent spelling, then convert the first name to initials - although that still won't identify William and Bill as the same. Whatever you do, there will be some manual mopping up.
 

vba_php

Forum Troll
Local time
Yesterday, 20:46
Joined
Oct 6, 2019
Messages
2,884
I want to find those people whose amount totals 50000 and above.
The prob is that the names are typed haphazardly, For example Jack Jill Thompson
Jack J. Thompson
J.J. Thompson & Family
How should I write a query to achieve the result?
based on your words, the fact that the names are typed haphazardly are completely irrelevant. you said you wanted this:
Code:
I want to find those people whose amount totals 50000 and above.
if that's what you want, it is simple:
Code:
SELECT * FROM TABLE WHERE amount >= 5000
if you're talking about duplication in the NAMES field and multiple records entered per name when there should only be ONE per name, you should go to your manager and tell them that their people are not entering data correctly or that people are doing redundant jobs because they don't understand who is doing what. In a case like that, the manager won't be happy you told him, but you will in the end because he or they would realize their error and stop doing it.

if the duplication is justified, and there *should* be more than one record entered per name, the issue about management still stands.
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 21:46
Joined
May 21, 2018
Messages
8,463
To add on to what CJ_London said your data needs to get cleaned up and then the tables need to be turned into

TblPersons
PersonID_PK ' primary key, autonumber
LName
FName
.....
other fields that uniquely describe a person

tblCR
CRNO ' primary key
PersonID_FK ' foreign key to tblPersons
Add ' not sure what this is
Amount

Fix your tables and do some updates queries to clean your data.

FYI, If you clean up the data without fixing your tables you are just wasting your time.
 

cheekybuddha

AWF VIP
Local time
Today, 01:46
Joined
Jul 21, 2014
Messages
2,237
@Adam, I don't think the solution is tat simple.

What happens when you have:
Code:
Name                        Amount
Jack Jill Thompson          20000
Jack J. Thompson            15000
J.J. Thompson & Family      20000

They wouldn't be flagged, but the total will be > 50000.
 

vba_php

Forum Troll
Local time
Yesterday, 20:46
Joined
Oct 6, 2019
Messages
2,884
/They wouldn't be flagged, but the total will be > 50000.
i'm sure they have an answer that gives them what they need. I was basically giving them the answer based off of common sense language that was in their question. a lot of times people don't even know how to ask the right question, isn't that true?

and the situation you're addressing in your post was "sort of" covered in my answer. if that is the case, I would still recommend they talk to management, because the data entry team is probably off-base. either that, or the JACK AND JILL family has gone haywire! :p
 

CJ_London

Super Moderator
Staff member
Local time
Today, 01:46
Joined
Feb 19, 2013
Messages
16,553
I would say a number of times, people don't spend the time to understand the question and/or fail to ask for clarification (happens to me occasionally) - please don't demean the OP by implying he/she doesn't know how to ask the question.
 

Kundan

Registered User.
Local time
Yesterday, 18:46
Joined
Mar 23, 2019
Messages
118
To add on to what CJ_London said your data needs to get cleaned up and then the tables need to be turned into

TblPersons
PersonID_PK ' primary key, autonumber
LName
FName
.....
other fields that uniquely describe a person

tblCR
CRNO ' primary key
PersonID_FK ' foreign key to tblPersons
Add ' not sure what this is
Amount

Fix your tables and do some updates queries to clean your data.

FYI, If you clean up the data without fixing your tables you are just wasting your time.

Thanks for your advice GOD BLESS YOU!!!!!
 

Kundan

Registered User.
Local time
Yesterday, 18:46
Joined
Mar 23, 2019
Messages
118
To add on to what CJ_London said your data needs to get cleaned up and then the tables need to be turned into

TblPersons
PersonID_PK ' primary key, autonumber
LName
FName
.....
other fields that uniquely describe a person

tblCR
CRNO ' primary key
PersonID_FK ' foreign key to tblPersons
Add ' not sure what this is
Amount

Fix your tables and do some updates queries to clean your data.

FYI, If you clean up the data without fixing your tables you are just wasting your time.

After cleaning the table, what should be the query to extract the required data(i.e. Persons whose amounts total up tobe =>50000 ?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 01:46
Joined
Feb 19, 2013
Messages
16,553
something like this
Code:
SELECT FName, LName, Sum(Amount) as Total
FROM tblPersons INNER JOIN tblTrans ON tblPersons.personPK=tblTrans.personFK
GROUP BY FName, LName
HAVING Sum(Amount)>=50000
 

Kundan

Registered User.
Local time
Yesterday, 18:46
Joined
Mar 23, 2019
Messages
118
something like this
Code:
SELECT FName, LName, Sum(Amount) as Total
FROM tblPersons INNER JOIN tblTrans ON tblPersons.personPK=tblTrans.personFK
GROUP BY FName, LName
HAVING Sum(Amount)>=50000

Thanks a lot GOD BLESS YOU!!!!
What should tblTrans have ?

To straighten the messy database which is,
Name Amount
Jack Jill Thompson 20000
Jack J. Thompson & Others 15000
J.J. Thompson & Family 20000

I have to first make all Thompson, Thompson & Others, Thompson & Family to Thompson.

Can we use a query or code for it?

Further, the First name of Thompson with same address has to be made identical. Can it be done through code or query?
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 20:46
Joined
Feb 28, 2001
Messages
27,001
I would imagine that if you have non-standard naming of people that you might also encounter non-standard naming of streets. Like people might write Wall St. and Wall Street for street names. Therefore, deciding that your Thompsons have the same address would also be a bit tricky. Maybe not quite as bad as people names, but street names can be badly written, too.

This sort of name management - whether people or streets - is certainly possible in a set of queries. It can also be done in code. But here is a question for you. How many records in this table? It might not be a huge number of people, but it seems that it can be a large number of transactions per person or family. If so, the problem only grows more difficult.

If it is just a few hundred, it might be practical to write some code and expect it to run with reasonable speed. If it is in the over 100,000 range, code will take a long time to run. Worse, having a larger table at least implies a chance of having more different names for people to get wrong. Either way, the amount of time to implement this might be formidable. And doing it by queries would run faster than code, but you would still need a large set of queries for each family of names.

In your original description, you show fields FNAME and LNAME. I can see that folks might abuse those fields by inputting something over and above the intended values of First Name and Last Name - as for example Thompson & Family, You are not going to be able to write ANY query to achieve your desired results until you do some data conditioning. This might require restructuring your table OR creating a parallel table as a staging area. How much flexibility do you have in attacking this problem? I ask because some of our members tell us "I can't change the structure because it comes from an external source" or things like that. Is there a way for you to either modify this table or add a working table?
 

Kundan

Registered User.
Local time
Yesterday, 18:46
Joined
Mar 23, 2019
Messages
118
I would imagine that if you have non-standard naming of people that you might also encounter non-standard naming of streets. Like people might write Wall St. and Wall Street for street names. Therefore, deciding that your Thompsons have the same address would also be a bit tricky. Maybe not quite as bad as people names, but street names can be badly written, too.

This sort of name management - whether people or streets - is certainly possible in a set of queries. It can also be done in code. But here is a question for you. How many records in this table? It might not be a huge number of people, but it seems that it can be a large number of transactions per person or family. If so, the problem only grows more difficult.

If it is just a few hundred, it might be practical to write some code and expect it to run with reasonable speed. If it is in the over 100,000 range, code will take a long time to run. Worse, having a larger table at least implies a chance of having more different names for people to get wrong. Either way, the amount of time to implement this might be formidable. And doing it by queries would run faster than code, but you would still need a large set of queries for each family of names.

In your original description, you show fields FNAME and LNAME. I can see that folks might abuse those fields by inputting something over and above the intended values of First Name and Last Name - as for example Thompson & Family, You are not going to be able to write ANY query to achieve your desired results until you do some data conditioning. This might require restructuring your table OR creating a parallel table as a staging area. How much flexibility do you have in attacking this problem? I ask because some of our members tell us "I can't change the structure because it comes from an external source" or things like that. Is there a way for you to either modify this table or add a working table?

The rec count is around 10,000. I can make a copy of the existing DB and make changes to the structure with out affecting the original.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 20:46
Joined
Feb 28, 2001
Messages
27,001
I know of no fast way to do this. I see it as a tedious "divide and conquer" type of problem. It would start with an analysis of the LNAME field.

First, run a query like this: UPDATE table SET LNAME = TRIM$(LNAME) ;

Create a query to show you likely cases needing attention. You said sometimes the names contain "& family" or variants thereof. So find them.

SELECT FNAME, LNAME FROM table WHERE INSTR( 1, "&", LNAME ) <> 0 ;

That will give you a list of names where you want to get rid of the "& family." If you don't see any cases where you would want to keep the "& family" then you could try this:

UPDATE table SET LNAME = LEFT$( LNAME, INSTR( 1, "&", LNAME ) ) ;

This will remove the offending suffixes. I would next look for cases that contained " and " using the same logic. ( "Thomson and family")

Note that after any update query that could remove some text, you MIGHT wish to clean up the aftermath with the previous "TRIM" query.
UPDATE table SET LNAME = TRIM$( LNAME ) ; since there might have been a space BEFORE the part you removed.

One possible indication of the need for a cleanup would be the presence of a blank in the middle of a name field. If you have run the "TRIM" query then any blanks that remain in a name field are in the middle of it and such fields are candidates for editing.

You will also want to work on the FNAME field in a similar way, and as always you would need to assess what needs to be done in specific.

Once you clean this up, if there is any way for you to control the data entry, set up a test during data entry for the name containing spaces or other problems. As long as you don't intercept bad names during data entry, you will never be rid of this problem. And if YOU can't do it then you seriously need to talk to whomever is involved to let them know that bad data entry is costing your company time and money by requiring data cleanup.
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 21:46
Joined
May 21, 2018
Messages
8,463
The divide and conquer is the right approach, but if it was me doing this I would have a form to assist in this. In other ways a fast way to search for some records and a fast way to then reassign them to a new name or person ID. I did something similar with system specifications. The top continuous form was the higher level specs and the lower form was the unassigned specs. So like your problem it took knowledge of the information in order to determine things were related or the same thing. There was also 1000s. I had lots of different search features like Doc Man describes and then a way to click the found specs and assign to a top level spec. It still took many hours, but without this UI would have taken weeks if even doable at all.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 20:46
Joined
Feb 28, 2001
Messages
27,001
To be honest, the hard part is deciding what to look for next. Once you know what you are going to fix, some variant of what I described - or what MajP described - will be easy. Like the old Bugs Bunny cartoon, "To make hasenpfeffer, first you've got to catch the rabbit."
 

Users who are viewing this thread

Top Bottom