insert values where not exists (1 Viewer)

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
I am using VB to pull some names from a PDF document and write them to the database where they do not yet exist. The query itself is:

SQL:
INSERT INTO Staging(Last_Name, First_Name, Birthdate) VALUES(?, ?, ?)

The parameters are added and the query prepared. This works.

Now i want to change it to only insert the names when they do not yet exist. If i just add a WHERE clause:

SQL:
WHERE NOT EXISTS(SELECT * FROM Staging WHERE Last_Name = ? AND First_Name = ? AND Birthdate = ?);

The execution fails with: Query input must contain at least one table or query.

If i change the VALUES() to a SELECT, there is no table and instead the prepare fails with:

System.Data.OleDb.OleDbException: 'Syntax error (missing operator) in query expression '? WHERE NOT EXISTS(SELECT * FROM Staging WHERE Last_Name = ? AND First_Name = ? AND Birthdate = ?)'.'

How do i insert the records only if they do not yet exist?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
One way is to modify your indexes to include a multi key index on the 3 fields and set allow duplicates to false.

otherwise provide your whole sql - ? Means what? And are you surround them with the appropriate delimiters as I don’t see any
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
I did indeed add a primary index to the staging table on all three columns. I don't like the idea of silently dropping errors (already on index) unless that's the only way.

The question marks are oledb parameters. The query object itself replaces them with the appropriate values when running the query. As such, they do not require delimiters.
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
It took me way too long to figure out a solution. I can use the table being inserted into as a guaranteed table the program has access to, and select count(*) to guarantee exactly one record:

SQL:
INSERT INTO Staging(Last_Name, First_Name, Birthdate) SELECT ?, ?, ? FROM (SELECT COUNT(*) FROM Staging)
WHERE NOT EXISTS(SELECT * FROM Staging WHERE Last_Name = ? AND First_Name = ? AND Birthdate = ?);
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 00:57
Joined
Feb 19, 2002
Messages
43,223
I don't like the idea of silently dropping errors (already on index) unless that's the only way.
How is that different from not inserting them? If you run the append query using .execute, you should be able to pick up the counts inserted if that matters to you. I do this for data I import from unreliable sources. I count the rows in the table. I count the rows in the input. I count the rows in the table and subtract the original count. That tells me if everything got inserted or there were duplicates. If you have potential data errors, it gets more complicated.
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
How is that different from not inserting them?

It's stylistic. I am not a fan of exception coding. Also, when i tried it, VB reported an error due to the duplicates.

I count the rows in the table and subtract the original count. That tells me if everything got inserted or there were duplicates.

Assuming there are no other processes modifying the data. It's not guaranteed, but it usually works.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
Wasn’t suggesting you had a multi field primary key, just the index
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
Wasn’t suggesting you had a multi field primary key, just the index
There's a difference? I'm not terribly familiar with Access.

I added a PK because i wanted it indexed and guaranteed unique, which is what a PK is.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
Agreed, but PK's work best as a single field - if you need to join your table to another table, that other table will also need all three fields.

Assuming you are building your tables in Access using the table designer then click on the indexes option in the design ribbon - you should have something like this

1655248111195.png
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
if you need to join your table to another table, that other table will also need all three fields.

Why be afraid of composite keys? :) I almost never use surrogate keys. Aside from being redundant, they usually end up being confusing and problematic. In any case, this is a staging table, so, there shouldn't be any FKs on it.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
Never said I was afraid of composite keys, just saying they don't make good primary keys. I wouldn't expect any FK's in your staging table so I think you are either confused or just being argumentative. Just hope you never have the situation of two people with same name and DOB.
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
they don't make good primary keys

Why not? Personally, i think they make excellent PKs and i always use them.

I think you are either confused

Could be. Fwiw, i read your comment of not using a composite PK as being explained with "if you need to join your table to another table, that other table will also need all three fields." To that i responded that is not an issue in this particular case as it is a staging table. Sorry if i misread you.

Just hope you never have the situation of two people with same name and DOB.

Unfortunately, that's the only information we have from the report. The staging table is to be presented to a user who might import them. If there are duplicates, they might have to dig for other information in other parts of the other system (that is, log in to a website, which takes far longer). In most cases, the name itself is unique enough, and the few cases where it isn't, are resolved by the birthdate.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 00:57
Joined
Feb 19, 2002
Messages
43,223
Also, when i tried it, VB reported an error due to the duplicates.
That's because you didn't use .Execute to run the query. If you use the Access method, you can turn off the warnings TEMPORARILY. It is imperative that you turn them back on immediately. You don't ever want to leave them off.
I almost never use surrogate keys.
Not sure I would be proud of that. Some tables have natural keys but in most cases those attributes can actually change which makes them poor primary keys. Primary keys should NEVER change. Multi-field keys increase join complexity plus they interfere with using List and Combo boxes which need single field unique identifiers to work correctly. All-in-all, autonumbers eliminate many problems. You just have to use unique indexes to enforce your business rules where you do have candidate keys.
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
That's because you didn't use .Execute to run the query.
Interesting. I used .ExecuteNonQuery(). .Execute() is not a member of OleDbCommand.

but in most cases those attributes can actually change which makes them poor primary keys
I feel like the exact opposite is true. Perhaps we an chalk that up to different experience.

Multi-field keys increase join complexity
I feel like it makes them simpler. It is so much easier to follow joins when the columns that make the join are there in front of you. Furthermore, you can skip going to some tables just to grab one column.

they interfere with using List and Combo boxes which need single field unique identifiers to work correctly.
Didn't know that. I rarely code in Access directly. Thank you for pointing it out!
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
I accept that in this specific case of a staging table, the argument is academic but whether it is access, sql server or another rdbms, using text as a PK is wasteful as a resource because they take up more bytes. A long as a PK takes 4 bytes, text of say 15 chars takes 32 bytes plus a double for the date takes it up to 40 bytes. Indexing algorithms are very efficient but it will still take longer to process a 40 byte index over a 4 byte index.

And the main thing you do with a PK (other than uniquely identifying the record) is repeat it as a FK in related tables.

Note I am not saying there isn't a case for using text as a PK, just that as a viable option it is limited in scope to be effective - to a low number of chars and/or a low number of records
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
using text as a PK is wasteful as a resource because they take up more bytes.
That is negligible. Processing time is at worst nanoseconds. I remember reading this in a blog who explained this in great detail (was it Richard Foote?). There is basically no point in using a surrogate key for space or speed in most cases. If there ever was such a time, it was decades ago.

That being said, i cannot speak for Access. But it SQL Server and Oracle, it doesn't make a noticeable difference.

the main thing you do with a PK (other than uniquely identifying the record) is repeat it as a FK in related tables.
Wouldn't it be nicer to have the actual text in that other table? I mean, imagine a lookup for Status:
SQL:
CREATE TABLE Status(Id INTEGER PRIMARY KEY, Name VARCHAR(10) UNIQUE, Description VARCHAR(50));
CREATE TABLE Status(Name VARCHAR(10) PRIMARY KEY, Description VARCHAR(50));

The first works, except it doubles the amount of indexes (unique is enforced by an index), and you have to load the actual table to see what the status is, unless everyone memorizes the ids. The second has only one index, and the table enforces all statuses to a list, but there is no need to load the table to see it.

This works with joins as well. When debugging a query to figure out why some record was excluded, i might limit the clauses so i can see the data at a different stage. Using ids often requires going to other tables as well, and makes the debugging stage that much harder.

On a side note, depending on the RDBMS and table type, the main point of the PK can be the organization of the table's rows.

---

BTW, i know we have gotten off-topic. But i love these discussions. It's one of the best ways to learn. And being i do not know Access that well, i really appreciate the time you (both) have taken to educate me.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 00:57
Joined
Feb 19, 2002
Messages
43,223
Wouldn't it be nicer to have the actual text in that other table?
That's why novices use table level lookups. They need the comfort of seeing the text and they don't know how to write a query with a join.

This discussion was settled at least 40 years ago:) Probably before most of the members were born or at least well before they knew what a PK was.

the main point of the PK can be the organization of the table's rows.
I think that would be the point of a clustered index, not the PK.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 05:57
Joined
Feb 19, 2013
Messages
16,607
Wouldn't it be nicer to have the actual text in that other table? I mean, imagine a lookup for Status:
sorry, but your examples are poor - in the case of status I probably would just store the value, not a FK - or if the status is a simple 'open/closed' it would be a boolean or byte value or slightly more complex 'open/under review/awaiting decision/resolved/closed just a byte. But in the case of a (US) state I would use the state abbreviation as the PK and the full name which can be looked up when required, same for airports, country codes and the like.

But company or contacts details where you have maybe 5 or 6 fields, perhaps more I would use an autonumber or equivalent as a PK.

And in the case of say an invoice requiring perhaps 5 tables - customer, invoice header, invoice line, product, product prices are you really saying you would be using customer name and product name - and what would you use for invoice header, line and prices? Only one I can think you might go for is invoice number for invoice header. So in that chain of joins - how does your comment in post #14 of 'It is so much easier to follow joins when the columns that make the join are there in front of you' work then? sometimes a join works this way, sometimes that....?
 

chacham

Member
Local time
Today, 00:57
Joined
Oct 26, 2020
Messages
45
sorry, but your examples are poor
It was a simple example to illustrate the point and get the conversation going. :)

- in the case of status I probably would just store the value, not a FK
I assume you mean not a surrogate id. Unfortunately, many people would use an id. I have seen this over and over again. Worse, some people teach it this way.

- or if the status is a simple 'open/closed' it would be a boolean or byte value or slightly more complex 'open/under review/awaiting decision/resolved/closed just a byte.
I got bit by that once was we needed to add a "frozen" state to the "active flag". Never again. :) I stick to status lookups even for supposed booleans. As a plus, they can be more descriptive.

But in the case of a (US) state I would use the state abbreviation as the PK and the full name which can be looked up when required, same for airports, country codes and the like.
Ooh, nice example.

But company or contacts details where you have maybe 5 or 6 fields, perhaps more I would use an autonumber or equivalent as a PK.
In that case, i'd use a surrogate id just because the data is not inherently unique, so an id is the only way to guarantee uniqueness.

Only one I can think you might go for is invoice number for invoice header. So in that chain of joins - how does your comment in post #14 of 'It is so much easier to follow joins when the columns that make the join are there in front of you' work then? sometimes a join works this way, sometimes that....?
Yeah, each case is different. Invoices use ids as that is generally the only thing unique about them. Then again, invoice detail may benefit from having the invoice and company (id) making a composite id. Should it further have a child table, you will know the company id without having to go to the invoice table itself.
 

Users who are viewing this thread

Top Bottom