Use PK to kick out duplicates

TKnight · Mar 30, 2010

Hi,

I am new-ish to SQL server (2005) and I would like to do something that I did fairly regularly in Access but I can't seem to figure out in SQL - use a Primary Key (or index) to reject certain data being inserted into a table.

With Access, if I had 100 records and a few, say 20 would break the PK, I could run an append query (with warnings off) on the entire 100 and the 80 "valid" records would be inserted and the 20 bad records would be left behind. Nice and simple.

With SQL server however, the insert statement won't execute at all because of the constraints. Is there any way I can get SQL server to replicate Access functionality in this area? (Preferably without building a complex set of queries to filter the data myself, or iterating through each record)

Seeing it written down like that, it does look like bad DB practice and it is probably a dirty habit I picked up from cutting my teeth in Access... but if the functionality is there - why not use it?!

Cheers,

Tom

TKnight · Mar 30, 2010

Oh I got it:

IGNORE_DUP_KEY ON

If anyone else stumbles across it!

dportas · Mar 30, 2010

The IGNORE_DUP_KEY option isn't generally a good idea. For one thing the results are unpredictable. If you attempt to insert multiple rows with the same key then one arbitrarily selected row will be inserted and the rest silently ignored.

In most cases the desired result is to prevent the update with an error when duplicates are included. But the IGNORE_DUP_KEY option is a permanent feature of an index. You can't turn it off without recreating the index. That's a serious hindrance for any future code developed against the table. I suggest that instead you use an INSERT ... SELECT query with a WHERE clause to filter out the duplicates to get the result you want.

georgedwilkinson · Mar 30, 2010

Really bad idea.

There was also no good reason to do it in Access, even if it was available. This is the type of behavior that gives Access a bad name.

You should consider the use of artificial (aka surrogate) primary keys at the very least, along with unique keys on the current "primary" key(s). This technique works in all DBMSs (including Access). In fact, it is highly encouraged in all the beginner Access materials I've seen and even by the built-in Access wizards themselves.

dportas · Mar 31, 2010

georgedwilkinson said:
You should consider the use of artificial (aka surrogate) primary keys at the very least, along with unique keys on the current "primary" key(s).

In what way would that help? You would still want some method of removing the duplicates when doing the insert - the surrogate key won't do that for you. You would then have two keys instead of one, which isn't necessarily a bad thing but it doesn't actually solve any problem it just adds some extra complexity: another index, another column.

georgedwilkinson · Mar 31, 2010

If, for ANY reason, a developer is considering overwriting something that is supposed to be unique, whether it is because of a bad design, a misunderstood requirement, or the fact that we just don't know enough about some classes of data to define what is really unique, you STILL need a unique key to identify a record of that class of data. That is why so many of us who have been doing this for so long use and promote the use of surrogate primary keys.

From my understanding, the OP is trying to usurp what they at one time thought was a unique key. It sounds like the OP is not saying he doesn't want to disallow duplicates, he just wants to be selective about allowing duplicates. This points to a bad design or unknown requirements (for whatever reason; we can't tell with the information provided) and a surrogate primary key will at least allow the system to uniquely identify a record if/when they decide to corrupt their "unique" key. That is why the wizards in Access are so emphatic on trying to add surrogate keys.

I have read the OP a couple of times and it seems like, every time, they are trying to usurp their unique key. What the OP seems to be saying is that they don't REALLY have a unique key, because they try to break it.

BTW, just for the record: a unique index (in Access), unique key, or primary key will ALL prevent duplicates of the columns identified within the key. No DBMS, including Access and SQL Server, will allow duplicates within the columns of the constraint for any of these types of constraints.

I hope that answers your question clearly.

dportas · Mar 31, 2010

georgedwilkinson said:
I hope that answers your question clearly.

Only to the extent that it explains a different set of assumptions from mine. The way I read it the OP is not trying to break or usurp the key constraint at all. He specifically says he wants to enforce it and that therefore he needs to make sure he only inserts a selection of rows which do not violate the key. Clearly a surrogate key won't enforce that same constraint and so I don't see that it would help at all.

georgedwilkinson · Mar 31, 2010

You know what, now that you mention it, I re-read the post again and I see it the same way. That pretty much invalidates everything I've said in this thread, at least for this problem.

Thanks for setting me straight.

Use PK to kick out duplicates

TKnight

Registered User.

TKnight

Registered User.

dportas

Registered User.

georgedwilkinson

AWF VIP

dportas

Registered User.

georgedwilkinson

AWF VIP

dportas

Registered User.

georgedwilkinson

AWF VIP

Similar threads

Users who are viewing this thread