Solved Key Violation Error Trapping During TransferText Operation

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
Allowing and even ignoring errors is an expression of efficiency?
YES when the db engine is performing the same validation.
But what about data integrity, like full imports?
We are talking here about forms not duplicating what the database engine is doing in ONE very specific case.
How can you be sure that the only errors that occur are ones for which the database engine's own error trapping is sufficient?
You are taking something I said and extrapolating it to situations where I never suggested that errors be ignored. You are acting like I said to ALWAYS IGNORE ALL ERRORS and that is NOT what I said.

We are talking about importing a file in a situation where the import may contain duplicates. We are not talking about anything else. I don't know if we are having a language issue or not, but you are not understanding why I even suggested such heresy. That is what you might want to dwell on: the heterogeneous join. That is what makes the technique that eliminates the error up front so expensive, and that leads to my suggestion to simply ignore this particular error rather than run the query with the heterogeneous join. Then I explained a very specific case where I don't ignore this particular error, and why. So at no time did I ever say ALWAYS IGNORE ALL ERRORS. That you are making up in your own mind and putting words in my mouth.

My point is that I/O is the most expensive operation your code performs. Therefore, you should always avoid unnecessary I/O. What constitutes necessary I/O is the question here, and my position is that when the database engine is going to save you from yourself, you don't ALWAYS need to validate ahead of time simply to prevent the user from getting a message. You should use your common sense and eschew performing I/O in cases where it is especially expensive, which it is in this specific case because the I/O involves a heterogeneous join. Clearly, if we're talking about 12 records here, nobody cares.

Jet/ACE allows the append to succeed after ignoring the errors. SQL Server does not. That may be what is confusing you. If you are using Access methods or DAO, you can just ignore the errors and life is good. The records that fail validation are NOT appended. The rest are.
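
As a rough sketch of what I mean (the query and table names here are made up for illustration, not from anyone's actual database):
Code:
Public Sub ImportIgnoringDuplicates()
    Dim db As DAO.Database
    Set db = CurrentDb()
    ' Without dbFailOnError, Jet/ACE drops the rows that violate the
    ' unique index and appends the rest; no error is raised to VBA.
    db.Execute "qryAppendFromLinkedText"
    ' RecordsAffected reports how many rows actually made it in.
    Debug.Print db.RecordsAffected & " rows appended"
End Sub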

We never got to the point of discussing how to identify which rows failed the import but I can explain that also if we can ever get past this one.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
We are talking here about forms not duplicating
No, the topic was importing from a text file into a table. Where does a form come into it, and what kind of form? At most, I need a button click to trigger the import action.
If you leave this movie and walk into a different one, you can of course tell a different story. But if you do that, you should also say so EXPLICITLY.
It is not convincing to accuse others of a lack of clarity and stupidity when you yourself are driving the wrong way down the road.

We are talking about importing a file in a situation where the import may contain duplicates. We are not talking about anything else.
I would never be so presumptuous as to assume that there is exactly one error I will have to deal with. By that logic you would never need general error handling anywhere; you could get by just fine with the one error you happen to know best.

In my mind, this is about importing from an environment with weaker rules into a database table (or several tables) with stricter rules. There are systematic problems there.
There are certainly plenty of threads in this forum where it wasn't enough to simply press the standard button when importing from a text file and be done with it. No, there are many opportunities to encounter errors.

"We are talking about" => no, you are having your own conversations elsewhere.

My point is that I/O is the most expensive operation your code performs.
In practice, the proposed additional test only needs to be questioned if it produces a NOTICEABLE increase in runtime. I have never experienced anything like that in my own practice.
Granted, importing from a 2 GB file would certainly call for some different considerations. In terms of the technique required, it makes a difference whether I am picking up an earthworm or an elephant.
 
Last edited:

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
As I said, I can explain that I/O is the most expensive operation a program does. But you don't have to believe me or even understand the statement. Check for yourself. I'm sure you can find timings done by someone that tell you the cost of a set statement or an identity or an If or a select. No matter what the source, I/O is the most expensive. Then you can find some "expert" to tell you how expensive heterogeneous joins are. Then maybe you can focus on the question and understand why in this specific case, it is OK to allow the database engine to catch the duplicates.

If you want to pick out individual errors, you cannot run an append query - with or without the heterogeneous join. You have to run a VBA loop and attempt to add one row at a time. Then you can validate each record before you attempt to append it.
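
A bare-bones sketch of such a loop (table and field names are invented; substitute whatever validation rules apply):
Code:
Public Sub AppendRowByRow()
    Dim db As DAO.Database
    Dim rsIn As DAO.Recordset, rsOut As DAO.Recordset
    Set db = CurrentDb()
    Set rsIn = db.OpenRecordset("SELECT F1 FROM tblLinkedTextFile", dbOpenSnapshot)
    Set rsOut = db.OpenRecordset("tblTarget", dbOpenDynaset)
    Do Until rsIn.EOF
        ' Validate each record BEFORE attempting the append.
        If IsNull(rsIn!F1) Then
            Debug.Print "Rejected (empty value)"
        ElseIf Not IsNull(DLookup("xWord", "tblTarget", "xWord=""" & rsIn!F1 & """")) Then
            Debug.Print "Rejected (duplicate): " & rsIn!F1
        Else
            rsOut.AddNew
            rsOut!xWord = rsIn!F1
            rsOut.Update
        End If
        rsIn.MoveNext
    Loop
    rsIn.Close: rsOut.Close
End Sub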

Your solution BTW only dealt with duplicates, and it did it with a heterogeneous join, but now you have changed the argument to "all types of errors". If you recall, I never even said I would never do this join to prevent DUPLICATE errors. The OP was confused by your syntax, so I explained that he could just link to the table and didn't need to use your suggested syntax. There was nothing inherently "better" about it. Just KISS.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
Your solution BTW only dealt with duplicates
Code:
db.Execute "AppendQuery", dbFailOnError
Executing it this way also deals with other possible errors. RunSQL or OpenQuery appears nowhere in my suggestion.
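
If one of those errors occurs, dbFailOnError raises a trappable run-time error instead of silently dropping rows, so ordinary VBA error handling catches it (a minimal sketch; the query name is the one from above, everything else is generic):
Code:
Public Sub RunAppend()
    On Error GoTo ErrHandler
    Dim db As DAO.Database
    Set db = CurrentDb()
    ' dbFailOnError turns a key violation (or any other failure)
    ' into a trappable run-time error instead of a silent drop.
    db.Execute "AppendQuery", dbFailOnError
    Debug.Print db.RecordsAffected & " rows appended"
    Exit Sub
ErrHandler:
    Debug.Print "Append failed: " & Err.Number & " - " & Err.Description
End Sub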

You have to run a VBA loop
Isn't that a recommendation regarding efficiency?

why in this specific case, it is OK to allow the database engine to catch the duplicates
However, it takes extra effort to limit the import to this specific case.

I'm sure you can find timings done by someone that tell you the cost of a set statement or an identity or an If or a select.
I would be happy to see that, but I'm rather skeptical, simply because what the execution plan actually does is different from my SQL statement as written.
I have never seen convincing measurements in this direction.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
And what does the Access execution plan do? Select all rows from the SQL Server table. Select all rows from the linked text file. Join locally using RBAR. Append the result set. That feels like a pretty heavy cost compared to just an append from the linked table.

Can you use a pass-through query to join the SQL Server table to the text file? I don't even know if SQL Server supports heterogeneous joins. If it does, the execution plan would be to import the text file into a temp table and then join using RBAR. Less work than Access would have to do. I did mention, in passing, importing the text file into a server-side temp table. You are still left with the RBAR join, but maybe adding an index to the temp table can make the join more efficient.

I'm done. I have stated my opinion and provided my reasoning based on 50+ years of working with relational databases and code. I am familiar enough with the relative cost of individual operations to say that I would always avoid I/O as long as that didn't jeopardize data integrity, and that, unless the requirement was to identify all errors and report on them, I would simply do the append and catch the error. If the user wanted to know which specific records got rejected, I would add a batch number to the process, and the append query would include the batch number. That way it is easy to pull out the newly added records and compare them to the source table so you can identify which specific records didn't get appended for some reason.
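
Roughly like this (table, field and query names are placeholders, not anyone's actual schema):
Code:
Public Sub AppendWithBatchNumber()
    Dim db As DAO.Database
    Dim lngBatch As Long
    Set db = CurrentDb()
    ' Next batch number: one higher than anything already in the table.
    lngBatch = Nz(DMax("BatchNo", "tblTarget"), 0) + 1
    ' Plain append; the engine drops the duplicates, the rest get stamped.
    db.Execute "INSERT INTO tblTarget (xWord, BatchNo) " & _
               "SELECT T.F1, " & lngBatch & " AS BatchNo " & _
               "FROM tblLinkedTextFile AS T"
    ' To see what was rejected, outer-join the source to this batch:
    ' SELECT T.F1 FROM tblLinkedTextFile AS T LEFT JOIN
    '   (SELECT xWord FROM tblTarget WHERE BatchNo = n) AS B
    '   ON T.F1 = B.xWord WHERE B.xWord Is Null;
End Sub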
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
SQL Server table
Where does this suddenly come from?
If I want to communicate efficiently with SQL Server, T-SQL and T-SQL methods are required. Then MS Access methods such as TransferText and Jet SQL are out of consideration, or at least not the first choice.

My experience of less than 50 years teaches me to know what I am touching before selecting methods for necessary actions.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
Executing it this way also deals with other possible errors. RunSQL or OpenQuery appears nowhere in my suggestion.
It doesn't separate them. All you know is that some rows failed to append for "whatever" reason.
My experience of less than 50 years teaches me to know what I am touching before selecting methods for necessary actions.
So, since you never believe a word I say, PROVE it to yourself exactly the way I proved it to myself years ago. Create your own test cases. You wouldn't believe mine anyway. You don't believe Collin's query timings either, so I'm not offended. It must be difficult to never be able to learn from others.

You have to use loops that run the instructions thousands of times to get an accurate timing. You can go so far as to PROVE to yourself that when you have a Select Case or a series of Ifs, putting the most likely value first makes the construct run faster. Pick the most common 3 (or whatever), place them first, then alphabetize the rest for ease of reading.
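
For example, something like this gives a usable relative timing (a sketch only; put the instruction you want to measure inside the loop):
Code:
Public Sub TimeAnInstruction()
    Const LOOPS As Long = 1000000
    Dim i As Long, x As Long
    Dim sngStart As Single
    sngStart = Timer
    For i = 1 To LOOPS
        x = x + 1          ' <-- instruction under test
    Next i
    Debug.Print LOOPS & " iterations: " & Format(Timer - sngStart, "0.000") & " seconds"
End Sub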
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
It must be difficult to never be able to learn from others.
That is a very arrogant statement.
Any expert will confirm that you should keep a healthy amount of residual skepticism even about their statements.
My experience - yes, it does exist - shows me which experts I follow almost unconditionally and where, on the other hand, I weigh things statement by statement.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
That is a very arrogant statement.
You may be too advanced to learn from me, but you are apparently also too advanced to learn from Collin. I never claimed to know everything about everything, but I have a pretty good handle on what I do know, and if I tell you that I know something, the odds are pretty good that I do. If I tell you unequivocally that I/O is the most expensive operation a code module ever performs (I'm not talking about calling a UDF that does something stupid, I'm talking about native instructions), you should at least have enough respect for my opinion to stop and think about what I am trying to explain.

You might ask: how do you know that? And I would tell you about the application I built to calculate relative times for all COBOL instructions. Granted, VBA is not COBOL, but the languages are very similar, and where they overlap there would be equivalence. And if you think about what that means, you would come to the exact same conclusion yourself. Obviously I/O is more expensive. I shouldn't even have to tell you that. It should be intuitive. How much more? Well, I don't know in this specific case (it was significant in COBOL), but it is easy enough to figure out should anyone want to bother.

Then, once you accept my assertion that I/O is very expensive and therefore to be avoided when possible, you can easily follow my logic as to why I would avoid it in specific situations. I was developing applications that REQUIRED sub-second response, so I had to know my tool and how to make it as efficient as possible. Simply being careful about which instructions were inside vs outside a loop was critical.

Granted, the Access environment is nowhere near as demanding as others are, but you are still part of an ecosystem. What happens on your PC stays on your PC and only affects your personal experience and your users, because you can write really stupid code that wastes endless cycles. However, I/O, when the files are located on a server or an RDBMS, affects EVERYBODY on the LAN to a certain degree. Therefore, if you understand that every operation has a cost, you can keep that in the back of your head as you develop and choose NOT to do operations that will affect others when it is possible to avoid them. You can actively choose not to read the same file twice if you rethink your process.

Just today, I explained to someone that a loop whose body affects the loop counter can be processed backwards to avoid all the issues. The OP was reading attachments, saving them to a folder and deleting them. He didn't understand that by deleting them he was affecting the loop counter and therefore skipping half the items. Someone else explained that problem and told him to process the attachments twice: once to copy and save and once to delete. I told him that processing the attachments from back to front wouldn't mess up the loop counter, so one pass through the loop would work, as in the sketch below. That might save a half second depending on how many attachments there were, but since the interaction was with the Exchange server, that half second that became 1 second affected the speed of the LAN. So BFD. Right. BFD.

You exist in an ecosystem. Pay attention to it and everyone benefits. I'm not saying you even need to go out of your way. Just be aware of the world around you and make the choices that minimize your footprint as you go through life. Reuse your paper bags. Keep a steady speed to avoid wasting gas. Compost your yard clippings. Pay attention to what is inside and what is outside the loop. Don't do I/O you don't need to do;)
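
For reference, the backwards loop looks like this (olMail is assumed to be an Outlook MailItem and strFolder a save path; generic sketch, not the OP's actual code):
Code:
Public Sub SaveAndDeleteAttachments(olMail As Object, strFolder As String)
    Dim i As Long
    ' Counting DOWN means deleting item i never disturbs the
    ' positions of the items still left to process.
    For i = olMail.Attachments.Count To 1 Step -1
        olMail.Attachments(i).SaveAsFile strFolder & "\" & olMail.Attachments(i).FileName
        olMail.Attachments(i).Delete
    Next i
End Sub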
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
never be able to learn from others
There is a Persian proverb: If you point your finger at others, behold, three fingers point at you.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
@ebs17 The reason you don't believe I am an actual expert is because we differ on the use of the QBE. You are a purist. You love the way SQL looks in glorious color. In your mind, the only query is one in beautifully formatted color, simple joins don't exist, they are always subselects, and you cannot see any other option.

I spent 20 years writing embedded SQL in COBOL in black and white and dreaming of a GUI like the QBE in Access so I could get away from the drudgery of hand-writing SQL statements. So, when I discovered Access, I thought I had died and gone to heaven. The drudgery was gone. I could use the QBE to build most of every query. Now, the QBE is a pretty poor tool, and it is a never-ending disappointment that MS has never bothered to make it better. But it does eliminate a great deal of the drudgery, so I put up with it. It allows me to reuse querydefs rather than having to copy and paste SQL everywhere and then modify it everywhere if it needs changing. There are ways to coerce the QBE into being less of a pain, but it still doesn't do glorious color with beautiful indents and comments, and it probably never will. It doesn't even do find and replace. How stupid is that after almost 30 years?

So, you have convinced yourself that because I almost never use embedded SQL and you ALWAYS use embedded SQL, I don't know anything about SQL. That is your opinion and you're entitled to it;)
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
and you ALWAYS use embedded SQL
Nonsense.
You like to talk a lot, and now you seem to be trying to hide your arrogance behind a lot of words.

You don't have to dispose of waste that you don't produce. Save yourself all the words and assumptions.

If you want to think again: read the title of the topic.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
Efficiency considerations can give you a certain feeling; measurements under real conditions provide insight.
Out of interest, I created a test environment and measured various things.

Initial situation: a destination table with 10,000 values (text) under a unique index, plus text files with different contents in 11 variants.
t_available10_new10 means the text file contains 10 percent entries that already exist in the destination table and 10 percent new entries.
These 11 variants are executed once with a simple append query (noTest) and once with a built-in check for duplicates (withTest). Because these are append queries, a fresh copy of the destination table is used each time.

The entire code run, including setting up the test conditions, took 3 minutes on my computer. That included 3 runs across the 22 variants. Further details can be found in the attached files.
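
In outline, the measurement wrapper looks roughly like this (heavily simplified; the names are placeholders and the real code is in the attached zip):
Code:
Public Sub TimeVariant(sSqlNoTest As String, sSqlWithTest As String)
    ' sSqlNoTest holds the plain append, sSqlWithTest the append with the
    ' built-in duplicate check. In the real test a fresh copy of the
    ' destination table is prepared before each Execute.
    Dim db As DAO.Database
    Dim sngStart As Single
    Set db = CurrentDb()

    sngStart = Timer
    db.Execute sSqlNoTest
    Debug.Print "noTest:   " & Format(Timer - sngStart, "0.000") & " s"

    sngStart = Timer
    db.Execute sSqlWithTest
    Debug.Print "withTest: " & Format(Timer - sngStart, "0.000") & " s"
End Sub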

Under these laboratory conditions, with other error sources eliminated, the queries can be executed WITHOUT dbFailOnError. Omitting the additional test for duplicates results in runtimes ranging from 58.8 percent of the baseline when only new records are appended up to 365 percent when only existing records are appended. This range can be discussed.
According to the measurements, the tipping point where both approaches need the same runtime is at around 50 percent potential duplicates in the text file.

Important for forming my opinion:
- In practice, you rarely have laboratory conditions in which you can rule out the occurrence of errors other than duplicates.
- In my practice there are rarely clean imports from a text file into exactly one table. It is common for the text file to contain report content, and that content has to be distributed across several tables in the database schema. Where the master tables involved are concerned, there will rarely be much that is new, but often a lot that already exists.

In summary, I see myself in at least the same or better position by using the test for duplicates.

Now let someone show me the errors in my thinking or in my measurement environment.
 

Attachments

  • Percent.PNG (26.5 KB)
  • Results.PNG (79.8 KB)
  • testCheckForDuplicates.zip (559.9 KB)

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:57
Joined
Feb 28, 2001
Messages
27,186
@ebs17 - The question I have to ask is, what did you do with the duplicates when you detected them? (I don't open someone else's databases on my home machine so I didn't look at your code.)

Looking at this from the outside, I see two people talking past each other. Perhaps I'll just tick you both off, but I hope to do otherwise by adding my two cents' worth.

@Pat Hartman is saying (if I read her correctly) that you have two basic problems here, or two parts to the problem. One of the parts is the heterogeneous JOIN; the other is the expected duplicate entries. The difficulty on which she APPEARS to have focused her attention is the expected duplicates.

I don't want to put words in Pat's mouth but I have to agree with her on what happens IN ACCESS when you make your application code ignore errors. Access, regardless of what you do, will do the import row by agonizing row (RBAR). The DB engine, regardless of what you do, will detect the (expected) duplication where applicable. The next question is whether you trapped the error or didn't trap the error. JET/ACE will disallow the insertion no matter what you do in your error trapping routine because a key violation is still a key violation. So with respect to your test, when a duplicate is detected then a record is rejected, and that is a base-line fact. That is the LEAST you can expect to happen. The question then arises - so what ELSE can you do?

There is an implied code context switch in the innards of the TransferText operation any time it has to respond to an error condition, particularly if it engages the error-trapping dialog box. If you are going to continue after the error AND you are trapping, you have just called a special-handling, no-parameter subroutine out-of-line with the transfer process. This CANNOT be faster than, or even equal in speed to, just letting Access drop the bad record and go on its merry way in the loop, because that drop-and-proceed is base-line behavior.

Therefore, on simple logic alone, I have to disagree with your statement:
I see myself in at least the same or better position by using the test for duplicates.

How do you deal with those duplicates after the import is complete? Different question. You can either pre-scan for duplicates (which adds the time of one query equal in size to the import operation) or you can impose some kind of flag during the import operation to show that the record came from the import and then post-scan for non-flagged duplicates. That would add a full-scan query as well. I see a pre-scan or post-scan that doubles the number of records you have to hit OR I see an error handler routine that doesn't always take a straight-through path for its code, and the number of extra trap-based diversions can NEVER make the rest of the loop faster.
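
Purely to illustrate the flag-plus-post-scan idea (generic names, since I have not seen the actual schema), the post-scan would be something like:
Code:
Public Sub PostScanForRejects()
    ' Source rows that did NOT arrive in this import run, assuming an
    ' ImportFlag (or batch marker) was stamped on the rows on the way in.
    Dim db As DAO.Database, rs As DAO.Recordset
    Set db = CurrentDb()
    Set rs = db.OpenRecordset( _
        "SELECT T.F1 FROM LinkedTextFile AS T " & _
        "WHERE T.F1 NOT IN " & _
        "(SELECT KeyField FROM TargetTable WHERE ImportFlag = True)", _
        dbOpenSnapshot)
    Do Until rs.EOF
        Debug.Print "Not appended: " & rs!F1
        rs.MoveNext
    Loop
    rs.Close
End Sub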

Therefore "better position" seems to me to carry some baggage that you have not clarified. What, specifically, makes it better? You aren't going to import more records unless you do something to handle the key duplication. I don't see how that can be faster.

If there is a linguistic situation here, that might account for something - but it doesn't seem so. I just don't see your argument.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
Code:
sSQL = "INSERT INTO " & sTable & " (xWord)" & _
                    " SELECT T.F1 FROM [Text;DSN=MySpec;FMT=Delimited;HDR=no;IMEX=2;CharacterSet=850;" & _
                    "DATABASE=" & sPath & "].[" & sFile & "] AS T"
Code:
sSQL = "INSERT INTO " & sTable & " (xWord)" & _
                    " SELECT T.F1 FROM [Text;DSN=MySpec;FMT=Delimited;HDR=no;IMEX=2;CharacterSet=850;" & _
                    "DATABASE=" & sPath & "].[" & sFile & "] AS T LEFT JOIN " & sTable & " AS D" & _
                    " ON T.F1 = D.xWord WHERE D.xWord Is Null"
These two are compared with each other. The second statement is the one I already posted in #2 and which caused some confusion here. SQL is that simple. Do I now have to explain such simple statements explicitly to head off wild speculation about things like error handling?
Both statements are executed in the test as written, with db.Execute sSQL, WITHOUT dbFailOnError. The import always goes against the xWord field, which has a unique index.

In my opinion, the entire import process, which begins with reading the text file and ends with the entries in the table, has been reduced to its core as far as possible. With the connection data for the text file embedded in the query, I save having to link and unlink the text file separately, as described above. Looking at the clock beats guessing in your head.

In the first variant, more or fewer duplicates are fired against the unique index in the table. In the second variant, duplicates are removed by the additional step. The discussion above might lead you to conclude that the additional test, whatever it may be, adds a lot of extra effort and is less efficient. I used the runtime of the entire process as the benchmark, because that is what matters first to the developer using it and, later, to the user.

What is interesting is that in the available100-new0 case, which consists entirely of duplicates, protection against duplicates purely via the database engine's unique index performs relatively worst compared to the preliminary check. A 265 percent longer runtime is more than significant.
Now someone can explain to me why that approach is preferable anyway.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
Therefore "better position" seems to me to carry some baggage that you have not clarified. What, specifically, makes it better? You aren't going to import more records unless you do something to handle the key duplication. I don't see how that can be faster.
The subject of the study is an everyday, ubiquitous task with no particular intellectual challenges. I don't understand why you don't look at the numbers, the pictures and the test conditions and take them on board.

Then at some point the question arises as to whether you CAN'T understand or you don't WANT to understand.
Individual ideas may be correct in themselves. However, if you have to combine specific processes with other processes, then efficiency and ultimately performance result from the interaction of all these processes.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:57
Joined
Feb 28, 2001
Messages
27,186
There is a linguistic issue based on your post #36. Adding a WHERE xxxx IS NULL clause is, to me, not error trapping. It is a type of filtration. Since filtration is done in-line with other tests, I would agree that you should see minimal difference in speed for the things which pass the filter. To me, filtration as a way to PREVENT errors is perfectly valid. But your choice of calling that "error trapping" confused me because that phrase has a vastly different meaning to me.

I hope that clarifies why I chimed in.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
@ebs17 You seem to have spent a fair amount of time building a test harness. However, it doesn't actually test the problem. When both tables are on the same computer, there is no LAN involved, so the data transfer happens at memory speed. I don't have a LAN on which to test this. Your design is also a little complicated to split up so that you end up with the tables in a BE on a server (or in SQL Server, if that is the actual BE) being joined to text files on the PC with the FE.
 

ebs17

Well-known member
Local time
Today, 14:57
Joined
Feb 7, 2020
Messages
1,946
However, it doesn't actually test the problem.
Yes, if you absolutely have to be right, then so be it. In my opinion, I have encapsulated the problem extremely accurately. Now you have to remember what the problem actually was and whether we are all talking about the same problem. My task was to import from a text file into a table without errors and without error messages even when duplicates are present.

Of course, it's no problem to move the destination tables to your own backend and link them to the frontend - if you can handle tables and backends. Then you can repeat the tests in that setup. However, the time measurements should be limited to the pure append operations.
Regarding SQL Server and efficiency, I made the statement that I would prefer not to use Jet SQL, which is what ODBC-linked tables use automatically, but rather T-SQL directly with its own import methods. I repeat this because you like to ignore not only errors but also statements that do not fit your opinion.

never be able to learn from others
Isn't it time to re-evaluate this statement for yourself and others?
 
Last edited:

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:57
Joined
Feb 19, 2002
Messages
43,275
In my opinion, I have encapsulated the problem extremely accurately
You've made up your mind.
Of course, it's no problem to move the destination tables to your own backend and link them to the frontend - if you can handle tables and backends.
In my world, users can't just move the BE, especially the RDBMS BE, to their local computer.
 
