Solved Relational Table or Spreadsheet?

Privateer

Registered User.
Local time
Today, 14:06
Joined
Aug 16, 2011
Messages
193
I am being asked by a client to do something that I already know is wrong, but it is nagging at me that they might have a point. So, I was hoping that others here could weigh in with their opinions and give me some advice on how to alleviate the client's fears.
They are asking me to identify documents using three text fields, such as type, subtype, and unit. If all three match, the document must be checked further to see whether it contains a key phrase, and depending on that result, included in or excluded from further consideration. One of the keyword phrases is 164 characters long and the other three range from five to fifty characters. The source of this information is an internal database to which we do not have access, but we can run a "report" that exports the information into Excel, which I then import into our database.
The nagging point is that I believe the initial three fields and the keyword field, which appear quite uniform, were probably selected with a combo box in the source database rather than typed in by hand. That almost makes them unique enough to qualify as a foreign key, and if I index these four text fields, a search could be accurate and fast enough to work. There are fewer than 200 records in this table, but they could be compared against as many as 5,000 in the imported table. The other option is to build four new lookup tables, generate primary keys that I backfill into the import table based on the text, and use those numbers for the downstream code. I would also have to include a query that adds any new types or key phrases to those four tables on every import.
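For what it's worth, the "append new values, then backfill keys" step described above is only two queries in Access SQL. This is just a sketch with made-up table and field names (tblDocType, tblImport, DocType, TypeName, TypeID, DocTypeID), repeated once per lookup table:

```sql
-- 1. Append any type values from the import that don't yet exist in the lookup table:
INSERT INTO tblDocType (TypeName)
SELECT DISTINCT tblImport.DocType
FROM tblImport LEFT JOIN tblDocType
  ON tblImport.DocType = tblDocType.TypeName
WHERE tblDocType.TypeName IS NULL;

-- 2. Backfill the lookup table's primary keys into the import table:
UPDATE tblImport INNER JOIN tblDocType
  ON tblImport.DocType = tblDocType.TypeName
SET tblImport.DocTypeID = tblDocType.TypeID;
```

Run in that order on every import, so the lookup tables stay current without manual maintenance.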
The code is not the problem here; it's the phrase "normalize till it hurts, denormalize till it works." Do I really want to take on the responsibility of maintaining the primary and foreign key fields when these text fields might get the job done? Then again, I can't see myself writing an If statement that begins If (164-character string) equals (164-character string) Then do this or that. The client is fighting me every step of the way because they had issues with normalization under the prior developer, and any attempt to calm their fears is falling on deaf ears. I realize this is not a typical question for this forum, but I would appreciate any helpful information. Thanks.
 

plog

Banishment Pending
Local time
Today, 13:06
Joined
May 11, 2011
Messages
11,645
It doesn't sound like you are building a database but performing a data task. Complete the task in the most efficient way possible.

Is this an ongoing process or a one-time thing? If one-time, just get it done. If it's an ongoing process, that's when you think about maximizing its performance for the long term. That maximization may or may not require normalization.
 

Privateer

Registered User.
Local time
Today, 14:06
Joined
Aug 16, 2011
Messages
193
It's an ongoing process. There can be as many as twenty people importing a different spreadsheet each day with documents to inspect. I think I am going to normalize it now and see if the next step in the process works as well when comparing text strings versus numbers. Thanks for your input.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 14:06
Joined
Feb 19, 2002
Messages
43,257
Searching for a string within a string does not lend itself to the use of a foreign key. This is one of those awful processes for which there is no good solution. Google does this type of search with great efficiency, and modern relational databases have added tools to facilitate text searches. Jet/ACE, which you are probably using as your BE, has been frozen in time at roughly whatever was possible as of 1992. For 5,000 records, I'm not sure I would spend a lot of time trying to find an optimal search mechanism, but if the file has the potential for growth, at some point the simple "LIKE with wildcards" method will be too slow to be useful.

Here's a link that might give you some idea on what would be possible if you loaded the data into SQL Server. You could do that for just this purpose. It wouldn't require converting your application. You could just load the two tables, search, make your decisions and continue with your current process in your current BE.

Full-Text Search - SQL Server | Microsoft Docs
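To give you a flavor of what that looks like: once a full-text catalog and index exist on the table, the search is a single predicate. Table and column names here (dbo.Documents, DocText, DocID) are hypothetical:

```sql
-- Full-text phrase search (assumes a full-text index already exists on DocText):
SELECT DocID
FROM dbo.Documents
WHERE CONTAINS(DocText, '"the exact key phrase to find"');

-- Or, to get a relevance ranking along with each hit:
SELECT d.DocID, k.RANK
FROM dbo.Documents AS d
INNER JOIN CONTAINSTABLE(dbo.Documents, DocText, '"the exact key phrase to find"') AS k
  ON d.DocID = k.[KEY];
```

The index does the heavy lifting, which is why this scales far beyond what LIKE can handle.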
 

Privateer

Registered User.
Local time
Today, 14:06
Joined
Aug 16, 2011
Messages
193
Wanted to let you know how this turned out. I normalized the three text fields (type, subtype, and unit) into their respective tables and updated the import table with the primary keys. I am also checking for new values in the import table that don't exist in the normalized tables and adding them with an append query. Using those keys as inner joins between the (Target) import table and the (Source) table that holds the string I am looking for, I was able to line up the Source and Target fields on the same row in a query. Then I just used an immediate If with the InStr() function, similar to InStr(1, Target, Source, 1), which returns zero if there is no match, or a long integer giving the position where the match starts. So IIf(InStr(1, Target, Source, 1) > 0, it's a match; otherwise, no match. Pat, thanks for the link, but it was your suggestion about SQL Server that reminded me of the InStr() function, which I think is relatively new to Access, but one I used when I worked in SQL. I definitely didn't want to go the wildcard route, which would have been very ugly, so thanks again for your reply.
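Put together, the final query looks something like this (table and field names are invented for illustration; the join conditions are the three backfilled keys):

```sql
-- Line up Target and Source on the same row via the three key fields,
-- then test for the phrase with InStr(); compare argument 1 = text (case-insensitive).
SELECT i.DocID,
       IIf(InStr(1, i.TargetText, s.SourcePhrase, 1) > 0, "Match", "No match") AS Result
FROM tblImport AS i INNER JOIN tblSource AS s
  ON (i.TypeID = s.TypeID
  AND i.SubTypeID = s.SubTypeID
  AND i.UnitID = s.UnitID);
```

Because the join runs on indexed numeric keys, the expensive string scan only happens on the rows that already matched on type, subtype, and unit.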
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 14:06
Joined
Feb 19, 2002
Messages
43,257
InStr() has been in Access since the beginning. It is InStrRev() that is the more recent addition.

Actually, LIKE with wildcards might be faster than InStr(). But it seems that you have somehow narrowed down the search to one record at a time, so maybe not.
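For comparison, the LIKE version of the same test would be something along these lines (hypothetical names again):

```sql
-- Wildcard equivalent of the InStr() test.
-- Caution: if SourcePhrase itself contains *, ?, #, or [ they are treated as
-- pattern characters by LIKE, whereas InStr() treats them literally.
SELECT i.DocID
FROM tblImport AS i INNER JOIN tblSource AS s
  ON (i.TypeID = s.TypeID
  AND i.SubTypeID = s.SubTypeID
  AND i.UnitID = s.UnitID)
WHERE i.TargetText LIKE "*" & s.SourcePhrase & "*";
```

The literal-match behavior of InStr() is one reason to prefer it when the phrases come from user-entered data.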
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 19:06
Joined
Sep 12, 2006
Messages
15,652
Searching a few thousand records won't be critical, I wouldn't have thought. Hundreds of thousands might be.
I would see how long it takes with text matches, and if it's acceptable, I would be inclined to leave it.

I don't think you can assume anything about the data in the remote system, especially when you have no control over it. Users may have copied and pasted long strings, but I doubt they would have selected long strings from a drop-down, for instance.

In passing, note that InStr() is by default NOT case-sensitive. I have absolutely no idea how the same lower-case and upper-case character can be matched in that way, but there you go. By the same token, I have no idea how Access can instantly convert a string of characters into a Unicode string with two bytes per character, but it seems to be able to. I spent ages trying to get back from a Unicode string to a non-Unicode string without success.
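The compare behavior is controlled by InStr()'s fourth argument: 0 (vbBinaryCompare) is case-sensitive, 1 (vbTextCompare) is not. A quick demonstration, run against any convenient one-row table (tblOne here is just a placeholder):

```sql
-- Binary compare: "alpha" is not found in "Alpha" because A <> a byte-for-byte.
-- Text compare: the same search succeeds at position 1.
SELECT InStr(1, "Alpha", "alpha", 0) AS BinaryCompare,  -- returns 0 (no match)
       InStr(1, "Alpha", "alpha", 1) AS TextCompare     -- returns 1 (match at position 1)
FROM tblOne;
```

So the case-insensitivity isn't mysterious so much as it is the text-compare mode folding case before comparing.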
 
