Strategies for optimizing a complex search

tranchemontaigne

I have a problem and am looking for some advice on the best approach with respect to performance.

PROBLEM:
I need to have MS Access perform a complex search
- I have a master list of records with about 120K records.
- Each record contains about 40 fields.
- 5 fields contain free text.
- Each of these free text fields needs to be searched with the InStr function (a substring match, not a true regex) for hits.
- The criteria passed to InStr come from another table that contains several thousand records
- Additional fields (e.g. dates) need to be evaluated to limit search results
SOME OPTIONS
1) Write a straight ANSI-92 SQL statement using LIKE "*" & [FIELD] & "*" criteria with unjoined tables (what I like to refer to as a quasi-Cartesian product); see the sketch after this list
2) Create a recordset object for each table, and evaluate the primary recordset with values from the criteria recordsets
3) Create a recordset for the primary table, and arrays of criteria populated with values from the criteria tables
4) Create a 2-dimensional array (one dimension holding record IDs from the primary table, and a second dimension to store record contents)
5) Create a structure that stores the contents of the primary data table, and use arrays to hold the criteria used to evaluate structure items
6) Other strategy :)
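
For illustration, option 1 might look something like the sketch below. Table and field names here are hypothetical, and note that under Access's ANSI-92 mode the LIKE wildcard is % rather than *. The unjoined FROM clause produces the quasi-Cartesian product, and DISTINCT collapses records that match more than one criterion:

    SELECT DISTINCT m.Record_ID
    FROM tblMaster AS m, tblCriteria AS c
    WHERE (m.FreeText1 LIKE "*" & c.SearchTerm & "*"
        OR m.FreeText2 LIKE "*" & c.SearchTerm & "*"
        OR m.FreeText3 LIKE "*" & c.SearchTerm & "*"
        OR m.FreeText4 LIKE "*" & c.SearchTerm & "*"
        OR m.FreeText5 LIKE "*" & c.SearchTerm & "*")
      AND m.RecordDate BETWEEN #1/1/2008# AND #12/31/2008#;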
 
When doing this search, IF there is never a competing user running it at the same time, I might have an idea.

If you have a search that many people are running at one time, my idea won't work.

What you do in the stand-alone case is add one more field to your table: a Yes/No field called "Eligible."

Before you start your search, run a really dumb Update query to set every record to "Eligible=True"

Now, for each leg of the search, run an UPDATE that sets Eligible = False where the record fails that leg's criteria. Make the update work only on records that still have Eligible = True. Each leg should then go faster than the previous one.
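
In Access SQL the reset and a single elimination leg might look like this (field names are hypothetical, and each statement is run separately, e.g. from saved queries or CurrentDb.Execute in VBA):

    UPDATE tblMaster SET Eligible = True;

    UPDATE tblMaster
    SET Eligible = False
    WHERE Eligible = True
      AND NOT (RecordDate BETWEEN #1/1/2008# AND #12/31/2008#);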

Now, as to the order of searching, always run first the search that removes the most records. The word "cardinality" applies here (it is close kin to what query optimizers call selectivity): think of it as the number of records a typical SELECT on that field can be expected to return.

Examples of the extremes:

For a field with a unique key, cardinality is 1 regardless of the number of records.

For records with a yes/no type field (e.g. Male/Female), cardinality is .5 x the number of records in the recordset.

For items randomly distributed among numbers 1-10, cardinality would be .1 x number of records. And so on.

What you want is to pick the most restrictive search first, meaning the one with the smallest cardinality when compared to the number of records.

When your searches are done as updates, the records that match still have Eligible = True, and there is your last search - a SELECT of eligible records.
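
Continuing the hypothetical names from above, that closing step is just:

    SELECT m.*
    FROM tblMaster AS m
    WHERE m.Eligible = True;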

Special case exception: when some of the fields are keyed (indexed) and others are not, and you can also eliminate based on a full comparison of the keyed fields, do those legs first; they are faster and will shrink your selection list sooner.

If multiple people can do this at once, this approach won't work, because they would all be trying to set the same flag differently at the same time.
 
Thanks! This is helpful. The caveat about a multi-user app is well noted. I'm wondering what you have to say about the following workaround (sketched in SQL after the list):

(1) dynamically generate an "eligible" column name based upon the logged-in user and a date-time stamp
(2) execute an ALTER TABLE...ADD COLUMN statement using this column name
(3) execute a CREATE INDEX statement against that new field
(4) follow the method you suggested using a series of UPDATE statements
(5) prepare and present search results
(6) execute an ALTER TABLE...DROP COLUMN statement against the "eligible" field created in step (2)
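
In Jet DDL, and with an illustrative column name (in practice you would build the name in VBA from the user ID and a timestamp), the bracketing steps might look like this; remember that the index must be dropped before the column can be:

    ALTER TABLE tblMaster ADD COLUMN Eligible_jdoe_20081015 YESNO;
    CREATE INDEX idxElig_jdoe_20081015 ON tblMaster (Eligible_jdoe_20081015);

    ... UPDATE legs and final SELECT against Eligible_jdoe_20081015 ...

    DROP INDEX idxElig_jdoe_20081015 ON tblMaster;
    ALTER TABLE tblMaster DROP COLUMN Eligible_jdoe_20081015;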
 
Dangerous. Mucking about with table structure would be extremely nasty if two people tried to do it at the same time. VERY BIG data locks involved.

You'd almost do better to define a local table of all the possible record keys (the prime key) and the Eligible flag, build a JOIN query from this table to the main (shared) table, and write something that drops the local-table records that have become ineligible. The same filtration strategy would apply as to the order of searching, but you would at least have less to worry about with the shared search.

The local copy of the shared table's PK and the Eligible flag is NOT shared. What you would really do is UPDATE the local flags through the JOIN query, then delete the ineligible record references with a local-only DELETE query (see the sketch below). Yeah, I know, two steps per search leg, but as the local table gets smaller and smaller, the searches should become faster and faster. And since the JOIN is PK-based, that step, clunky though it is, would be the fastest possible JOIN available given the circumstances.
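A sketch of the mechanics, assuming a hypothetical local table tblEligLocal with just Record_ID and Eligible, seeded once from the shared table and then whittled down one leg at a time:

    INSERT INTO tblEligLocal (Record_ID, Eligible)
    SELECT Record_ID, True FROM tblMaster;

    UPDATE tblEligLocal INNER JOIN tblMaster
        ON tblEligLocal.Record_ID = tblMaster.Record_ID
    SET tblEligLocal.Eligible = False
    WHERE tblEligLocal.Eligible = True
      AND NOT (tblMaster.RecordDate BETWEEN #1/1/2008# AND #12/31/2008#);

    DELETE FROM tblEligLocal WHERE Eligible = False;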

It is also that very rare case, a situation where a one-to-one join actually makes sense. It isn't REALLY a one-to-one join, but that's what you'll tell Access it is, and Access doesn't care that it is really multiple one-to-one joins, many users to one table. Access isn't doing anything to the shared table that requires you to consider that it is technically a many-to-one case.

Let's be honest: Access is a really great product for flexibility, but it has its limitations when working with huge data sets. Nothing you can do that is a pure Access solution is going to be blindingly fast.
 
