Searching through files

tsp813

Here is my problem. My office has a large collection of files, and they want a nice interface where they can search the files by keyword. The simple solution would be to find a program that indexes the files and searches through them. But there is a problem with doing that: some of the files are saved as PDFs and cannot be searched. So we ordered a program that converts the PDFs to Word files. I've made the conversions, and although the DOC files are not even close to carbon copies of the PDFs, the words for the most part are there.

My idea, which leads to the question of whether this can be done, is to put all of the files into an Access database as OLE objects. That way, when I get results, with some simple programming I could display a link to the PDF file instead of the DOC file where the words were found. But I'm not sure whether searching through objects in the database is possible. Even if it is, would it be slow? And if so, can it be indexed to make it faster, given that I estimate there would be about 100 to 200 MB of documents?

My gut feeling is that you cannot search through the objects in the database, but I figured I'd give it a try. But if it can be done, can anyone offer some suggestions on exactly how?

Thanks for any help,
Tim
 
Bringing this back to the top. I promise I will only do this once, but I'm desperate for an answer, so if anyone has any suggestions or comments, please reply.

Thanks,
Tim
 
Tim, the problem is whether the source format of the files makes its elements known.

There is a concept known as the Component Object Model, which you can look up through Help files or the web. If the program publishes its components using COM standards, you can directly open the item - using application objects (also a Help title) - to search its contents.

Therefore, I will state categorically that it is possible to search a Word file using Access facilities. Word complies with the COM standards. It makes its paragraphs, words, tables, and other features available through a series of collections. The structure of these collections is always the same, though the CONTENT of the collections of course varies with the document. Access, if properly set up with references to the Word object library (see References in the Help file), can do this search using VBA string and application-object functions.
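The reply has Access VBA in mind, but the same collection walk can be sketched in Python for illustration. This is a minimal sketch under stated assumptions, not a tested recipe: `Paragraphs` and `Range.Text` are members of the Word object model, while `paragraphs_containing`, `search_word_file`, and `fake_document` are hypothetical helper names, and the automation part assumes Windows with Word and the pywin32 package installed.

```python
from types import SimpleNamespace


def paragraphs_containing(document, keyword):
    """Return (1-based paragraph number, text) for paragraphs containing keyword."""
    needle = keyword.lower()
    hits = []
    # Paragraphs is one of the collections Word exposes on every document.
    for number, paragraph in enumerate(document.Paragraphs, start=1):
        text = paragraph.Range.Text
        if needle in text.lower():
            hits.append((number, text.strip()))
    return hits


def search_word_file(path, keyword):
    """Open a Word file through COM automation and search it (Windows only)."""
    # Assumption: pywin32 and Word are installed; imported here so the
    # pure-Python helper above stays usable without them.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly.
        doc = word.Documents.Open(path, False, True)
        try:
            return paragraphs_containing(doc, keyword)
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()


def fake_document(*texts):
    """Hypothetical stand-in that mimics the Paragraphs/Range.Text shape."""
    paragraphs = [SimpleNamespace(Range=SimpleNamespace(Text=t)) for t in texts]
    return SimpleNamespace(Paragraphs=paragraphs)
```

The stand-in document lets the search logic be tried without Word, for example `paragraphs_containing(fake_document("Annual budget report\r"), "budget")`. Opening every document on each search is likely to be slow for 100 to 200 MB of files, so this is better suited to extracting text once than to answering live queries.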

I admit my knowledge here is limited. I have never had a case where I needed to test a .PDF for compliance with the COM standards, so I don't know whether it will work. But I would bet you dollars to donuts that Adobe won't make the COM data available if you only have the Acrobat Reader rather than the full package. My advice is to find out whether you have the full Adobe package at your site. If so, look into Adobe's Help facilities to see if they address the topic of COM exposure or some similar phrase.

Other forumites reading this: If you know whether Adobe supports COM, please chime in.

Sorry I don't know the exact answer, Tim. But maybe at least this will help you understand the direction you have to go.

I will add that it might not be such a bad thing to have to convert each .PDF file to a .DOC file for the sole purpose of searching. Then you could drop the .DOC file like a hot potato once you have it completely indexed - just keep the .PDF around. If Adobe doesn't publish their structures using COM, that might be the only way to do this.
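The "index the converted text, keep only the PDF" idea can be sketched as a small keyword index. This is a rough illustration rather than a finished design: the function names and the file paths are made up, and it assumes the text of each converted file has already been extracted into a plain string keyed by the path of the original PDF.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(extracted_texts):
    """Map each word to the set of PDF paths whose converted text contains it.

    extracted_texts: dict of PDF path -> text pulled from the converted copy.
    """
    index = defaultdict(set)
    for pdf_path, text in extracted_texts.items():
        for word in tokenize(text):
            index[word].add(pdf_path)
    return index


def search(index, query):
    """Return the sorted PDF paths that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the path sets so only documents with all words remain.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical example data: the paths and text are invented for illustration.
texts = {
    "reports/budget.pdf": "Annual budget for the office",
    "memos/move.pdf": "Office move schedule",
}
index = build_index(texts)
print(search(index, "budget office"))
```

Because the index stores only words and PDF paths, the converted DOC files are not needed after indexing, and search results can link straight to the original PDFs.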
 
