A most challenging task: Extracting data from 1000 Word documents and 30000 pages

Ammarhm

Hi All!
First, many thanks to this wonderful forum and to anyone who is able to help me with this difficult task.
Here it comes:
I am now working on my PhD thesis, and for one project I have received a CD with about 1000 separate Word documents. Each document is about 30 pages long.
Each of those documents contains data about one participant in my study (so 1000 documents for 1000 participants in total).
The data in the documents are entered in two ways: either in "Text form fields" or as normal text.
So the task is to import the data from those 1000 documents into a database by means of VBA or a macro, without having to do the job manually.
For the text form fields, I have found this very useful site that you might find interesting too:

http://gregmaxey.mvps.org/Extract_Form_Data.htm

It worked like a charm for the form fields.
BUT
the problem comes when trying to extract "normal text", i.e. text entered in Word without a form field.
I can identify the needed text by searching for a word combination. For example, the needed text would be the first set of digits that comes after the sentence "The study subject received a dose of".

So the original text in the file might be:
"The study subject received a dose of 25mg"
"The study subject received a dose of about 25mg"
My macro or VBA code should search for the string "The study subject received a dose of" and then look for the first number after it, which in both cases should return "25".
Guys, is this doable? It would really save my life. Going through 30000 pages manually would take ages, but if it could be done with VBA it would only take some fun programming and the click of a button.
I appreciate all help.
Regards
 
Yes, you can do this; I did something similar when extracting 72000 records from a website. It may not be foolproof at the beginning, but if you have to do this on a regular basis you can improve its searching capability as the process matures.

You already know that Access can read Word documents because it is part of the Office suite, so here's a little how-to on searching for and retrieving the data afterwards.

Dim iLength As Long, iLocation As Long, strTmp As String
Then figure out a way to get the text from the Word document into strTmp.
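
One possible way to get the document text into strTmp is to automate Word from Access. The following is only a rough sketch: ReadWordText is a made-up helper name, and late binding (CreateObject) is assumed so that no reference to the Word object library has to be set.

```vba
' Sketch only: ReadWordText is a hypothetical helper, not part of Access or Word.
' Late binding is used, so no reference to the Word object library is needed.
Public Function ReadWordText(ByVal strPath As String) As String
    Dim objWord As Object   ' Word.Application
    Dim objDoc As Object    ' Word.Document

    Set objWord = CreateObject("Word.Application")
    objWord.Visible = False

    ' Arguments: FileName, ConfirmConversions, ReadOnly.
    ' Opening read-only so the source documents are never modified.
    Set objDoc = objWord.Documents.Open(strPath, False, True)

    ' The whole main text of the document as one string
    ReadWordText = objDoc.Content.Text

    objDoc.Close False      ' close without saving changes
    objWord.Quit
    Set objDoc = Nothing
    Set objWord = Nothing
End Function
```

You would then call it with something like strTmp = ReadWordText("C:\Docs\Participant001.doc"), where the path is just an example. Starting Word once per file will probably be slow for 1000 documents, so in practice you would likely create one Word instance and reuse it for every file.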

Then do something like
iLocation = InStr(1, strTmp, "The study subject received a dose of")
This puts the character position of the match in the iLocation variable (0 if the phrase is not found).

Now you can use the Mid function to pull the data out: start from iLocation plus the length of the search phrase, and use iLength for how many characters you want to pull after what you searched for.
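
To make the InStr and Mid steps a bit more concrete, here is a small sketch. The sample sentence, the value of 20 for iLength and the procedure name DemoInStrMid are all just assumptions for illustration.

```vba
Public Sub DemoInStrMid()
    Const SEARCH_TEXT As String = "The study subject received a dose of"
    Dim iLength As Long, iLocation As Long
    Dim strTmp As String, strChunk As String

    ' Sample text standing in for the contents of one document
    strTmp = "Notes. The study subject received a dose of about 25mg daily."
    iLength = 20    ' assumed: how many characters to pull after the phrase

    ' vbTextCompare makes the search case-insensitive
    iLocation = InStr(1, strTmp, SEARCH_TEXT, vbTextCompare)

    If iLocation > 0 Then
        ' Skip past the phrase itself, then take up to iLength characters
        strChunk = Mid$(strTmp, iLocation + Len(SEARCH_TEXT), iLength)
        Debug.Print strChunk
    Else
        Debug.Print "Phrase not found"
    End If
End Sub
```

The extracted chunk still contains words like "about" in front of the number, so it needs a bit more cleaning before it can be stored as a dose.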

Hope this makes sense.
 
Oh, one last thing: to make sure you get a number and not other text at the beginning of the extracted data, use a Do loop that checks whether the first character of your extracted data is a digit (0-9). If it is not, remove the first character and try again until it is. Combined with the steps above, this makes your results much cleaner.
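
A loop like that could look roughly like this. FirstNumber is a hypothetical helper name, and instead of physically removing characters it just moves an index forward, which has the same effect. It only collects whole digits, so a decimal dose such as 2.5 would need extra handling.

```vba
' Sketch only: returns the first run of digits found in strText,
' or an empty string if the text contains no digit at all.
Public Function FirstNumber(ByVal strText As String) As String
    Dim i As Long
    Dim strDigits As String

    ' Skip characters until the first digit is reached
    i = 1
    Do While i <= Len(strText)
        If Mid$(strText, i, 1) Like "[0-9]" Then Exit Do
        i = i + 1
    Loop

    ' Collect the consecutive digits that follow
    Do While i <= Len(strText)
        If Not (Mid$(strText, i, 1) Like "[0-9]") Then Exit Do
        strDigits = strDigits & Mid$(strText, i, 1)
        i = i + 1
    Loop

    FirstNumber = strDigits
End Function
```

For example, FirstNumber(" about 25mg") returns "25".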
 
Hi
Well, I would not say that I am the most experienced VBA programmer. I know a bit, and I can look up how to use those commands on the internet.
I know I should not ask others for easy solutions, but would you happen to know where I could find similar VBA code that I could modify for my purpose?
Thanks
 
Perhaps one more thing to look into is RegEx (regular expressions).
I'm not very good at those, but a guy I work with uses them all the time for this sort of thing. You can do some pretty sophisticated find-and-replace with them, which can save a lot of VB code.

Here is a link to a tutorial on them: http://www.regular-expressions.info/
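
As a rough idea of what the RegEx route might look like in VBA, here is a sketch that uses the VBScript.RegExp object through late binding. DoseFromText is a made-up function name, and the pattern is only an assumption based on the example sentences in the first post.

```vba
' Sketch only: returns the first run of digits after the phrase,
' or an empty string if the phrase (followed by a number) is not found.
Public Function DoseFromText(ByVal strText As String) As String
    Dim objRegEx As Object      ' VBScript.RegExp, late bound
    Dim objMatches As Object    ' MatchCollection

    Set objRegEx = CreateObject("VBScript.RegExp")
    ' \D* skips any non-digit characters, (\d+) captures the first run of digits
    objRegEx.Pattern = "The study subject received a dose of\D*(\d+)"
    objRegEx.IgnoreCase = True
    objRegEx.Global = False     ' only the first match is needed

    Set objMatches = objRegEx.Execute(strText)
    If objMatches.Count > 0 Then
        DoseFromText = objMatches(0).SubMatches(0)
    Else
        DoseFromText = ""
    End If
End Function
```

With the two example sentences from the first post this should give "25" in both cases. Be aware that \D* also skips over line breaks, so if no number directly follows the sentence, the pattern may pick up a number from further down the document; the results are worth spot-checking.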
 
