Extract data from word documents into Access (1 Viewer)

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Hi All

Not sure if this is possible but I am trying to extract data from a word document to set up a database.

Basically I am trying to capture data from completed forms, similar to the sample attached, the actual blank form is 20 pages long and once completed can be as many as 30 pages, or even more, although the format doesnt change (i.e. individidual cells will expand to fit the data in the cell).

I want to be able to scan through the completed forms and extract the data i.e. Full Study Title, Short Study Title, Study Type etc. into a database.

The issues:
1. Each document will have a different name but will be stored in the same location.
2. What is on page 8, for example, in one document is not necessarity going to be the same on every document (due to expanding cells)
3. Each sector is in a separate table but tables can spread across several pages.
4. Some data is stored in a checkbox format rather than text.
5. I dont want to extract all of the data, only certain sections (at least at this stage).

Any help from the guru's will be very welcome.

Thanks

Dave
 

Attachments

  • Sample Participating Site New Study Submission Package.doc
    80 KB · Views: 313

Anakardian

Registered User.
Local time
Today, 16:27
Joined
Mar 14, 2010
Messages
173
One option is to use the form fields throughout instead of only for checkboxes.
There is also a text variety you can use.
It is possible to use VBA to extract the contents of form fields and load it into a table for further processing.
Since each form field is individually named, their position in the document does not matter.

It is also possible to read the content of a cell in a table however the way your form is constructed means that the heading will also be included in the extract. Not a big problem but it can be annoying if someone modifies a heading by mistake.
You would use a table / row / column reference to extract the data so as long as no one inserts or deletes a row or column you don't have to worry about the page number.
 

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Thanks Anarkadian

I have come up with the following code to try this out before I embark on the full project.

Code:
Public Function ExtractWordData()
    Dim appWord As Word.Application
    Dim doc As Word.Document
    Dim fd As FileDialog
    Dim strDocName As String
    Dim blnQuitWord As Boolean
    On Error GoTo ErrorHandling
 
    BrowseforFile = MsgBox("Do you want to browse for a file?", vbYesNo, "New Trial")
    If BrowseforFile = 6 Then
        Set fd = Application.FileDialog(msoFileDialogFilePicker)
        With fd
            .AllowMultiSelect = False
            .Title = "Browse to Select a File"
            If .Show = -1 Then strDocName = .SelectedItems(1)
        End With
        Set fd = Nothing
    End If
 
    Set appWord = GetObject(, "Word.Application")
    Set doc = appWord.Documents.Open(strDocName)
 
    txtFullTitle = doc.Tables(1).Rows(2).Cells(1)
    txtShortTitle = doc.Tables(1).Rows(6).Cells(1)
    txtProjectRef = doc.Tables(1).Rows(2).Cells(2)
               
    doc.Close
    If blnQuitWord Then appWord.Quit
    cnn.Close

Cleanup:
    Set rst = Nothing
    Set cnn = Nothing
    Set doc = Nothing
    Set appWord = Nothing
    Exit Function
ErrorHandling:
    Select Case Err
        Case -2147022986, 429
            Set appWord = CreateObject("Word.Application")
            blnQuitWord = True
            Resume Next
        Case 5121, 5174
            MsgBox "You must select a valid Word document. " & "No data imported.", vbOKOnly, "Document Not Found"
        Case 5941
            MsgBox "The document you selected does not contain the required form fields. " & vbCrLf & "No data imported.", vbOKOnly, "Fields Not Found"
        Case Else
            MsgBox Err & ": " & Err.Description
    End Select
    
    GoTo Cleanup
End Function

Which runs to the line
Code:
   txtFullTitle = doc.Tables(1).Rows(2).Cells(1)

Then returns the error:- "5991: Cannot access individual rows in this collection because the table has vertically merged cells"

I assume this means that the format of the tables within the document is not compatible with this type of data extraction. Is there any way around this.

Cheers

Dave
 

Anakardian

Registered User.
Local time
Today, 16:27
Joined
Mar 14, 2010
Messages
173
The easiest would of course be to unmerge the cells.

Is the layout fixed or can you make changes?
 

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Unfortunately its fixed.

I am trying to import historical forms into a new database so have got about 500 of these to do, hence the desire to automate it as much as possible.

I think the issue will only apply to table 1 which has one cell in column 1 with 4 cells in column 2.

All other cells seem OK.
 

Anakardian

Registered User.
Local time
Today, 16:27
Joined
Mar 14, 2010
Messages
173
This one has me stumped at the moment.

Of course you nee the information in that table otherwise you would not have received the error but since it appears to be a one time job, consider that time spent on making the code to transfer the data vs. the time needed to enter it manually.

One thing that may work though is to split the cell into the required parts that will allow it to read the contents. Basically unmerging the cell before you try to read it.
The data should not change but it may throw off your current code as you are adding cells.

There is no need to save teh document after you are done reading so any changes will not matter if they make the job easier.
Always copy the files to a different location though, shit sometimes happens and you can always delete the files once they have been read.
 

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Hi Anarkadian

Thanks for your help with this.

I have found this code from Gary Mayor on the answers.microsoft.com web site, which seems to work.

Code:
Dim oTbl As Table
Dim oCell As Cell
For Each oTbl In ActiveDocument.Tables
    For Each oCell In oTbl.Range.Cells
        With oCell
            'do stuff
        End With
    Next oCell
Next oTbl

The only issue is it works through every table / cell in the document which takes quite a bit of time. So I want to limit it to only table 1, which ius where the vertically merged cells reside. The rest of the document I can handle with absolute cell references.

Any idea how I can do this.

Thanks again

Dave
 

Anakardian

Registered User.
Local time
Today, 16:27
Joined
Mar 14, 2010
Messages
173
Have you tried something like this:
[FONT=&quot]
[/FONT]
Code:
Dim oCell As Cell
  With ActiveDocument.Tables(x)
      For Each oCell In oTbl.Range.Cells
          With oCell
              'do stuff
          End With
      Next oCell
End With

I have not tested the code but as long as you replace the x it should do what you need.
 

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Thanks again Anakardian

With the following mod that worked fine

Code:
Dim oCell As Cell
  Set oTbl = ActiveDocument.Tables(x)
   For Each oCell In oTbl.Range.Cells
        With oCell
            'do stuff
        End With
   Next oCell

Just need to figure out how to extract data from checkboxes now.

Cheers
Dave
 

stopher

AWF VIP
Local time
Today, 15:27
Joined
Feb 1, 2006
Messages
2,396
Thanks again Anakardian

With the following mod that worked fine

Code:
Dim oCell As Cell
  Set oTbl = ActiveDocument.Tables(x)
   For Each oCell In oTbl.Range.Cells
        With oCell
            'do stuff
        End With
   Next oCell

Just need to figure out how to extract data from checkboxes now.

Cheers
Dave
Cells in a table are a collection (hence your loop) in the Range object. So in a 4x3 table there will be 12 cells. If two cells are merged (first to cells in the first row) then there are only 11 cells and the first cell is the merged cell. The second cell is the cell after the merged cell etc.

So you can reference the fifth cell in the table like this :

oTbl.Range.Cells(5).Range.text

the cells are numbered left to right, top to bottom so the table can be as complex as you like but as long as you know the number of the cell you can reference it (which you can either count or list by code).

hth
Chris
 

Cotty42

Registered User.
Local time
Today, 15:27
Joined
Feb 27, 2014
Messages
102
Finally figured out how to extract the check box data.

I have posted it here to help any one else with the same issue.

Code:
    Dim oTbl As Table
    Dim oCell As Cell
    Dim oCBox As FormFields
    Dim strFFName As String
    
    Set oCBox = doc.FormFields

' ***** Loop for every formfield in the word form
    For i = 1 To doc.FormFields.Count
        strFFName = doc.FormFields(i).Name
        strFFCBox = doc.FormFields(i)

' *****  Sets variable CBoxTrue if the checkbox is selected on the word form, otherwise set to false
        If oCBox(strFFName).CheckBox.Value = True Then
            CBoxTrue = True
        Else
            CBoxTrue = False
        End If
            
        ' do stuff
                
    Next i

Hope this is of use

Cheers

Dave :)
 

Users who are viewing this thread

Top Bottom