Importing/Parsing from PDF into table

cnstarz

We publish a PDF file every week full of taskings that our subordinate organizations need to accomplish for the following week. I would like to parse the PDF and store the data in a table. I think the only way to start is to first save the PDF as Text (Plain), since that gives us some way to delimit and parse the data. Once it's saved as a txt file, it basically looks something like this:

Code:
//
Column 1 Data/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//Column 2 Data
StuffIDontCareAbout/StuffIDontCareAbout/
Column 3 Data
//
Column 1 Data/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//Column 2 Data
StuffIDontCareAbout/StuffIDontCareAbout/
Column 3 Data
//
Column 1 Data/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//Column 2 Data
StuffIDontCareAbout/StuffIDontCareAbout/
Column 3 Data
//
Column 1 Data/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//Column 2 Data
StuffIDontCareAbout/StuffIDontCareAbout/
Column 3 Data
//

...and so on and so forth anywhere between 50-100 more times
The entire text file will always start and end with "//" at the top and bottom. You also see that each tasking paragraph (starting with "Column 1 Data" and ending with "Column 3 Data") is preceded by a "//" on its own line, and also followed by a "//" on its own line. The text file will always look exactly like this with each tasking paragraph having the same number of forward slashes.

Column 1 Data always starts on a new line under "//" and runs until the first "/" (single forward slash).
Column 2 Data always starts after the "//" on the 2nd paragraph line and runs until the end of that line.
Column 3 Data always starts on the 4th paragraph line and runs until the end of that line.

Here's a quick example of what I'm talking about:

Code:
//
37 NOS001/DCO/
TaskPer/TBD//310001ZAUG2014-292359ZSEP2014
GenText/Remarks/
(U/FOUO) This will contain the actual task description and details.  You can see that "(U/FOUO)" contains a forward slash.
//
582 NOS012/DoDIN/
TaskPer/27//280001ZAUG2014-022359ZSEP2014
GenText/Remarks/
(U/FOUO) This another task description and details.  You can see that "(U/FOUO)" contains a forward slash.
//
So after it's parsed, my table would have the following new rows:

Code:
Column 1          | Column 2           | Column 3
37 NOS001         | 310001ZAUG2014-292 | (U/FOUO) This will contain
582 NOS012        | 280001ZAUG2014-022 | (U/FOUO) This another task

I hope this makes sense and that something is able to come out of this. It would seriously make our lives so much easier. Thanks for your help!
 

nae0254

I know, I know, it is a little sloppy.
But when I needed to read a PDF, I used SendKeys like this:
------------------------

In a new file create table TempFactura and a delete query BorroTempFactura.
Don't forget to declare ShellExecute:
Declare Function ShellExecute Lib "Shell32.dll" Alias "ShellExecuteA" (ByVal hWnd As Long, ByVal lpOperation As String, ByVal lpFile As String, ByVal lpParameters As String, ByVal lpDirectory As String, ByVal nShowCmd As Long) As Long

Then create a form with a button with this code:

DoCmd.SetWarnings False
' Clean TempFactura (erase the data) by running the delete query
DoCmd.OpenQuery "BorroTempFactura", acViewNormal, acEdit
Screen.MousePointer = 11
' NameOfThePdf is a placeholder for the full path to the PDF file
iret = ShellExecute(Me.hWnd, "Open", NameOfThePdf, "", "", 1)

Call WaitSecs(3)
SendKeys "^a"
Call WaitSecs(1)
SendKeys "^c"
Call WaitSecs(1)
SendKeys "^q"
' KillProcess ("acrord32.exe") 'close Acrobat Reader
DoCmd.OpenTable "TempFactura", acViewNormal, acEdit
Call WaitSecs(1)
DoCmd.RunCommand acCmdPasteAppend
Screen.MousePointer = 0
DoCmd.SetWarnings True ' turn warnings back on after they were disabled above

' Now you have all the data in TempFactura, with multiple queries You can obtain what you want.
Guillermo
 

Galaxiom

Read the text as a TextStream object (from the Scripting.FileSystemObject). Use its ReadLine method to read it one line at a time in a loop.

Then test each line using functions such as Left(), Right(), Mid() and InStr() to determine whether it needs to be extracted, and to get the parts required.
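
As a rough illustration of that approach, here is a minimal sketch rather than a finished solution. It assumes the text file always follows the four-line layout described in the first post. The table name tblTaskings, its fields TaskID, TaskPeriod and Remarks, and the helper names ParseColumn1 and ParseColumn2 are all made up for the example. It also assumes a DAO reference is set (the Access default) and that Remarks is a Long Text (Memo) field if descriptions can run past 255 characters.

```vba
Option Compare Database
Option Explicit

' Hypothetical helper: Column 1 is everything before the first "/"
' on the first line of a tasking paragraph.
Public Function ParseColumn1(ByVal textLine As String) As String
    Dim pos As Long
    pos = InStr(textLine, "/")
    If pos > 0 Then
        ParseColumn1 = Trim$(Left$(textLine, pos - 1))
    Else
        ParseColumn1 = Trim$(textLine)
    End If
End Function

' Hypothetical helper: Column 2 is everything after the "//"
' on the second line of a tasking paragraph.
Public Function ParseColumn2(ByVal textLine As String) As String
    Dim pos As Long
    pos = InStr(textLine, "//")
    If pos > 0 Then ParseColumn2 = Trim$(Mid$(textLine, pos + 2))
End Function

' Reads the text file line by line and appends one row per tasking.
' tblTaskings and its field names are assumptions for this sketch.
Public Sub ImportTaskings(ByVal filePath As String)
    Dim fso As Object, ts As Object
    Dim rs As DAO.Recordset
    Dim textLine As String
    Dim lineInBlock As Long
    Dim col1 As String, col2 As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(filePath, 1)   ' 1 = ForReading
    Set rs = CurrentDb.OpenRecordset("tblTaskings", dbOpenDynaset)

    Do Until ts.AtEndOfStream
        textLine = ts.ReadLine
        If Trim$(textLine) = "//" Then
            lineInBlock = 0                  ' a "//" line starts a new paragraph
        ElseIf Len(Trim$(textLine)) > 0 Then ' ignore stray blank lines
            lineInBlock = lineInBlock + 1
            Select Case lineInBlock
                Case 1
                    col1 = ParseColumn1(textLine)
                Case 2
                    col2 = ParseColumn2(textLine)
                Case 4                       ' 4th line is the whole description
                    rs.AddNew
                    rs!TaskID = col1
                    rs!TaskPeriod = col2
                    rs!Remarks = Trim$(textLine)
                    rs.Update
            End Select
        End If
    Loop

    rs.Close
    ts.Close
End Sub
```

Because Column 3 is taken as the entire 4th line, the "/" inside "(U/FOUO)" is never treated as a delimiter. If a description ever wraps onto a 5th line in the exported text, this sketch would drop the extra line, so that case would need its own handling.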
 

cnstarz

THIS!

Thanks so much for replying! I will give this a try after reading up on it and tinkering. :D
 
