We publish a PDF file every week full of taskings that our subordinate organizations need to accomplish for the following week. I would like to parse the PDF and store the data in a table. I think the only way to start is to first save the PDF as Text (Plain), since that gives me some way to delimit and parse the data. Once it's saved as a txt file, it basically looks something like this:
The entire text file will always start and end with "//" at the top and bottom. You also see that each tasking paragraph (starting with "Column 1 Data" and ending with "Column 3 Data") is preceded by a "//" on its own line, and also followed by a "//" on its own line. The text file will always look exactly like this with each tasking paragraph having the same number of forward slashes.
Column 1 Data always starts on a new line under "//" and runs until the first "/" (single forward slash).
Column 2 Data always starts after the "//" on the 2nd paragraph line and runs until the end of that line.
Column 3 Data always starts on the 4th paragraph line and runs until the end of that line.
Here's a quick example of what I'm talking about:
So after it's parsed, my table would have the following new rows:
I hope this makes sense and that something is able to come out of this. It would seriously make our lives so much easier. Thanks for your help!
Code:
//
[B][COLOR=Blue]Column 1 Data[/COLOR][/B]/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//[COLOR=SeaGreen][B]Column 2 Data[/B][/COLOR]
StuffIDontCareAbout/StuffIDontCareAbout/
[COLOR=DarkOrange][B]Column 3 Data[/B][/COLOR]
//
[B][COLOR=Blue]Column 1 Data[/COLOR][/B]/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//[COLOR=SeaGreen][B]Column 2 Data[/B][/COLOR]
StuffIDontCareAbout/StuffIDontCareAbout/
[COLOR=DarkOrange][B]Column 3 Data[/B][/COLOR]
//
[B][COLOR=Blue]Column 1 Data[/COLOR][/B]/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//[COLOR=SeaGreen][B]Column 2 Data[/B][/COLOR]
StuffIDontCareAbout/StuffIDontCareAbout/
[COLOR=DarkOrange][B]Column 3 Data[/B][/COLOR]
//
[B][COLOR=Blue]Column 1 Data[/COLOR][/B]/StuffIDontCareAbout/
StuffIDontCareAbout/StuffIDontCareAbout//[COLOR=SeaGreen][B]Column 2 Data[/B][/COLOR]
StuffIDontCareAbout/StuffIDontCareAbout/
[COLOR=DarkOrange][B]Column 3 Data[/B][/COLOR]
//
...and so on and so forth anywhere between 50-100 more times
Code:
//
[B][COLOR=Blue]37 NOS001[/COLOR][/B]/DCO/
TaskPer/TBD//[COLOR=SeaGreen][B]310001ZAUG2014-292359ZSEP2014[/B][/COLOR]
GenText/Remarks/
[COLOR=Orange][B](U/FOUO) This will contain the actual task description and details. You can see that "(U/FOUO)" contains a forward slash.[/B][/COLOR]
//
[B][COLOR=Blue]582 NOS012[/COLOR][/B]/DoDIN/
TaskPer/27//[COLOR=SeaGreen][B]280001ZAUG2014-022359ZSEP2014[/B][/COLOR]
GenText/Remarks/
[COLOR=Orange][B](U/FOUO) This another task description and details. You can see that "(U/FOUO)" contains a forward slash.[/B][/COLOR]
//
Code:
[COLOR=Blue][B]Column 1[/B][/COLOR] | [COLOR=SeaGreen][B]Column 2[/B][/COLOR] | [COLOR=Orange][B]Column 3[/B][/COLOR]
[COLOR=Blue][B]37 NOS001[/B][/COLOR] | [COLOR=SeaGreen][B]310001ZAUG2014-292[/B][/COLOR] | [COLOR=Orange][B](U/FOUO) This will contain[/B][/COLOR]
[COLOR=Blue][B]582 NOS012[/B][/COLOR] | [COLOR=SeaGreen][B]280001ZAUG2014-022[/B][/COLOR] | [COLOR=Orange][B](U/FOUO) This another task[/B][/COLOR]
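To make the rules above concrete, here is a rough sketch of the parsing logic in Python. It is only an illustration under some assumptions, not a finished solution: it assumes the text export contains no color/bold tags (those are only here to highlight the columns), that a line consisting solely of "//" separates paragraphs, and that any lines after the 4th paragraph line are wrapped continuations of Column 3. The names `parse_taskings`, `parse_block`, and `SAMPLE` are made up for this example.

```python
SAMPLE = """//
37 NOS001/DCO/
TaskPer/TBD//310001ZAUG2014-292359ZSEP2014
GenText/Remarks/
(U/FOUO) First task description.
//
582 NOS012/DoDIN/
TaskPer/27//280001ZAUG2014-022359ZSEP2014
GenText/Remarks/
(U/FOUO) Second task description.
//
"""


def parse_block(block):
    """Turn the lines of one tasking paragraph into (col1, col2, col3)."""
    if len(block) < 4:
        raise ValueError(f"expected at least 4 lines, got {len(block)}: {block!r}")

    # Column 1: start of the 1st line up to the first single "/".
    col1 = block[0].split("/", 1)[0].strip()

    # Column 2: everything after the first "//" on the 2nd line.
    _, sep, after = block[1].partition("//")
    if not sep:
        raise ValueError(f"no '//' found on 2nd line: {block[1]!r}")
    col2 = after.strip()

    # Column 3: the 4th line. Assumption: extra lines are wrapped text,
    # so they are joined with spaces rather than dropped.
    col3 = " ".join(block[3:])

    return col1, col2, col3


def parse_taskings(text):
    """Split the export on standalone '//' lines and parse each paragraph."""
    rows = []
    block = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "//":
            # A standalone "//" closes the paragraph collected so far.
            if block:
                rows.append(parse_block(block))
                block = []
        elif line:
            block.append(line)
    if block:
        # Tolerate a file that is missing its final "//".
        rows.append(parse_block(block))
    return rows


if __name__ == "__main__":
    for row in parse_taskings(SAMPLE):
        print(" | ".join(row))
```

Because "(U/FOUO)" only contains a single slash and never sits alone on a line, it does not confuse the paragraph split. Each returned tuple maps onto one table row, so the result could be fed to whatever insert routine the target database uses.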