Splitting PDF files

JohnPapa

The requirement is as follows: a number of documents (say 10) belonging to different companies are scanned into one PDF file. The user is willing to place a cover page on top of each document, stating the company and the folder in which the document is to be stored. The PDF must then be split into 10 files and, based on the information on each cover page, each document must be matched to its company and placed in the correct folder.

Initial investigation suggests the following are needed:
1) A reference to "Adobe Acrobat 10.0 Type Library"
2) Acrobat Pro or Acrobat Standard (not just the Reader)
3) An Acrobat API that works with VBA

If you have had some experience with this problem, your help would be appreciated.
 
Does the PDF always split into single pages, or can a document span a number of pages?
Can you not do it manually?
 
The separate PDF files could be any number of pages. The start of each document is marked by the cover page that the user inserts, which defines the company and the folder where the file will be stored.

I can do it manually with Adobe software, but I am trying to automate a repetitive process where the documents of, say, 10 companies are all scanned at the same time.
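As a minimal sketch of the splitting logic described here (not tied to Acrobat or to any PDF library): assuming some routine can already tell whether a given page is a cover page, the pages can be grouped into documents as shown. The function name `group_pages` and the `is_cover` callback are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def group_pages(
    pages: Sequence[str], is_cover: Callable[[str], bool]
) -> List[Tuple[int, List[int]]]:
    """Group page indices into documents, each started by a cover page.

    Returns a list of (cover_page_index, [content_page_indices]).
    Pages that appear before the first cover page are ignored, since
    there is no cover page to say where they belong.
    """
    documents: List[Tuple[int, List[int]]] = []
    for index, page in enumerate(pages):
        if is_cover(page):
            # A cover page opens a new document with no content pages yet.
            documents.append((index, []))
        elif documents:
            # Any other page belongs to the most recently opened document.
            documents[-1][1].append(index)
    return documents


# Hypothetical input: each string stands for the recognised text of one page.
sample_pages = ["COVER A", "bill p1", "bill p2", "COVER B", "bill p1"]
groups = group_pages(sample_pages, lambda text: text.startswith("COVER"))
print(groups)
```

Each resulting group could then be written out as its own PDF by whatever tool does the actual page extraction.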
 
If it helps to get the ball rolling: is it possible to read a single PDF file whose cover page defines the customer and the folder to store it in, with the remaining pages being the document itself?
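One way to picture this: if the text of the cover page can be obtained (for a scanned page that would need OCR, which is not covered here), the company and folder could be picked out with a simple parser. The `Company:` and `Folder:` labels are an assumed cover-page layout, not something specified in this thread, and `parse_cover_page` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumed layout: one "Label: value" pair per line, e.g. "Company: Acme Ltd".
_FIELD = re.compile(r"^\s*(company|folder)\s*:\s*(.+?)\s*$", re.IGNORECASE)


def parse_cover_page(text: str) -> Optional[Dict[str, str]]:
    """Return {'company': ..., 'folder': ...}, or None if either is missing."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    if "company" in fields and "folder" in fields:
        return fields
    return None


# Made-up cover page text for illustration.
cover = "Company: Building 17\nFolder: Electricity/2023-11\n"
info = parse_cover_page(cover)
print(info)
```

Because the function returns `None` for a page without both fields, the same check could double as the test for whether a page is a cover page at all.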
 
Put everything together into one document and then just split it up again and file it separately?
It sounds like: Why make it easy when you can have a complicated process?
With this tool you can edit PDF documents outside of ADOBE.
Regarding "get scanned": you could also make the scanning process more detailed and intelligent by creating individual documents with meaningful, descriptive names. Based on the document names, you could then file the documents where they belong, combine them into complete documents, or do other things.
Putting things together is usually easier than breaking them apart.
 
Thanks for your reply. Please let me explain a bit further by giving a real scenario. A property management company I do work for manages 200 buildings. Every month an electricity bill for each of the 200 buildings needs to be entered. Yes, they can enter them one at a time, specifying where to store each scanned document, or they can scan all 200 in one go with a cover page for each, and each document will be stored in the correct folder. Believe me, it is much faster and the user does not have to think a lot.

Instead of the cover page with the company and folder info some use a QR code.

Will look into the tool you mention.

Thanks

PS It appears that an automatic solution needs to interface with an API from Adobe.
 
I've had some experience reading values from a fillable PDF form, but your requirement isn't about that. In fact, I am not even sure if the scanned document you have is readable using APIs. You might want to post a sample file for evaluation.

You also mentioned using QR codes. I don't know much about them. I do know they can be created by code, but I'm not sure whether they can be read by code. You usually use a camera to read them.
 
Scanning is done from paper documents. So does that mean that the invoices are printed on paper and then the described procedure should follow?

I would rather consider taking this company's database and using it to automatically generate the required electronic and paper documents.
 
The Building Management company collects all the bills for each of the Buildings it manages. It is these documents that we need scanned, preferably not one at a time.
 
Can you arrange for the bills to be emailed? If so, chances are each individual PDF can be converted to text and the values written into the system.

If the PDF has been scanned, this will not be possible.
 
Yes, emailing may be a practical solution, which I suggested to the client. If the scan can be avoided, it would be better.
 
To be clear - I meant emailed by the supplier
 
Yes. Assuming you receive the email with a PDF attachment, move the PDF to a 'to be processed' folder (this can be automated). Then simply run a routine to import to accounts. It takes about a second for each document.

You do need to set things up - I use a little-known .exe to convert the PDF to text, and you need to set up the parameters to extract the required information.

My setup form looks like this:
[Image: 1700220891527.png]

and to manage the imports, my import form looks like this:
[Image: 1700221062044.png]

Import progress is listed in the middle box with the import status or error message, and any imports that fail for some reason (figures don't add up, supplier or invoice number not found, etc.) can be swiped left to review. Documents successfully imported are listed at the bottom.

Occasionally suppliers have different formats depending on what is being invoiced or change their design so you may need different setups or need to modify an existing one. I've also noticed some suppliers effectively use 'scan to pdf' to create their pdf - unfortunately these can't be decoded. You would need some sort of character recognition software, but I have not investigated that.

It may be that you can use Adobe to extract the data; I've not tried that. There are also plenty of online sites for extracting text from PDFs.

The pdf to text exe I use can be found here

Note I use 32-bit Access. I've not tried it on 64-bit but don't see why it shouldn't work.
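To illustrate the kind of per-supplier setup and checks described in this post, here is a small Python sketch. It is not the poster's actual implementation (which is an Access application using an external PDF-to-text program); the supplier name, the regular expressions and the sample text are all made up, and the text is assumed to have been extracted from the PDF already.

```python
import re
from decimal import Decimal
from typing import Dict

# Hypothetical setup for one supplier: one pattern per field to extract.
SETUPS: Dict[str, Dict[str, str]] = {
    "Example Power Co": {
        "invoice_no": r"Invoice No[:.]?\s*(\S+)",
        "net": r"Net\s+([\d.]+)",
        "vat": r"VAT\s+([\d.]+)",
        "total": r"Total\s+([\d.]+)",
    }
}


def extract_fields(supplier: str, text: str) -> Dict[str, str]:
    """Apply the supplier's patterns; raise if the setup or a field is missing."""
    if supplier not in SETUPS:
        raise ValueError(f"supplier not found: {supplier}")
    fields: Dict[str, str] = {}
    for name, pattern in SETUPS[supplier].items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"{name} not found")
        fields[name] = match.group(1)
    return fields


def figures_add_up(fields: Dict[str, str]) -> bool:
    """Check that net + VAT equals the invoice total (exact decimal maths)."""
    return Decimal(fields["net"]) + Decimal(fields["vat"]) == Decimal(fields["total"])


# Made-up text standing in for the output of a PDF-to-text conversion.
sample_text = "Invoice No: A123\nNet 100.00\nVAT 19.00\nTotal 119.00\n"
invoice = extract_fields("Example Power Co", sample_text)
print(invoice, figures_add_up(invoice))
```

A supplier that changes its invoice layout would then only need a new or amended entry in `SETUPS`, and an import that fails either check can be set aside for manual review.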
 
Seems like the way to go. Thanks. I will try it out. Much better than scanning paper. Nowadays the majority of bills are sent electronically.
I still use 32 bit, so no problem.
 
Good luck. One part of the process I forgot to mention - PDFs that import successfully are automatically moved to a 'processed' folder. In one case this contains subfolders by supplier so they are easy to find again. On another system I concatenate the supplier name to the PDF name and store the files in monthly subfolders, and on yet another I also include the PK of the newly created record. It just depends on the client requirements.
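A rough Python sketch of one of the filing schemes mentioned here (supplier name added to the file name, stored in monthly subfolders). The function name, the folder names and the `YYYY-MM` naming are assumptions for illustration, not the poster's actual code.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def file_processed_pdf(pdf: Path, processed_root: Path, supplier: str, when: date) -> Path:
    """Move an imported PDF to <processed_root>/<YYYY-MM>/<supplier>_<original name>.

    The supplier name is used as-is, so it is assumed to contain only
    characters that are valid in a file name.
    """
    month_folder = processed_root / when.strftime("%Y-%m")
    month_folder.mkdir(parents=True, exist_ok=True)
    target = month_folder / f"{supplier}_{pdf.name}"
    shutil.move(str(pdf), str(target))
    return target


# Demonstration in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "to be processed"
    inbox.mkdir()
    bill = inbox / "bill.pdf"
    bill.write_bytes(b"%PDF-1.4 placeholder")
    moved = file_processed_pdf(
        bill, Path(tmp) / "processed", "ExamplePower", date(2023, 11, 17)
    )
    print(moved.relative_to(tmp))
```

Including the primary key of the newly created record, as mentioned for the third system, would only mean adding one more component to the target file name.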
 