Convert PDF to Doc

Freshman

Registered User.
Local time
Today, 19:12
Joined
May 21, 2010
Messages
437
Hi all,

I'm using the code below to convert all PDF files in a folder to TXT (in bulk).
The method below uses MS Word to do the job, but since I have hundreds of files to convert, I was wondering if there might be a faster way.
I don't want to use one of the many online/website options, since I need to have it in code in my application; it will be an ongoing, automated project.
I had a look at JavaScript, but there you also have to upload the PDF files before converting them.
I need something for local files, for which the routine below is perfect, just slow.

Thanks a lot

Code:
Public Sub convertPDFtoText()
    ' Requires a reference to the Microsoft Word Object Library (early binding)
    Const filePath As String = "C:\PDFTest\"
    Dim file As String, FileName As String
    Dim myWord As Word.Application, myDoc As Word.Document
    Set myWord = New Word.Application
    myWord.DisplayAlerts = wdAlertsNone
    file = Dir(filePath & "*.pdf")
    Do While file <> ""
        ' Swap only the extension; Replace(file, "pdf", "txt") would also change "pdf" elsewhere in the name
        FileName = Left$(file, InStrRev(file, ".") - 1) & ".txt"
        Set myDoc = myWord.Documents.Open(FileName:=filePath & file, ConfirmConversions:=False, Format:="PDF Files")
        myDoc.SaveAs2 filePath & FileName, FileFormat:=wdFormatText, Encoding:=1252, LineEnding:=wdCRLF
        myDoc.Close False
        file = Dir
    Loop
    myWord.Quit   ' close the hidden Word instance so it does not linger in memory
    Set myDoc = Nothing
    Set myWord = Nothing
    MsgBox "Done", vbInformation, "Notice"
End Sub
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
I used a free tool called xpdfreader. I can give you a copy if you can't download it.
 

Access or E

New member
Local time
Today, 12:12
Joined
Aug 29, 2022
Messages
12
How would you control xpdfreader from within Access with VBA to convert PDF to TXT?
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
I have problems posting code; I get a warning about spam.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 11:12
Joined
Feb 28, 2001
Messages
27,162
If you have links in the code, that warning is normal. But it gets better after you reach a certain number of posts. I think the rationale is that if you haven't been kicked out past that threshold, we let you do a few more things. The moderators don't control that, by the way. The site owner and the site administrator set that rule to be system-wide.
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
Code:
Sub PdfToText(sPdfName As String)
        'Convert PDF to text using the xpdf tools; pdftotext has a number of layout options
        Dim sOption As String
        sOption = "-simple" '"-lineprinter" '"-layout"
        'Quote the input path so names containing spaces are passed as one argument
        Shell "E:\Documenten\XpdfReader\xpdf-tools4.02\bin32\pdftotext.exe " & sOption & " """ & sPdfName & """ tempfile.txt"
End Sub
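
For the earlier question about driving the tool from VBA for a whole folder, here is a minimal sketch rather than a tested implementation. It assumes the xpdf command-line tools sit at the path used above, that the PDFs live in the folder from the first post, and that late-bound `WScript.Shell` is available. The procedure name `ConvertFolderWithPdfToText` is hypothetical. Unlike `Shell`, `WScript.Shell.Run` with its third argument set to True waits for each conversion to finish before starting the next one.

```vba
Public Sub ConvertFolderWithPdfToText()
    ' Assumptions: both paths are taken from the posts above; adjust them to your setup.
    Const folderPath As String = "C:\PDFTest\"
    Const toolPath As String = "E:\Documenten\XpdfReader\xpdf-tools4.02\bin32\pdftotext.exe"
    Dim wsh As Object
    Dim pdfName As String, txtName As String, cmd As String
    Dim exitCode As Long

    Set wsh = CreateObject("WScript.Shell")
    pdfName = Dir(folderPath & "*.pdf")
    Do While pdfName <> ""
        ' Swap only the extension to build the output name
        txtName = Left$(pdfName, InStrRev(pdfName, ".") - 1) & ".txt"
        ' Quote every path so spaces do not split the arguments
        cmd = """" & toolPath & """ -simple """ & folderPath & pdfName & _
              """ """ & folderPath & txtName & """"
        ' 0 = hidden window, True = wait until pdftotext exits
        exitCode = wsh.Run(cmd, 0, True)
        If exitCode <> 0 Then
            Debug.Print "pdftotext failed for " & pdfName & " (exit code " & exitCode & ")"
        End If
        pdfName = Dir
    Loop
    Set wsh = Nothing
End Sub
```

Each PDF gets its own .txt file next to it instead of a shared tempfile.txt, so a later conversion does not overwrite an earlier one.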
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
Thanks, DocMan, or Superman! There was a (commented-out) URL in the code. Really happy with your quick reply!
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
I have a similar problem with Stack Overflow: I can't post anything any more. Do you have any tips?
 

MsAccessNL

Member
Local time
Today, 18:12
Joined
Aug 27, 2022
Messages
184
There are also versions of Excel where you can read PDFs with Power Query...
 

MarkK

bit cruncher
Local time
Today, 09:12
Joined
Mar 17, 2004
Messages
8,180
If you have multiple long-running operations like this, you can write a script, which runs asynchronously, and call it repeatedly with different parameters. Each call executes in its own process, so you can start a new script running before the previous one terminates. In your case, you could do this in the loop, like...
Code:
Do While file <> ""
    Shell "wscript C:\PathTo\YourScript.vbs " & filePath & " " & file
    file = Dir   ' advance to the next file; without this the loop never ends
Loop
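
The same idea of overlapping conversions can also be sketched directly in VBA, without a separate script file. This is an untested sketch under several assumptions: it calls the xpdf `pdftotext.exe` mentioned earlier in the thread instead of a .vbs script, it uses late-bound `WScript.Shell.Exec` so each running process can be polled through its `Status` property (0 while running), and the names `ConvertFolderInParallel` and `maxParallel` are hypothetical. Capping the number of simultaneous processes keeps a folder of hundreds of files from starting hundreds of conversions at once.

```vba
Public Sub ConvertFolderInParallel()
    ' Assumptions: paths come from earlier posts in this thread; adjust as needed.
    Const folderPath As String = "C:\PDFTest\"
    Const toolPath As String = "E:\Documenten\XpdfReader\xpdf-tools4.02\bin32\pdftotext.exe"
    Const maxParallel As Long = 4          ' hypothetical cap on simultaneous conversions
    Dim wsh As Object
    Dim running As New Collection          ' WshScriptExec objects still in progress
    Dim pdfName As String, txtName As String, cmd As String
    Dim i As Long

    Set wsh = CreateObject("WScript.Shell")
    pdfName = Dir(folderPath & "*.pdf")
    Do While pdfName <> "" Or running.Count > 0
        ' Forget processes that have finished (Status is 0 only while running)
        For i = running.Count To 1 Step -1
            If running(i).Status <> 0 Then running.Remove i
        Next i
        If pdfName <> "" And running.Count < maxParallel Then
            txtName = Left$(pdfName, InStrRev(pdfName, ".") - 1) & ".txt"
            cmd = """" & toolPath & """ -simple """ & folderPath & pdfName & _
                  """ """ & folderPath & txtName & """"
            running.Add wsh.Exec(cmd)      ' returns immediately; process runs on its own
            pdfName = Dir
        Else
            DoEvents                       ' yield while waiting for a free slot
        End If
    Loop
    Set wsh = Nothing
End Sub
```

Two trade-offs to be aware of: `Exec` may briefly show a console window for each process, and the polling loop keeps VBA busy while it waits. Whether running conversions in parallel actually beats converting one file at a time depends on the machine and the files.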
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 11:12
Joined
Feb 28, 2001
Messages
27,162
I’m looking to start a project relating to the classification of legal documents. What are good tools to convert .doc and .pdf files to text?

Since legal documents sometimes have very particular spacing, ordering, and numbering, does anybody have tips for what I should or should not keep in my raw text data?

A couple of comments are in order.

1. Tacking a question on the tail end of another thread is less productive than starting a new thread. More people will see - and have a chance to respond to - a new thread. Not that you've done anything illegal. Just less efficient.

2. Word converts .DOC files to other formats including ANSI Text. However, I think you need an advanced copy of the Adobe software to convert a normal .PDF to something else. The "something-else-to-PDF" is built in to Word, Excel, and Access. (Never tried the other Office items but wouldn't bet against it being there, too.) The trick is that the Office built-in convertor code is one-way.

3. The tips for what to keep will be defined by what you want to do with the information.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 12:12
Joined
Feb 19, 2002
Messages
43,257
What does having to convert from .pdf to .DocX have to do with anything? Are you also going to have to try to classify the documents using VBA?

Converting a .pdf to .txt will cause it to lose ALL formatting and any embedded images. Everything should be converted to .DocX unless you are not using Word as your standard.

I use Able2Convert. It has a batch feature. I use it mostly to convert bank statements to Excel, but I have used it on occasion to convert a .pdf to Word. All I can say is garbage in, garbage out. My last conversion was a fiasco. I tried NINE conversion programs: the one I own plus 8 web-based samples that were "free". The file was a phone list, and it was so poorly formatted that no program could create a computer-usable spreadsheet or doc file from it. Even though it looked like the columns lined up, the conversion tools didn't see it that way. So on some rows fields 1 and 2 were merged but separate on others, or 6, 7, and 8 were merged. It was completely unusable :(

So, the answer is, only well-formatted documents ever convert cleanly.
 
