Loop through recordset to download PDF from URL in field then save as image in folder (1 Viewer)

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
I have a table that has a text field for a URL. The URL is a pdf and I would like to save this pdf as an image (jpg or gif, doesn't matter). I have about 180 records in the table so I need to set up a loop that will go through each record, open the url, convert to image and save the image to a folder. I could do this manually, but the pdfs are updated periodically, so I want to be able to just click a button and go through this loop before I run a report to ensure I have the most up-to-date image.

I have found code to loop through a recordset, code to open a pdf and save to a different type, and code to follow a hyperlink, but I can't seem to put this all together myself. Any help would be greatly appreciated.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
I have found code to loop through a recordset, code to open a pdf and save to a different type, and code to follow a hyperlink, but I can't seem to put this all together myself. Any help would be greatly appreciated.

Could you post the code you found that converts a PDF to an image?
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
here is the link to the code:


'http://www.myengineeringworld.net/2013/03/vba-macro-to-convert-pdf-files-into.html

It is assuming a local path to open the pdf and then convert it to various file types.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
The code for converting PDF files requires that you have Adobe Professional? Do you have that? I believe it costs about $15 per month. Ouch! The other thing is that the code uses the Excel WorksheetFunction.Substitute function. You could replace that with the Access Replace function but you will probably want to modify the code to pass in the new path anyway.

The problem I'm having in coming up with a solution for you is the URLs? Are these URLs like "https://www.irs.gov/pub/irs-pdf/f1040.pdf"? If that the case then a step in this process will be to download those PDFs. I don't know how to do that. If these documents are on your PC then maybe you can easily convert the URL to a path.

Where are these PDF documents located?
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
Yes, I do have Adobe acrobat, not just reader.

The URLs are webpages, unfortunately, not a path to a local drive.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
I might have found something here. I'll let you know if I can get it working.
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
Thanks! Let me know if you do because it's beyond my knowledge base (which is pretty basic).
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
The attached database has a table with some IRS Form Descriptions and URLs for their PDFs. The form has a button Convert PDFs. When you click this button the PDFs should be downloaded to your application directory and then converted to JPEGs. You should end up with six new files when you run this. You may want to run this application in a folder to make it easier to clean up. The files are given the names in the description field in the table.

As I don't have Adobe Professional I can't test the conversion to JPEG on my system. Please test this on yours and let me know if it works. If we can get that much working we can make improvements on it like adding the error checking, location for the output, etc. You will need to decide how you want these files named. Is there something in your table that could be used for a name? Do you want to parse out some part of the URL for the name. If so what part?

The code to convert to JPEGs was adapted from the link you provided. Please note the instruction in this about adding the reference for Adobe. Those were:

In order to use the macro you must enable the Acrobat library from VBA editor:
'Go to Tools -> References -> Adobe Acrobat xx.0 Type Library, where xx depends
'on your Acrobat Professional version (i.e. 9.0 or 10.0) you have installed to your PC.

'Alternatively you can find it Tools -> References -> Browse and check for the path
'C:\Program Files\Adobe\Acrobat xx.0\Acrobat\acrobat.tlb
'where xx is your Acrobat version (i.e. 9.0 or 10.0 etc.).
 

Attachments

  • LoopThruPDFS.accdb
    416 KB · Views: 151
Last edited:

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
Almost there....

So I changed the file path and was able to download all three of the pdfs. I also added some code to the module to convert PDF Code so that it could find the downloaded file. Then the process converted the first document. I tried to add the Do While not rs.EOF , rs.MoveNext, Loop to the Convert PDF Code, but it isn't working quite right. It kept looping on the first PDF and converting it. I have reattached the DB with the changes I made and the jpgs it created. The pdf's were over the limit to attach.
 

Attachments

  • LoopThruPDFS.accdb
    444 KB · Views: 103
  • US Individual Income Tax Return_Page_1.jpg
    US Individual Income Tax Return_Page_1.jpg
    100.9 KB · Views: 114
  • US Individual Income Tax Return_Page_2.jpg
    US Individual Income Tax Return_Page_2.jpg
    101.8 KB · Views: 119

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
Are you saying that when you ran this before your modifications the SavePDFAsJPEG wasn't seeing the downloaded PDF and nothing was being convert to JPEGs? In any case I don't think there should be any need to have two loops to do this.
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
So, when I first ran it "out of the box" I received an error from Adobe Acrobat, "There was an error opening this document. This file cannot be found." And an error from Microsoft Visual Basic "Run-time error '91': Object variable or With block variable not set.

I made my changes in step-wise fashion. I first changed the strFolder definition from Application.Currentproject.path to a folder on my network drive. I ran the process again which resulted in all three of the pdfs being downloaded and saved in that folder, but the conversion did not take place. This was because the folder was not identified in the conversion code. So I tried identifying the PDFPath (from the Convert PDF Code) as the folder with the strFileName, which I had to identify also, which led me to having to identify where that name was coming from, which is the table in the database. Once this was all done, I tried running it again. That was when the three pdfs were downloaded and only the first pdf was converted to the two pages. So I thought it must need to loop through again to convert all the pdfs. That was when I tried to put the loop in the Convert PDF Code. But that isn't working as it just continues to convert the first file.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
Ok with that info I should be able to fix it but probably not until tomorrow.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
I see that I wasn't passing the full downloaded file path to the SavePDFAsJPEG. I've fixed that in the attached version. Please test this one and let me know what it does.
 

Attachments

  • LoopThruPDFS_V2.accdb
    412 KB · Views: 205

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
I didn't bring my computer home from work so I'll have to wait until the morning to give it a try. Thank you so much!!!
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
Wow!! that works perfectly!!! Thank you!!!! Now I just have to figure out how to move it to my database. :)
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
Wow!! that works perfectly!!! Thank you!!!! Now I just have to figure out how to move it to my database. :)
Let me know if you need help. If you are going to delete the downloaded PDFs after they are converted the kill function is probably the easiest way to do that.
 
Last edited:

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
I have a few glitches to fix...maybe you could help. How do I get past a record that might not have a url or the url is invalid. The table I am using consists of all of our properties, whether we still own them or not. The url that I am accessing is a current site plan that we have on our website. If we don't own the property, there either isn't an existing webpage (it was taken down after I populating the table) or its older than when I created the table and the field is just null.
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
What do you want to do for these conditions? Just skip them or display a message box or put a jpeg in the folder with the name you've chosen with an error message? How are you going to name these files?
 

ksimpson61

Registered User.
Local time
Today, 16:10
Joined
Sep 8, 2016
Messages
36
If the field is null, I just want to skip it. If the there is an error in the URL, then maybe a log or message at the end showing those records using a different field in the table as the identifier, i.e. Property. Does that make sense?
 

sneuberg

AWF VIP
Local time
Today, 14:10
Joined
Oct 17, 2014
Messages
3,506
If the field is null, I just want to skip it. If the there is an error in the URL, then maybe a log or message at the end showing those records using a different field in the table as the identifier, i.e. Property. Does that make sense?
I don't understand exactly what you want to do about the errors. In the attached database I put in a example of what you could do. I added a field to the table named URLError which I set if the DownloadURL function returns false (see code below). If it returns true the file is converted. I added a bogus URL to the table to test this and found that it takes a long time (maybe a minute) to return if the URL is wrong. I'll look for a fix for this but you will probably need to decide on a timeout value. As far as URLs that are Null the code now opens a query (Query1) which filters out the URLs that are Null.


An rs.Close was also added to the code.


Code:
Private Sub ConvertPDF_Click()
Dim strFolder As String
Dim strFileName As String
Dim rs As DAO.Recordset
strFolder = Application.CurrentProject.Path

Set rs = CurrentDb.OpenRecordset("[COLOR="blue"]Query1[/COLOR]")
Do While Not rs.EOF
    strFileName = RemoveIllegalFileCharacters(rs!Description) & ".PDF"
    DownloadURL rs!URL, strFolder & "\" & strFileName
    If [COLOR="teal"]DownloadUR[/COLOR]L(rs!URL, strFolder & "\" & strFileName) Then
        SavePDFAsJPEG strFolder & "\" & strFileName
    Else
   [COLOR="purple"]     rs.Edit
        rs!URLError = True
        rs.Update[/COLOR]
    End If
    rs.MoveNext
Loop
[COLOR="darkgreen"]rs.Close[/COLOR]


End Sub
 

Attachments

  • LoopThruPDFS_V3.accdb
    468 KB · Views: 109

Users who are viewing this thread

Top Bottom