Loop through recordset to download PDF from URL in field then save as image in folder

Of course! A query. Duh. Okay, so I got that to work, but if the URL is wrong, I get an error that there isn't a PDF to open because there wasn't one created, not that there was a URL error.

I have attached a document with screenshots of the error...

I am correcting the URLs in the meantime and restarting the process to see if I can get everything corrected. As long as the URLs don't change in the future, it shouldn't be a problem.
 


Oops, I see I forgot to take out the first DownloadURL call. Just delete the line in red (the standalone DownloadURL call right before the If).

Code:
Private Sub ConvertPDF_Click()
    Dim strFolder As String
    Dim strFileName As String
    Dim rs As DAO.Recordset

    ' Save the files in the same folder as the database.
    strFolder = Application.CurrentProject.Path

    Set rs = CurrentDb.OpenRecordset("Query1")
    Do While Not rs.EOF
        strFileName = RemoveIllegalFileCharacters(rs!Description) & ".PDF"
        DownloadURL rs!URL, strFolder & "\" & strFileName ' <-- the line in red: delete this duplicate call
        If DownloadURL(rs!URL, strFolder & "\" & strFileName) Then
            SavePDFAsJPEG strFolder & "\" & strFileName
        Else
            ' Download failed: flag the record so the URL can be corrected.
            rs.Edit
            rs!URLError = True
            rs.Update
        End If
        rs.MoveNext
    Loop
    rs.Close
    Set rs = Nothing
End Sub

You could put an extra check in to see if the file was actually downloaded, something like

If Dir(strFolder & "\" & strFileName) <> vbNullString Then
    'The file is there so do the conversion
    SavePDFAsJPEG strFolder & "\" & strFileName
End If


Not having Adobe is making this a longer process than it should be. Reminds me of the early eighties when in college we had to submit our computer programs on punch cards and then wait for a print out with the errors. :)
 
I am running through it again with the extra line out.

I remember taking a computer class in high school in the mid-to-late '70s. We used Fortran, writing simple mathematical equations on punch cards and waiting for the results. Funny how that is one of the classes that sticks in my memory more than any other.
 
So, I was able to insert code into the SavePDFAsJPEG Sub that checks the size of the PDF file. The file was being created even when the web page was wrong; it just couldn't be opened.

'Check the size of the input file. Skip files under 1000 bytes; they are failed downloads, not real PDFs.
If FileLen(PDFPath) > 0 And FileLen(PDFPath) < 1000 Then
    Exit Sub
End If

This skips over the file and goes on to the next one. Then I was able to go into the folder, sort by size, and find all of the PDFs that didn't work. I changed the paths and re-ran it. It worked perfectly this time. Yeah!!!

Thank you so much for your help. I couldn't have done it without it.
 
Don't we still have a problem with the time DownloadURL takes before it determines a URL is bogus, or did you find a fix for that? I haven't gotten around to even looking for a solution yet.
 
I don't know why, but when a URL was wrong, it wasn't the whole URL that was wrong, just a couple of reference numbers in it. For example, the following is an actual URL: http://rpai.propertycapsule.com/property/output/document/view/id:4378/?time=1474647894/
When the process didn't work, there was an error in the id and time values. The code still created a file with a .pdf extension, but the file wasn't recognized as a PDF when the code tried to open it. So the code that you provided to check for an error in the URL didn't apply.

I don't know if I am explaining this very well. The process was getting past the URL check, the check to see if it's a pdf file (it had the right extension), but it would stop at this step:

'Set the JS Object - Java Script Object.
Set objJSO = objAcroPDDoc.GetJSObject
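
For the case described above, where a file with a .pdf extension gets created but isn't really a PDF, one possible check is to look at the first few bytes of the file, since real PDF files begin with the signature "%PDF". The following is only a rough sketch and has not been run against the database in this thread; the function name LooksLikePDF is made up for illustration.

Code:
' Hypothetical helper (not part of the code posted in this thread).
' Returns True if the file starts with the "%PDF" signature.
Public Function LooksLikePDF(ByVal strPath As String) As Boolean
    Dim intFile As Integer
    Dim strHeader As String

    On Error GoTo Failed

    ' A file shorter than the signature cannot be a PDF.
    If FileLen(strPath) < 4 Then Exit Function

    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile
    strHeader = Space$(4)           ' Get reads as many bytes as the string is long
    Get #intFile, 1, strHeader
    Close #intFile

    ' Binary compare so the result does not depend on Option Compare.
    LooksLikePDF = (StrComp(strHeader, "%PDF", vbBinaryCompare) = 0)
    Exit Function

Failed:
    ' Missing file, locked file, etc. are all treated as "not a PDF".
    On Error Resume Next
    Close #intFile
    LooksLikePDF = False
End Function

It could be called near the top of SavePDFAsJPEG, for example: If Not LooksLikePDF(PDFPath) Then Exit Sub. Unlike a size check, this would also catch a large error page that was saved with a .pdf extension.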
 
I think we are talking about two different things. As I don't have Adobe I don't know anything about the error you are getting with http://rpai.propertycapsule.com/property/output/document/view/id:4378/?time=1474647894/ because as you said it downloads ok.

What I'm talking about is a URL like https://www.irs.gobierno/pub/irs-pdf/f1040.pdf, which is bogus because I changed gov to gobierno to test the code with bogus URLs. It is the 5th record in the database I uploaded, and when it hits the DownloadURL function it takes a long pause (22 seconds) to think about it. I get a busy hourglass and a message saying Not Responding. Aren't you getting this?

I just timed it, and 22 seconds is also how long it takes my browser to report that the URL can't be found.
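
One possible way to limit that wait is to download with an HTTP object that accepts explicit timeouts. The sketch below uses MSXML2.ServerXMLHTTP.6.0 and its setTimeouts method, then saves the response with ADODB.Stream. It is an assumption-laden alternative, not the DownloadURL function used in this thread: the name DownloadWithTimeout and the 5 second default are made up, and it has not been tested against the bogus URL above, so whether it actually shortens the 22 seconds would need to be checked.

Code:
' Hypothetical alternative to DownloadURL (which is not shown in this thread).
' Returns True only if the server answered with HTTP status 200 and the file was saved.
Public Function DownloadWithTimeout(ByVal strURL As String, _
                                    ByVal strPath As String, _
                                    Optional ByVal lngTimeoutMs As Long = 5000) As Boolean
    Dim objHTTP As Object
    Dim objStream As Object

    On Error GoTo Failed

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    ' Arguments are resolve, connect, send and receive timeouts in milliseconds.
    ' Name lookup and connection get the short limit; the transfer itself gets 30 seconds.
    objHTTP.setTimeouts lngTimeoutMs, lngTimeoutMs, 30000, 30000
    objHTTP.Open "GET", strURL, False
    objHTTP.send

    If objHTTP.Status = 200 Then
        Set objStream = CreateObject("ADODB.Stream")
        objStream.Type = 1                  ' 1 = adTypeBinary
        objStream.Open
        objStream.Write objHTTP.responseBody
        objStream.SaveToFile strPath, 2     ' 2 = adSaveCreateOverWrite
        objStream.Close
        DownloadWithTimeout = True
    End If
    Exit Function

Failed:
    ' Bad host name, timeout, or a failure writing the file all end up here.
    DownloadWithTimeout = False
End Function

In the loop it would be used the same way as the existing call, for example: If DownloadWithTimeout(rs!URL, strFolder & "\" & strFileName) Then. Late binding with CreateObject is used so no extra references have to be set in the project.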
 
Got it. You're right. They are two different issues. Since none of my URLs were actually bogus, I didn't run into that problem, so I couldn't test it.

When I run your database it takes less than a minute to process. I just looked at the time stamps on the files, and they were all 2:04.
 
If you don't expect a lot of bogus URLs maybe this doesn't need any further attention.
 
I'll leave the code in. You never know, it could happen, but I'm not going to worry about how long it takes if it happens so seldom. Thanks again for all your help!!!
 
