OLE Query

gumbi17

Registered User.
Local time
Today, 14:16
Joined
Aug 23, 2011
Messages
43
I have been passed down an older database that I am having problems with. I am trying to query out the OLE objects that people attached to each record. They are all .pdf but when queried all I get is the tage "OLE OBJECT" instead of the pdf. What do I need to adjust to get it to export the .pdf instead?
 
Ok this is exactly what I need, but the coding for the ACCDB is that run through VBA or what other coding source?
 
No promises, but I'll see if I can convert that code. If you don't hear anything from me by Monday it means I failed.
 
I'd thought I give you an update on where I'm at. These are the rules I discovered for extracting a PDF from an OLE BLOB..

  1. Delete from beginning of the BLOB to the string %PDF
  2. Delete from the string EOF%% 0xOD 0xOA to the end of the BLOB

    Edit: This was being put in by the code I was using to write the BLOB to file
  3. Delete all occurrences of the sequences of 0x11 0x20 0x01 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00 0x00 in the BLOB

I've found two places on line with some code (not VBA) with claims that it extracts a PDF. Neither of them implement rule 3. But if you leave them in, Acrobat Reader will tell you the resulting file is damaged and won't open it.

I can't imagine why Microsoft is embedding these sequence in the PDF unless it's just to make this process difficult. Or maybe Adobe is doing this. Edit: The code was putting theses in.

Tomorrow I'll try to implement these rules in VBA which I definitely going to be a learning experience for me. I'm finding that none of the string functions I normally count on will help me manipulate the BLOB chunks.
 
Last edited:
it is showing "OLE Object" in the field because, it was written in the field through VBA code, ie. AppendChunk.

to show it you need a vba code to extract this blob.
create a form with adobe reader control. then on some event read the extracted blob to display.
 
@arnelgp Thanks for highlighting the gumbi17's main problem; that the PDFs are showing up as OLE Objects. I got so distracted by the intricacies of the VBA to export the OLE Object that I overlooked that. I'm now wondering what that really means. I've found that if I export an entire PDF OLE BLOB using GetChunk and then import that file using AppendChunk the added BLOB shows up as "Adobe Acrobat Document" as one would expect. If I import a PDF file using Append Chunk I get "Long binary data". So this category of OLE Object has me concerned. If the OLE format for PDFs has changed whatever I come up with that works in Access 2013 may not be any help to gumbi17.
 
Here's the code I've come up with to Extract PDFs from OLE Objects. It's also to be found in the attached database. Before you use any of this, please make backups.

Code:
Public Function ExtractPDF(BLOBField As DAO.Field2, FilePath As String) As String

Dim b() As Byte
Dim i As Long
Dim j As Long
Dim k As Long
Dim m As Long
Dim PDFStartMarker(4) As Byte
Dim PDFEndMarker(8) As Byte
Dim PDFEndMarkerUnix(7) As Byte
Dim PDFStartBytePosition As Long
Dim PDFEndBytePosition As Long
Dim FileNumber As Long
Dim OutputArray() As Byte
Dim FileSize As Long
ReDim b(1)

'PDFStartMarker (%PDF) defines the beginning of the PDF
PDFStartMarker(0) = 37
PDFStartMarker(1) = 80
PDFStartMarker(2) = 68
PDFStartMarker(3) = 70
'PDFENDMarker (%%EOF) defines the end of the PDF Windows
PDFEndMarker(0) = Val("&H25")
PDFEndMarker(1) = Val("&H25")
PDFEndMarker(2) = Val("&H45")
PDFEndMarker(3) = Val("&H4F")
PDFEndMarker(4) = Val("&H46")
PDFEndMarker(5) = Val("&H0D")
PDFEndMarker(6) = Val("&H0A")
PDFEndMarker(7) = Val("&H00")

'PDFENDMarker (%%EOF) defines the end of the PDF Unix
PDFEndMarkerUnix(0) = Val("&H25")
PDFEndMarkerUnix(1) = Val("&H25")
PDFEndMarkerUnix(2) = Val("&H45")
PDFEndMarkerUnix(3) = Val("&H4F")
PDFEndMarkerUnix(4) = Val("&H46")
PDFEndMarkerUnix(5) = Val("&H0A")
PDFEndMarkerUnix(6) = Val("&H00")


j = 0
k = 0
m = 0
'Scan throught the BLOB to find the beginning and end of the PDF
For i = 0 To BLOBField.FieldSize - 1
    'Finding beginning
    b = BLOBField.GetChunk(i, 1)
    If b(0) = PDFStartMarker(j) Then
        If j = 3 Then
            PDFStartBytePosition = i - j
            j = 0
        Else
            j = j + 1
        End If
    Else
        j = 0
    End If
    'Finding end UNIX
    If b(0) = PDFEndMarkerUnix(m) Then
        If m = 6 Then
            PDFEndBytePosition = i
            Exit For
            m = 0
        Else
            m = m + 1
        End If
    Else
        m = 0
    End If
    'Finding end windows
    If b(0) = PDFEndMarker(k) Then
        If k = 7 Then
            PDFEndBytePosition = i
            Exit For
            k = 0
        Else
            k = k + 1
        End If
    Else
        k = 0
    End If
Next i
If PDFStartBytePosition = 0 Then
    ExtractPDF = "Start position not found"
    Exit Function
End If
If PDFEndBytePosition = 0 Then
    ExtractPDF = "End position not found"
    Exit Function
End If
FileSize = PDFEndBytePosition - PDFStartBytePosition + 1
ReDim OutputArray(FileSize)
OutputArray = BLOBField.GetChunk(PDFStartBytePosition, FileSize)

' Remove any existing destination file.
FileNumber = FreeFile
Open FilePath For Output As FileNumber
Close FileNumber

' Open the destination file and output the PDF
Open FilePath For Binary As FileNumber
Put FileNumber, , OutputArray
Close FileNumber

End Function


Below is an example of how you would use the function. The SQL you put in this code should get the field of the OLE Object, here named DocumentPDF, and a name from the table, here named DocumentName. You just need something unique from the table to name the resulting PDF file. If nothing else if you have an autonumber, let's say ID, you could name the output PDFs "File" & rs!ID.

Code:
Sub Extract()

Dim strSQL As String
Dim FolderPath As String
Dim rs As DAO.Recordset
Dim BLOBField As DAO.Field2
FolderPath = Application.CurrentProject.Path & "\"
strSQL = "SELECT DocumentName, DocumentPDF FROM [Document]"
Set rs = CurrentDb.OpenRecordset(strSQL)
Do While Not rs.EOF
    Set BLOBField = rs!DocumentPDF
    ExtractPDF BLOBField, FolderPath & rs!DocumentName & ".PDF"
    rs.MoveNext
    Set BLOBField = Nothing
Loop

rs.Close


End Sub

In the Export PDF OLE Object module you will also find a subroutine named ExtractBLOB which you might find useful if the code to extract the PFDs doesn't work with your data. This subroutine writes the entire BLOB to the file which you could then examine with a Hex editor. If you don't have one I recommed the Hxd editor which is freeware you can get at http://download.cnet.com/HxD-Hex-Editor/3000-2352_4-10891068.html

The code is slow with large PDFs. I will work on increasing its performance, and if I can, I will post a better version later. If you have any problems with this let me know.
 

Attachments

Sorry I was out of pocket over the weekend! Thanks for the coding! The field that has the ole object is set from a form that someone just right clicks into to insert object. Would there be a better way to change it for the future without effecting everything already in or should I get all of the documents out and then change it to a different type?
 
Just my opinion but if I got them out I wouldn't put them back in. I would keep them in files with just the paths to them in the database and view them through a control. I don't know much about what kind of control would be best, but if you want I'll look into it.

Have you tried the code yet? I'll really like to know if it works or not. I'm concerned about your documents not showing Adobe Acrobat Document. If this doesn't work I'd like to continue working with you find a solution if there is one.

I'm working on a better version of this and hope to post it tomorrow.
 
Below is the revised function to export PDFs from BLOBs. This one is about ten times fast than the previous version. Also further testing revealed that my previous version wasn't finding the ends of PDFs that were created by Access. That's been corrected in this version.

I didn't mention this before, but this function returns an empty string if it didn't have any problems and returns "Start position not found" or "End position not found" if it did find them. So you might want to test for an empty string in the code that uses this.

Code:
Public Function ExtractPDF(BLOBField As DAO.Field2, FilePath As String) As String

Dim i As Long
Dim j As Long
Dim k As Long
Dim m As Long
Dim n As Long
Dim PDFStartMarker(4) As Byte
Dim PDFEndMarker(8) As Byte
Dim PDFEndMarkerUnix(7) As Byte
Dim PDFEndMarkerNEOL(6) As Byte
Dim PDFStartBytePosition As Long
Dim PDFEndBytePosition As Long
Dim FileNumber As Long
Dim OutputArray() As Byte


'PDFStartMarker (%PDF) defines the beginning of the PDF
PDFStartMarker(0) = 37
PDFStartMarker(1) = 80
PDFStartMarker(2) = 68
PDFStartMarker(3) = 70
'PDFENDMarker (%%EOF CR LF) defines the end of the PDF Windows
PDFEndMarker(0) = Val("&H25")
PDFEndMarker(1) = Val("&H25")
PDFEndMarker(2) = Val("&H45")
PDFEndMarker(3) = Val("&H4F")
PDFEndMarker(4) = Val("&H46")
PDFEndMarker(5) = Val("&H0D")
PDFEndMarker(6) = Val("&H0A")
PDFEndMarker(7) = Val("&H00")

'PDFENDMarkerUnix (%%EOF LF) defines the end of the PDF Unix
PDFEndMarkerUnix(0) = Val("&H25")
PDFEndMarkerUnix(1) = Val("&H25")
PDFEndMarkerUnix(2) = Val("&H45")
PDFEndMarkerUnix(3) = Val("&H4F")
PDFEndMarkerUnix(4) = Val("&H46")
PDFEndMarkerUnix(5) = Val("&H0A")
PDFEndMarkerUnix(6) = Val("&H00")

'PDFENDMarkerNEOL (%%EOF) defines the end of the PDF with no end of line
PDFEndMarkerNEOL(0) = Val("&H25")
PDFEndMarkerNEOL(1) = Val("&H25")
PDFEndMarkerNEOL(2) = Val("&H45")
PDFEndMarkerNEOL(3) = Val("&H4F")
PDFEndMarkerNEOL(4) = Val("&H46")
PDFEndMarkerNEOL(5) = Val("&H00")

ReDim OutputArray(BLOBField.FieldSize)
j = 0
k = 0
m = 0
n = 0

OutputArray = BLOBField.GetChunk(0, BLOBField.FieldSize)
'Scan throught the BLOB to find the beginning and end of the PDF
For i = 0 To BLOBField.FieldSize - 1
    'Finding beginning looking for %PDF
    If OutputArray(i) = PDFStartMarker(j) Then
        If j = 3 Then
            PDFStartBytePosition = i - j
            j = 0
        Else
            j = j + 1
        End If
    Else
        j = 0
    End If
    'Finding end with no end of line  %%EOF
    If OutputArray(i) = PDFEndMarkerNEOL(n) Then
        If n = 5 Then
            PDFEndBytePosition = i
            Exit For
            n = 0
        Else
            n = n + 1
        End If
    Else
        n = 0
    End If
    'Finding end UNIX  %%EOF Lf
    If OutputArray(i) = PDFEndMarkerUnix(m) Then
        If m = 6 Then
            PDFEndBytePosition = i
            Exit For
            m = 0
        Else
            m = m + 1
        End If
    Else
        m = 0
    End If
    'Finding end windows  %%EOF Cr Lf
    If OutputArray(i) = PDFEndMarker(k) Then
        If k = 7 Then
            PDFEndBytePosition = i
            Exit For
            k = 0
        Else
            k = k + 1
        End If
    Else
        k = 0
    End If
Next i

If PDFStartBytePosition = 0 Then
    ExtractPDF = "Start position not found"
    Exit Function
End If
If PDFEndBytePosition = 0 Then
    ExtractPDF = "End position not found"
    Exit Function
End If
'Move the PDF to the front of the array overwriting the OLE header
For i = 0 To PDFEndBytePosition - PDFStartBytePosition
    OutputArray(i) = OutputArray(i + PDFStartBytePosition)
Next i
'Chop off OLE end segement
ReDim Preserve OutputArray(PDFEndBytePosition - PDFStartBytePosition - 1)

' Remove any existing destination file.
FileNumber = FreeFile
Open FilePath For Output As FileNumber
Close FileNumber

' Open the destination file and output the PDF
Open FilePath For Binary As FileNumber
Put FileNumber, , OutputArray
Close FileNumber
Erase OutputArray

End Function
 

Attachments

I'm still trying to get it to work for me! I keep getting different errors each time I put this into the VBA code!
 
I could set up some code for you if you provide the following.

  1. Screen shot of the table with the PDFs in design view. Mainly I need the field names, types and the name of the table.
  2. The path of a folder where you want the output PDFs to go
  3. A description of how you want the PDF files named, e.g., some field name of combination or field name from the table.
 
Sorry my boys got sick and I had to get them back to healthy. Here is a screen grab of what the table looks like in design view. Capture.PNG

The folder location is: Y:\SouthCross\pdfs
 
Sorry about the naming if you could make it the same as lease num1
 
Is the name of that table Lease Basic? If not what is its name. Also do you want the PDF file names to be the the LEASE NUM1 values, like 00101-0001-00001A.PDF?
 
The table name is Lease Basic, and yes 00101-00001-00001A.pdf would be perfect!
 
The revised database for extracting PDFs is attached. Before you use it with your system I ask you to test it with the one PDF that is included in the Lease Basic table that I included for testing, Open the form ExportPDFtForm and output the Lease Number 00101-00001-00001A by selecting it in the Lease Number combo box and clicking on Output Selected Lease PDF or just click on Output All. You should find an 00101-00001-00001A.PDF file in the Y:\SouthCross\pdfs folder. If you open it you should find an IRS Form 1040. If this works then you can delete the Lease Basic table and of course the 00101-00001-00001A.PDF file and proceed to using it with your system. If it doesn't work let me know how it's failing. It should work unless there's some significant difference between our systems.

There are two ways to use this with your system, You can link to the Lease Table from this database and run the program from this database or import the form from this database into yours and run it there.

To Link to the Table

  1. Delete or rename the local Lease Basic table in this database. Otherwise the link will get the wrong name.
  2. On the EXTERNAL DATA tab, Import and Link section, double click on Access
  3. Below select "Link to the data source by creating a linked table"
  4. Click on Browse and select your database with the Lease Basic table
  5. Click Ok and Go to the Tables tab and select Lease Basic, Click Ok
To Import the Form

  1. Open your database
  2. On the EXTERNAL DATA tab, Import and Link section, double click on Access
  3. Below select "Import tables, queries, forms macros ...."
  4. Click on Browse and select this database Export OLE PDF.accdb
  5. Click Ok and Go to the Forms tab and select ExportPDFtForm Click Ok

To use this open the form ExportPDFtForm. The Lease Number combo box allows you to select single records for output. I suggest you do this first to see if this is going to work with your OLE Objects. If it doesn't work then I ask you to select one of the records and click the Output Selected OLE Object. This will output the entire OLE Object to the Y:\SouthCross\pdfs folder. It will have the Lease Number for a file name without any extension. Please put this in a zip file and upload it so that I can try to see if anything more can be done. If it works then you can process all of the records by clicking Output All. If the program can't find the beginning or end of the PDF within the OLE Object you will get a msgbox each time this occurs. If you don't want to see these you can click on Ignore Errors before clicking Output All.

I didn't provide a progression bar. I suggest you watch the Y:\SouthCross\pdfs folder to see how things are progressing.

Good luck
 

Attachments

Users who are viewing this thread

Back
Top Bottom