With help from this site and others, I have cobbled together this code and have had it working. However, the site I want to extract data from gives me an error 5.
It fails on this line: tmpAt = InStr(tableStart, rawHtml, "</table")
The Debug.Print tmpAt before that line normally prints 1995, but when it fails it prints 509.
After that I have Debug.Print tableStart, which normally prints a value between 2148 and 2154, but when it fails it prints 0.
Is it failing because there is no HTML to search? What is more frustrating is that it does not fail on the same record: this time it failed after 78 records, earlier today it was 80 or so, and there have been times it failed before 50.
I am new to scraping from the web, and I am stuck on this one. Thank you for your help.
Code:
Private Sub Command56_Click()
    'On Error GoTo ErrorHandler
    Dim rstSsn As DAO.Recordset
    Dim rawHtml As String, tableChunk As String
    Dim tempFile As String, txtImportName As String
    Dim tmpAt As Long, tableStart As Long, tableEnd As Long

    Set rstSsn = CurrentDb.OpenRecordset("SELECT bbAwards.ssnInd, bbAwards.smName FROM bbAwards;")
    DoCmd.SetWarnings False
    '''''''''''' clear table
    DoCmd.RunSQL ("DELETE bbAwardsFed.* FROM bbAwardsFed;")
    '''''' start time ''''''
    Me!txtStatus.Value = "Start time: " & Format(Time(), "Hh:Nn") & vbCrLf
    Me.Repaint
    If rstSsn.RecordCount > 0 Then
        rstSsn.MoveFirst
        Do Until rstSsn.EOF
            Debug.Print rstSsn!ssnInd
            rawHtml = GetPage("https://aWebSite.asp?SSN=" & rstSsn!ssnInd & " ")
            ' Search forward until we're just before the table we want
            tmpAt = InStr(1, rawHtml, "<h3 align")
            ' Note: this search starts at tmpAt itself, so it returns the same position again
            tmpAt = InStr(tmpAt, rawHtml, "<h3 align")
            Debug.Print tmpAt
            ' Get the index of the start of the opening <table> tag
            tableStart = InStr(tmpAt, rawHtml, "<table")
            Debug.Print tableStart
            ' Get the index of the end of the closing </table> tag
            tmpAt = InStr(tableStart, rawHtml, "</table")   ' <-- error 5 is raised here
            Debug.Print tmpAt
            tableEnd = InStr(tmpAt, rawHtml, ">")
            Debug.Print tableEnd
            ' Extract the table
            tableChunk = Mid(rawHtml, tableStart, tableEnd - tableStart + 1)
            Debug.Print tableChunk
            ' Use native VBA file I/O
            tempFile = Application.CurrentProject.Path & "\tempTable.html"
            Open tempFile For Output As #1
            Write #1, tableChunk
            Close #1
            ''''''''''''' Import the file to a table
            ' http://www.blueclaw-db.com/transfertext-docmd.htm
            txtImportName = "ImportbbAwardsFed" 'MSysIMEXColumns is a hidden system table
            'file>options, current database>navigation options, put check in "show system objects"
            DoCmd.TransferText acImportHTML, txtImportName, "bbAwardsFed", tempFile, False
            Me!txtStatus.Value = Me!txtStatus.Value & "Getting the award data for " & rstSsn!smName & "..." & vbCrLf
            Me.Repaint
            rstSsn.MoveNext
            ' Delete the temp file
            'Kill tempFile
        Loop
    End If
    rstSsn.Close
    Set rstSsn = Nothing
    ' Turn the warnings back on once the import is finished
    DoCmd.SetWarnings True
End Sub
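One thing that may help narrow this down: InStr returns 0 when the search text is not found, and passing 0 as the start position to the next InStr call raises run-time error 5. That matches tableStart being 0 just before the failure. Below is a minimal sketch of a guarded version of the extraction step. ExtractTableChunk is a hypothetical helper name, not part of the code above, and the sketch assumes the failing requests return a page without the expected <table> (for example an error or "try again" page), which has not been confirmed.

```vba
' Hypothetical helper: returns the first <table>...</table> block that follows
' the "<h3 align" marker, or "" when any of the markers is missing.
Private Function ExtractTableChunk(ByVal rawHtml As String) As String
    Dim tmpAt As Long, tableStart As Long, tableEnd As Long

    ' InStr returns 0 when nothing is found; 0 is not a valid start
    ' position for the next InStr call (run-time error 5), so stop early.
    tmpAt = InStr(1, rawHtml, "<h3 align")
    If tmpAt = 0 Then Exit Function

    tableStart = InStr(tmpAt, rawHtml, "<table")
    If tableStart = 0 Then Exit Function

    tmpAt = InStr(tableStart, rawHtml, "</table")
    If tmpAt = 0 Then Exit Function

    tableEnd = InStr(tmpAt, rawHtml, ">")
    If tableEnd = 0 Then Exit Function

    ExtractTableChunk = Mid$(rawHtml, tableStart, tableEnd - tableStart + 1)
End Function
```

Inside the loop it could be used roughly like this, printing the start of the returned page so the odd responses can be inspected instead of stopping the run:

```vba
tableChunk = ExtractTableChunk(rawHtml)
If Len(tableChunk) = 0 Then
    ' No table in this response: log the record and the first part of the page
    Debug.Print "No table for " & rstSsn!ssnInd & ": " & Left$(rawHtml, 200)
End If
```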