Internet Explorer "scrape"

option

Hey guys,

I've been searching all over for a sample IE scrape and I came across one that works...kind of. This one pulls all of the page's HTML into a variable, which is great, but not exactly what I need. I need to be able to pull whatever is selected in any combo boxes or typed into text boxes and, if possible, record any command button clicks. Is this even a possibility? My only thought is to control the web browser directly in Access, pre-loading the page needed. The only problem is that I still won't be able to capture the data I need. Any ideas?
 
I am not exactly sure I understand your question, but I assume that you want to read from and write to an internet page, or at least an HTML document.

This is possible and there are many, many examples of it. First you need to choose whether you want to put the WebBrowser control on an MS Access form or whether you want to open a separate web browser and have two different windows. I personally like putting the WebBrowser control on a form.

In the form's Load event you put something like this

Code:
Me.MYBrowser1.Navigate "www.google.com"
Once you have this what you want to do is learn about the Document Object Model or DOM.

Add a reference to the Microsoft HTML Object Library in Access and have a look at what your Object Browser then has available.

You are going to have to learn about HTML and HTML tags.

In my form variables I declare a HTMLDocument variable

Code:
Dim myHTMLDoc As MSHTML.HTMLDocument
Think of a webpage as a document, so in the DocumentComplete routine of the WebBrowser you can put something like this. You have to put it there because this is when your document is complete. Also note that sometimes this event fires more than once, e.g. when the webpage has frames.

Code:
Set myHTMLDoc = Me.WebBrowser4.Document
Once you have the HTML object you then have to follow the DOM to get what you want. This can be done in both directions, i.e. you could read something, or you could add something, click on something, or execute something.
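
As a rough sketch of the DocumentComplete idea, assuming the control is named WebBrowser4 as in the snippet above and that myHTMLDoc is declared at form level: the pDisp argument identifies which browser object finished loading, so comparing it with the control's own object is a common way to skip the extra firings caused by frames.

```vba
' Form-level variable, so other procedures on the form can reuse the document.
Private myHTMLDoc As MSHTML.HTMLDocument

Private Sub WebBrowser4_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Frames raise this event too; only the top-level browser matches .Object.
    If pDisp Is Me.WebBrowser4.Object Then
        Set myHTMLDoc = Me.WebBrowser4.Document
        Debug.Print "Loaded: " & URL
    End If
End Sub
```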
 
So far, that's getting me on the right track. I've got a sample db that has the web page loaded on a form, but my main hurdle is capturing what is entered in the webpage. For example, if my users were performing searches on Google, I'd want to be able to capture what they have typed in the text box on the google page and also capture when they click the "Google Search" or "I'm Feeling Lucky" buttons.
 
So far, that's getting me on the right track.
That's great.

I feel you are going to have to learn by yourself a little more as I am not an expert and it is also good to learn ;-)

For example, set up a button on your form and, after you have a page loaded, run this

Code:
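Dim myItem As Object
Dim allElements As Object ' the collection of elements, not an HTMLDocument

Set allElements = Me.WebBrowser4.Document.documentElement.All
For Each myItem In allElements
    Debug.Print myItem.tagName
    Debug.Print myItem.Name
    Debug.Print myItem.Value
Next myItem
Note: depending on the page and the element, the Name or Value line might give an error, because not every element has those properties.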
You can follow along what is being output by right clicking on the webpage and viewing the source.


but my main hurdle is capturing what is entered in the webpage.
If by this you mean what the webpage itself does then the above will help you here.

If instead you mean that a user enters stuff and you want to capture what they enter, that is a little harder. It depends on the internet site. If you have a look at the WebBrowser control's BeforeNavigate2 event you will notice the URL is there. For a form submitted with GET you can usually find what the user entered in the URL's query string; for a POST it is passed in the event's PostData argument instead.
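
A minimal sketch of that approach, assuming the control is named WebBrowser0 (the name used later in this thread). It only prints what it sees; the way PostData is converted to text is an assumption that fits simple URL-encoded forms and may need adjusting for other sites.

```vba
Private Sub WebBrowser0_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, _
        Flags As Variant, TargetFrameName As Variant, PostData As Variant, _
        Headers As Variant, Cancel As Boolean)
    ' GET submissions carry the form fields in the query string of the URL.
    Debug.Print "Navigating to: " & URL

    ' POST submissions pass the fields as a byte array in PostData.
    If VarType(PostData) = (vbArray Or vbByte) Then
        Debug.Print "Posted data: " & StrConv(PostData, vbUnicode)
    End If
End Sub
```

For a Google search this should show the typed text after "q=" in the printed URL, since "q" is the name of the search box.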

For example, if my users were performing searches on Google, I'd want to be able to capture what they have typed in the text box on the google page and also capture when they click the "Google Search" or "I'm Feeling Lucky" buttons.
Have a look around. I remember that when I was learning this, Google and Yahoo were used as examples. I am sorry, I have never wanted to get the results of a Google search.

Edit: by the way this gets you the full htmldocument with the DOM
Code:
debug.Print Me.WebBrowser4.Document.documentElement.OuterHTML
 
Using all of the great information you have provided (thank you, btw!) I managed to get the names of the text box and both command buttons on the google page. Now, using the code below, I am able to capture what is in the text box:
Code:
Private Sub Command2_Click()
Dim myHTMLDoc As MSHTML.HTMLDocument
On Error GoTo Err_Command2_Click

Set myHTMLDoc = Me.WebBrowser0.Document.documentElement.all
Debug.Print myHTMLDoc("q").Value

Exit_Command2_Click:
    Exit Sub

Err_Command2_Click:
    MsgBox Err.Description
    Resume Exit_Command2_Click
End Sub

Now, when I try to set a variable equal to the value of "q", I get errors ("Object variable or With block variable not set"). Or, if I try to set a field in a recordset equal to myHTMLDoc("q").Value, no matter what data type I give the field, I get a type mismatch error. I apologize for all of the questions, but this is my first time interacting with the web via Access.
 
I am really happy to help as I am learning this stuff as well. You are progressing pretty fast.

I am not sure if I follow you with this.

Maybe the element in the DOM you are calling does not have a "value"?

That is why I said that in what I gave you, you might get errors.
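
One way to narrow this down is to check that the element exists before reading it. The helper below is only a sketch: ReadInputValue is a made-up name, and it assumes the document is passed in late-bound and that the first element with the given name is the one wanted.

```vba
' Hypothetical helper: returns the value of the first element with the
' given name, or an empty string when no such element is on the page.
Private Function ReadInputValue(ByVal doc As Object, _
        ByVal elementName As String) As String
    Dim matches As Object

    Set matches = doc.getElementsByName(elementName)
    If matches.Length = 0 Then Exit Function

    ' Appending an empty string turns a Null value into "" so that the
    ' assignment to a String cannot fail on Null.
    ReadInputValue = Trim$(matches.Item(0).Value & vbNullString)
End Function
```

It could then be called as Debug.Print ReadInputValue(Me.WebBrowser0.Document, "q"). An element that has no Value property at all would still raise an error here.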
 
a few thoughts

Debug.Print myHTMLDoc("q").Value

with xml i am not sure you get a .value; more likely you get a .txt, or a .xml

secondly what is "q" - i think this would imply that "q" is an object of the domdocument model - which i am sure it isnt

finally i am not sure, but i think most web sites that expect you to collect/scrape data have a mechanism for retrieving such data in a standard form - eg googlemaps returns kml/xml or csv files, depending on how you call the website
 
a few thoughts

secondly what is "q" - i think this would imply that "q" is an object of the domdocument model - which i am sure it isnt

"q" is actually the name of the text box on the Google page. Using this loop from above :
Code:
For Each myItem In myHTMLDoc
Debug.Print myItem.tagName
Debug.Print myItem.Name
Debug.Print myItem.Value
Next myItem
the values returned (in order) are: INPUT, q, and whatever is in the textbox. With a little re-work (starting from scratch!) I was able to come up with the following for the form's OnLoad event and a command button:
Code:
Option Compare Database
Option Explicit


Private Sub Command1_Click()
Dim myHTMLDoc As MSHTML.HTMLDocument
Dim testIt As String
Dim db As DAO.Database
Dim rs As DAO.Recordset

Set db = CurrentDb()
Set rs = db.OpenRecordset("tblA")

Set myHTMLDoc = Me.WebBrowser0.Document
Debug.Print myHTMLDoc.documentElement.all("q").Value

testIt = Trim(myHTMLDoc.documentElement.all("q").Value)

With rs
.AddNew
rs![Fa] = testIt
.Update
.Close
End With

End Sub

Private Sub Form_Load()
' Load the page; the Document is not available until navigation completes
Me.WebBrowser0.Navigate "www.google.com"
End Sub

Ignoring all of the generic names, I am now able to load Google, type into its text box, click my command button (not the ones on Google's page), and enter that value into a table as text. With this concept, I believe I can achieve what I had hoped for. All that I need to do is develop a way to replace my command button's event with the command button on Google's page. I'm not quite sure how possible that is, but I've got time to try it. If it can't be done, then I'll have to adjust the users' process a bit and have them click my command button on the form prior to clicking the one on Google. You guys are great! Thanks for getting me this far, it's VERY much appreciated!!
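
One possible way to react to the page's own buttons, offered only as a sketch and not tested against Google's page: declare the document WithEvents in the form module so the form receives the page's click events. It assumes the reference to the Microsoft HTML Object Library mentioned earlier and a control named WebBrowser0; pageDoc is a made-up variable name.

```vba
' Form-level declaration: lets the form receive events raised by the page.
Private WithEvents pageDoc As MSHTML.HTMLDocument

Private Sub WebBrowser0_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Hook the document again after every load, since each page is a new document.
    If pDisp Is Me.WebBrowser0.Object Then
        Set pageDoc = Me.WebBrowser0.Document
    End If
End Sub

Private Function pageDoc_onclick() As Boolean
    Dim clicked As Object

    ' srcElement is the element on the page that the user clicked.
    Set clicked = pageDoc.parentWindow.event.srcElement
    If clicked.tagName = "INPUT" Then
        Debug.Print "Clicked: " & clicked.Name
    End If

    ' Returning True lets the page carry on with its normal click handling.
    pageDoc_onclick = True
End Function
```

The Debug.Print line is where the table-writing code from the command button could go instead.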
 
All that I need to do is develop a way to replace my command buttons event with the command button on Googles page.

By this do you mean that from access you want to click the button on the webpage?

If yes, then in theory all you need to do is call the element's Click method.

Code:
myHTMLDoc.documentElement.all
should give you all the elements of the HTML page. One of those elements is going to be the button. So I think what you need is to loop through these elements until you find the right one and then, with the right element:

Code:
Set myButton = myItem
myButton.Click
' or
myButton.fireEvent "onclick"

Of course, using .all is the long way. If you know the element by name then you can simply go straight to it and call Click.
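
The loop described above could look roughly like this. FindElementByName is a made-up helper name, and the error guard is there because not every element exposes a Name property.

```vba
' Hypothetical helper: walks every element on the page and returns the first
' one whose Name matches, or Nothing when there is no match.
Private Function FindElementByName(ByVal doc As Object, _
        ByVal wantedName As String) As Object
    Dim el As Object
    Dim elName As String

    For Each el In doc.all
        elName = vbNullString
        On Error Resume Next          ' some elements have no Name property
        elName = el.Name & vbNullString
        On Error GoTo 0

        If StrComp(elName, wantedName, vbTextCompare) = 0 Then
            Set FindElementByName = el
            Exit Function
        End If
    Next el
End Function
```

A caller would check the result before clicking, for example: Set myButton = FindElementByName(myHTMLDoc, "btnG") followed by If Not myButton Is Nothing Then myButton.Click.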
 
So if the "Google Search" button = btnG and the "I'm Feeling Lucky" button is "btnI",
Code:
myHTMLDoc.documentElement.all("btnG").Click
would be correct? Would that go on the onclick event for the web browser control?
 
you never know, but its just a string isnt it - not a sub

you might be able to activate the string with the eval function, but i doubt if even that would work somehow
 
Assuming this is correct then it can go anywhere you like.

As long as myHTMLDoc is declared for the whole form then you could just create a command button on your form and call this.
 
Alright, based on everything in this thread, this is what I've come up with. I've got my web browser control, and I've identified the names of my text box & 2 command buttons on the page. With that set, the form's OnLoad:
Code:
Private Sub Form_Load()
' Load the page; the Document is not available until navigation completes
Me.WebBrowser0.Navigate "www.google.com"
End Sub

Then I put a command button, and the code behind that is
Code:
Private Sub Command1_Click()
Dim myHTMLDoc As MSHTML.HTMLDocument
Dim GTxt As String

Dim db As DAO.Database
Dim rs As DAO.Recordset

Set db = CurrentDb()
Set rs = db.OpenRecordset("tblStamp")

Set myHTMLDoc = Me.WebBrowser0.Document


GTxt = Trim(myHTMLDoc.documentElement.all("q").Value)

With rs
.AddNew
rs![strEmpId] = VBA.Environ("username")
rs![strSearched] = GTxt
rs![dteTimeStamp] = Now
.Update
.Close
End With

Call myHTMLDoc.documentElement.all("btnG").Click

End Sub

What the command button does is take the text from the webpage's text box and write it to a table, along with the user's username and a time stamp. It then submits the search to Google with
Code:
Call myHTMLDoc.documentElement.all("btnG").Click

With all of the snippets and such in this thread, one could build a rather complex program that interacts with a web page (I know that's my next step!) Again, thank you for all of the help! +rep for you btw :)
 
That is great. Of course you should add some error handling.

The other thing that I had problems with was the webbrowser and/or the HTML document not being ready. If either is not ready, your code will probably error.

You might like to add

Code:
Do Until myHTMLDoc.readyState = "complete"
    DoEvents
Loop
The webbrowser also has ready states as well but I find that using the HTML document is better.

Also watch out for webpages that never return the "complete" ready state.
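
To guard against a page that never reports "complete", the wait can be given a time limit. This is only a sketch: WaitForDocument and the 30-second default are made up for illustration, and the browser is expected to be passed in as Me.WebBrowser0.Object.

```vba
' Hypothetical helper: waits until the current document reports "complete".
' Returns False if that has not happened within timeoutSeconds.
Private Function WaitForDocument(ByVal browser As Object, _
        Optional ByVal timeoutSeconds As Long = 30) As Boolean
    Dim deadline As Date

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While Now < deadline
        DoEvents
        If Not browser.Busy Then
            ' Ask for the document on every pass, because a navigation
            ' replaces it with a new object.
            If Not browser.Document Is Nothing Then
                If browser.Document.readyState = "complete" Then
                    WaitForDocument = True
                    Exit Function
                End If
            End If
        End If
    Loop
End Function
```

Usage would be along the lines of If Not WaitForDocument(Me.WebBrowser0.Object) Then MsgBox "Page did not finish loading".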
 
Alright, I'm back at this with another question! I'm trying to select a value in a combo box on the web page from my database. Ex: the combo box has 4 options and, by default, my user should always select "2". They could select it on their own, but I'd like to automate it. The following 2 snippets are 1) the HTML behind the combo box and 2) how I am controlling the page. The statement under the comment 'set the value of the combo box to B is the one trying to control the combo box, but with no success. Any ideas?
Code:
<td align="right" class="formLabel" title="Report" valign="center" width="25%">Report</td>
                <td colspan=1 valign="top" >
                    <select tabindex="" id="select1" name="REPORT" onchange="return disableCriteria(this.form)">
                            <option value="" selected></option>
                            <!--
                            <option  value="1">A</option>
                            --> 
                            <option  value="2">B</option>
                            <option  value="3">C</option>
                            <option  value="4">D</option>
                            <option  value="5">E</option>
                        </select>                
                </td>
Code:
Private Sub SysLogOn_Click()
    Dim myHTMLDoc As MSHTML.HTMLDocument
    Dim oList As Object
    
    Set myHTMLDoc = Me.WebBrowser0.Document

    ' loop until the page finishes loading
    Do Until myHTMLDoc.ReadyState = "complete"
     DoEvents
    Loop
    
    ' enter username and password in textboxes
     myHTMLDoc.documentElement.Document.frmLogin.txtLogin.Value = "Default"
     myHTMLDoc.documentElement.Document.frmLogin.txtPassword.Value = "Default"
     
     ' click 'Submit' button
     myHTMLDoc.documentElement.Document.frmLogin.Logon.Value = "Logon"
     myHTMLDoc.documentElement.Document.frmLogin.submit
    
    'loop until the page is loaded
    Do Until myHTMLDoc.ReadyState = "complete"
     DoEvents
    Loop
    
    'navigate to the reports page
    Me.WebBrowser0.Navigate "http://sec.win.org/report/report.asp"
    
    'loop until the page is loaded
    Do Until myHTMLDoc.ReadyState = "complete"
     DoEvents
    Loop
    
    'set the value of the combo box to B
    myHTMLDoc.documentElement.Document.Report.Report.Option.Value = "2"
    
    
    Set UserN = Nothing
    Set PW = Nothing
    Set ElementCol = Nothing
    Set myHTMLDoc = Nothing

End Sub
 
try something like this

Code:
myHTMLDoc.all.Item("REPORT").Value = 2
Should show you "B"
 
I tried that and got "Object doesn't support this property or method". The fact that it's running JavaScript upon selection wouldn't make a difference, would it?
Code:
onchange="return disableCriteria(this.form)">
that's in the line that starts with "<select" in the html.
 
Maybe but I doubt it.

Your error means we have the wrong object.

My first offer would be to look at the page for you, but as it is password-protected, can you PM me some details?

I am a 100% beginner with JavaScript but what does this function do?

You will find the function either on this page or at the top you will see references to XXX.js files.

Once you have the right object you could fire the onchange event to see what happens.
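
A sketch of what that might look like, with several assumptions: the reports page has finished loading, the select is in the top-level document and not inside a frame, and the name "REPORT" comes from the HTML posted above. SelectReportB is a made-up procedure name, and this has not been run against the actual site.

```vba
Private Sub SelectReportB()
    Dim doc As Object
    Dim matches As Object
    Dim reportList As Object

    ' Fetch the document again: a reference taken before Navigate may
    ' still point at the previous page.
    Set doc = Me.WebBrowser0.Document

    ' "REPORT" is the name attribute of the <select> in the posted HTML.
    Set matches = doc.getElementsByName("REPORT")
    If matches.Length = 0 Then
        MsgBox "No element named REPORT on the current page."
        Exit Sub
    End If
    Set reportList = matches.Item(0)

    reportList.Value = "2"            ' option values are strings in HTML
    reportList.fireEvent "onchange"   ' run the page's onchange handler
End Sub
```

If the message box appears, the code is looking at the wrong document, which would fit the "wrong object" reading of the error.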
 
All the JavaScript does is disable objects on the page based on the selection made. JavaScript and I don't get along (mainly because I never learned it!), so I'm in the same boat you're in. I'll post the script:
Code:
function disableCriteria(frmThis)
{
  var i,j,k;
  i=frmThis.REPORT.selectedIndex;

  switch (document.report.REPORT.options[i].text) {
    case 'A':
      {
        
        document.report.txtN.disabled=false;
        document.report.DateRange.disabled=true;
        document.report.txtFDate.disabled=true;
        document.report.txtTDate.disabled=true;
        document.report.Month.disabled=true;
        document.report.YEAR.disabled=true;
        document.report.ExcelReport.disabled=true;
        document.report.AGENT.disabled=false;
        break;
     }
    case 'B':
      {
        
        document.report.txtN.disabled=false;
        document.report.DateRange.disabled=true;
        document.report.txtFDate.disabled=true;
        document.report.txtTDate.disabled=true;
        document.report.Month.disabled=true;
        document.report.YEAR.disabled=true;
        document.report.ExcelReport.disabled=true;
        document.report.AGENT.disabled=true;
        break;
     }    
    case 'C':
       {
        document.report.txtN.disabled=false;
        document.report.DateRange.disabled=true;
        document.report.txtFDate.disabled=true;
        document.report.txtTDate.disabled=true;
        document.report.Month.disabled=true;
        document.report.YEAR.disabled=true;
        document.report.ExcelReport.disabled=true;
        document.report.AGENT.disabled=false;
        break;         
       }
    case 'D':
       {
        document.report.txtN.disabled=true;
        document.report.DateRange.disabled=false;
        document.report.txtFDate.disabled=true;
        document.report.txtTDate.disabled=true;
        document.report.Month.disabled=false;
        document.report.YEAR.disabled=false;
        document.report.ExcelReport.disabled=true;
        document.report.AGENT.disabled=true;
        document.report.submit();
        break;         
       }        
    default : 
     {
        document.report.txtN.disabled=true;
         document.report.DateRange.disabled=true;
        document.report.txtFDate.disabled=true;
        document.report.txtTDate.disabled=true;
        document.report.Month.disabled=true;
        document.report.YEAR.disabled=true;
        document.report.AGENT.disabled=true;
        document.report.ExcelReport.disabled=true;
        break; 
     }
     
  }
   
}


  End -->

</script>
 
Ok. I do not have the experience to be a "teller"; I am still a tester and learn from mistakes.

Can you put both "REPORT" and the 2 into variables and then try it?

The 2 is the important one; from memory I had problems with this. Maybe it should be a string and not an integer.

Otherwise give me your URL and access codes by PM.
 
