Need Help Parsing HTML

Fuse3k

Hello All,

Here's what I'm trying to do: I am creating an instance of Internet Explorer, logging in to a site, navigating to a particular screen, and grabbing two pieces of information from each table row. So far I've gotten to the page where the table is. The problem I am running into is finding the data I need. The table data is created dynamically, so I don't know the table values in advance. The table shows a list of accounts in our system, and in the final column there's a button to open the account details. I need to take the Account Name from the first column, and then I need to parse the OnClick event of the button to capture an Account ID.

The account list is nested in a <tbody> tag in the HTML and this tag is only used once in the HTML source. I figured I can just grab the data from within the <tbody> tag but that's where I start having trouble.

Here's a sample of the source. It is just one of potentially several hundred rows. I need to scan each row and capture these two pieces of data for each one.

Code:
<tbody>
<tr class="gridEven"> 
      <td width="150px" valign="top" align="left" class="listText">ACME Company Inc.</td> 
      <td width="80px" valign="top" align="left" class="listText">7fk52q0wt</td> 
      <td width="100px" valign="top" align="left" class="listText"></td> 
      <td width="50px" valign="top" align="left" class="listText">Active</td> 
      <td width="125px" valign="top" align="left" class="listText">John Doe</td> 
      <td width="75px" valign="top" align="left" class="listText">07/30/2009</td> 
     <td width="1px" valign="top" align="left" class="listText"> 
    <table border="0" cellpadding="0" cellspacing="0" class="noborder"> 
 <tr> 
      <td align="left" class="noborder"> 
      <input type="button" name="button3" id="button3" value="View" class="btnView" onclick="urlAction('billerViewDisplay.do?action=display&biller.billerID=2DCB-AE48-B865-2285')" /> 
</td> 
     <td align="left"> 
     <input type="button" name="button3" id="button3" value="Edit" class="btnView" onclick="urlAction('editBillerDisplay.do?action=display&biller.billerID=2DCB-AE48-B865-2285')" /> 
 </td> 
 </tr> 
  </table> 
</td> 
</tr> 
</tbody>
Here are a few things I noticed:

1. The account name is the only column that has the "width" attribute set to 150px. I figured I could utilize that information to find the exact <td> that holds the account name.

2. The Account ID will always be exactly 19 characters in length. I can probably use InStr() to loop through the text and parse this out pretty easily.
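
A minimal sketch of the InStr() idea from point 2, assuming the ID always directly follows the text "billerID=" in the OnClick string, as it does in the sample row. The function name ExtractBillerId and its constants are made up for illustration and are not part of any library.

```vba
' Hypothetical helper: returns the 19-character ID that follows "billerID="
' in an OnClick string, or "" if the marker is not found.
Function ExtractBillerId(ByVal onclickText As String) As String
    Const MARKER As String = "billerID="
    Const ID_LEN As Long = 19   ' assumption taken from the post: IDs are always 19 characters
    Dim pos As Long

    ' Find where the marker starts (case-insensitive search)
    pos = InStr(1, onclickText, MARKER, vbTextCompare)
    If pos > 0 Then
        ' Take the 19 characters immediately after the marker
        ExtractBillerId = Mid$(onclickText, pos + Len(MARKER), ID_LEN)
    End If
End Function
```

Called with the OnClick text of the "View" button in the sample row, this should return 2DCB-AE48-B865-2285. If the ID length ever varies, searching for the closing quote with a second InStr() call would be a safer way to find the end.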

I'm getting snagged on the following when trying to get the account name:

1. Assigning the "InnerHTML" within the <tbody> tag as an object in VBA so I can iterate through the elements within. When I Debug.Print it, it only shows me the values, not the HTML tags.

2. Finding the correct element based on the "width" attribute.
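
One possible way around both snags is to skip InnerHTML (which is only a string) and walk the <tbody> element's own Rows and Cells collections instead. The sketch below is untested against the real site and rests on several assumptions: appIE is an already-loaded InternetExplorer object, the first <tbody> in the DOM is the account list (the browser usually adds an implicit <tbody> to each nested button table, so there may be more than one in the DOM even though the source shows only one), the width attribute reads back as text starting with "150", and the button's OnClick text appears in its outerHTML. The procedure name ListAccounts is made up for illustration.

```vba
' Sketch only: prints the account name and account ID for each row of the list.
Sub ListAccounts(ByVal appIE As Object)
    Dim tbl As Object, tblRow As Object, tblCell As Object, btn As Object
    Dim acctName As String, acctId As String
    Dim btnHtml As String, pos As Long

    ' Assumption: the first <tbody> in document order is the outer account list
    Set tbl = appIE.Document.getElementsByTagName("tbody").Item(0)

    For Each tblRow In tbl.Rows              ' rows directly inside this <tbody>
        acctName = ""
        acctId = ""

        ' Snag 2: find the cell whose width attribute is 150px
        For Each tblCell In tblRow.Cells     ' cells directly inside this <tr>
            ' & "" turns a missing (Null) attribute into an empty string
            If CStr(tblCell.getAttribute("width") & "") Like "150*" Then
                acctName = Trim$(tblCell.innerText)
                Exit For
            End If
        Next tblCell

        ' Find the "View" button anywhere inside the row and parse its markup
        For Each btn In tblRow.getElementsByTagName("input")
            If btn.Value = "View" Then
                btnHtml = btn.outerHTML
                pos = InStr(1, btnHtml, "billerID=", vbTextCompare)
                If pos > 0 Then acctId = Mid$(btnHtml, pos + Len("billerID="), 19)
                Exit For
            End If
        Next btn

        Debug.Print acctName, acctId
    Next tblRow
End Sub
```

If the width check turns out to be unreliable, tblRow.Cells(0) would be a simpler way to reach the first column, since the account name is described as always being there.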

Here's what I'm working with so far.

Code:
Dim appIE As InternetExplorer ' InternetExplorer.Application
Dim sURL As String
Dim UserN As Object ' MSHTML.IHTMLElementCollection (getElementsByName returns a collection)
Dim PW As Object ' MSHTML.IHTMLElementCollection
Dim Element As Object ' MSHTML.IHTMLElement (used for the <tbody> loop)
Dim btnInput As Object ' MSHTML.HTMLInputElement
Dim ElementCol As Object ' MSHTML.IHTMLElementCollection
Dim Link As Object ' MSHTML.HTMLAnchorElement

Dim Start 'Timer

Set appIE = New InternetExplorer

sURL = "https://mysite.com"
 
With appIE
    .Navigate sURL
    .Visible = True
End With
 
' loop until the page finishes loading; DoEvents keeps the host application responsive
Do While appIE.Busy Or appIE.ReadyState <> 4 ' 4 = READYSTATE_COMPLETE
    DoEvents
Loop
 
' enter username and password in textboxes
Set UserN = appIE.Document.getElementsByName("userName")
' getElementsByName returns a collection, so check that it is not empty
If UserN.Length > 0 Then
    ' fill in first element named "userName", assumed to be the login name field
    UserN(0).Value = "XXXXXXXX"
End If
 
Set PW = appIE.Document.getElementsByName("password")
' password
If PW.Length > 0 Then
    ' fill in first element named "password", assumed to be the password field
    PW(0).Value = "XXXXX"
End If
 
' click 'Submit' button
Set ElementCol = appIE.Document.getElementsByTagName("INPUT")
 
' loop through all 'input' elements and find the one with the value "Login"
For Each btnInput In ElementCol
    If btnInput.Value = "Login" Then
        btnInput.Click
        Exit For
    End If
Next btnInput
 
' loop until the page finishes loading; DoEvents keeps the host application responsive
Do While appIE.Busy Or appIE.ReadyState <> 4 ' 4 = READYSTATE_COMPLETE
    DoEvents
Loop

' click a text link on the page after that
Set ElementCol = appIE.Document.getElementsByTagName("a")
 
For Each Link In ElementCol
    If Link.innerhtml = "Account List" Then
        Link.Click
        Exit For
    End If
Next Link
 
' loop until the page finishes loading; DoEvents keeps the host application responsive
Do While appIE.Busy Or appIE.ReadyState <> 4 ' 4 = READYSTATE_COMPLETE
    DoEvents
Loop

'Grab HTML from within <tbody> tag.

Set ElementCol = appIE.Document.getElementsByTagName("tbody")
 
For Each Element In ElementCol
   'Help
   'Needed
   'Here!
Next Element

Set appIE = Nothing
Thanks in advance for ANY help. If you need more info just ask.
 
