Query Webpage Using VBA

access10000

Registered User.
Local time
Today, 06:07
Joined
May 31, 2007
Messages
25
Hi

I would like to find an efficient and effective way of retrieving text from webpages using VBA in Access, basically a web query. I know this can be done in Excel and have experimented with the code below to do this in Access. I really just need to get the raw data into a file so I can link/import and manipulate it further as needed. If this is dumped as one field that is okay for what I'm after.

The code works fine if you want to know the script behind the webpage etc. but I'm after retrieving data from the page as a user would see it or at least including this.

For example, if I wanted to automatically retrieve prices from a webpage I am after the product description and price, not a load of html code. I have tried various different file formats and at the moment 'xls' is the closest.

Does anyone know of a better method to retrieve just the text as is?

Many thanks

-----------------------------------------------------------------------

'The only ADO or DAO reference present must be "Microsoft DAO 3.6 Object Library"
Option Compare Database
Option Explicit
'Declare function from URLMON.DLL library, installed with Internet Explorer
Private Declare Function URLDownloadToFile Lib "urlmon" Alias _
"URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal _
szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Function GetWeb()
Dim PageAddress As String
Dim PageDownLoad As Long
PageAddress = "http://www.access-programmers.co.uk/forums/showthread.php?t=73744"
PageDownLoad = URLDownloadToFile(0, PageAddress, "C:\test.xls", 0, 0)

End Function
 
do a little search here and look in the FAQ's to get the basics.

Could you also tell me the URL and what exactly you want.
 
what does that DLL function do for you. I'd be interested in learning about that function...I have not seen it in any dll viewer that I have, but I've not browsed all the dll's in windows either, just the common ones for use.

In Access, I don't believe there is an equivalent to MS query that you can use in Excel. But as far as I know, the query program in excel only has the ability to query data inside of html tables.

as far as using vba code to get data from a page, have you considered, if you're using version 2007, using the webrowser active x control? It's much more efficient than manipulating IE through code directly, and the control is also compatible with the javascript code that you can write in VBA to manipulate a browser directly.

You do not have a choice, regardless of the method you use (other than ms query), to not use html code as the basis for extracting data. html is the basis for a webpage, and the majority of content on a page is located within some sort of html tag, regardless of whether it is hardcoded or written to the page dynamically for each page request.

one of the easiest methods to extract full text out of a tag though, is to use the "innertext" property. Reference to that will return all of the text located inside the specific tag.
 
darbid, I have been researching this for a couple of days and the code I posted is the best I have managed to get together. You may notice from the 'PageAddress' in the code I posted that I have already tried to find an answer in this forum and many others across the net.

I don't have a specific URL as I'm investigating a general method for future development ideas. However, a better example would be something like this where I could subsequently filter for the model names and prices through a text file.

http://www.pcworld.co.uk/gbuk/desktop-pcs/706_7021_70021_xx_xx/xx-criteria.html


Thanks the_net_2.0 I will take a look at 'innertext'. I came across the dll while looking into all this. There might be more you can do with it but this function extracts html from a webpage and downloads to a file, so useful if that's what you need. I am not aware of a standard web query function either. I am using Access 2003 at the moment but will also keep the webrowser active x control in mind for 2007.
 
Last edited:
Annoyingly I cannot find the thread I was thinking about. Someone (one of the regulars like ajetrumpet) has written a little tutorial on how to get started. They are probably reading this and will reference it as they like HTML questions.

So before you get into actual code - your function above seems to be a bit by bit downloader. It downloads the whole file you point it to. I have never used it but what is the advantage of using Excel files? Does it put things in cells for you?

We can certainly use this function in Access, in fact we can use anything in Access that Excel can use because if Access does not natively do it then we will use Excel from Access to get the job done.

for example excel has some cool webpage downloaders that will take tables from a webpage.

What are you trying to do - have you built a nice table with product name, description and price and you want to fill it with records of what you find?

Getting exact information from a web page means knowing exactly what you are looking for. In my opinion it is better to scrape or get your info directly from the webpage as a HTML file and not xls. The reason for this is that you can use the html Document Object Model DOM which is what Net 2 is talking about.

If you are thinking of some kind or webcrawler like google that you point to webpages then I am not sure you can get it all into a table. You could however save all the pages as webpages locally. Then search for words within that text and then as a result open that file as a webpage again and scroll to the part found.

So could you let us know some more info on where you want to save the info and how you want to use the info.
 
Okay, thanks for your reply. There is no advantage to downloading to an Excel file, not for what I'm trying to do and I would rather not have it in this format at all. This was the last format I experimented with to see how it came out. Sorry if this was misleading.. The format I am really after is text or csv, as 'innertext'. With the code I posted as it is, if "C:\test.xls" is changed to "C:\test.txt" you get html code. If this can still be converted to the innertext somehow afterwards that's perfect. I don't know much about this stuff though.

Basically what I am trying to do is grab the text from a whole webpage and dump it into a file then link to Access as a linked table. From here, and after I had initially analysed the output to see how I could get what I needed from it, queries in a database would strip out what was needed and use the data as required, make tidy local tables, match model numbers it found and update latest prices or whatever. It's a generic solution I'm looking for that can be used as a base for retrieving information from the web and using it in Access.

I have called Excel from Access before for regular functions but I didn't think of using it for this. Good idea.. It's an area I'm just starting to look at so I was going for a 'get all the data and use Access to do the rest' method.

However, I now have the module below which worked the first couple of times and has given me what I need, but now always fails. I'm pretty sure the problem is around defining or setting 'appIE.Document.body.innertext' as an object but I'm not sure how to do this. If this is to with 'HTML Document Object Model DOM' could you tell me how to set this or point me in the right direction? Of course if there are better ideas altogether, please let me know..

Many thanks

-----------------------------------------------------------------------

Function GetWebText()

'Microsoft Internet Controls
'Microsoft HTML Object Library

Dim appIE As InternetExplorer
Dim pageAddress As String
Dim PageText As String

Set appIE = New InternetExplorer

PageAddress = http://www.tek-tips.com/faqs.cfm?fid=4579

With appIE
.Navigate PageAddress
End With

Do While appIE.Busy
Loop

PageText = appIE.Document.body.innertext

Open "C:\Data.txt" For Output As #1
Write #1, PageText
Close #1

End Function
 
Okay, thanks darbid. I'll take a look and try to get a solution.
 

Users who are viewing this thread

Back
Top Bottom