Importing html files into Access 2007

Ken (joined Jul 19, 2009)
I have searched for solutions here and read many posts, using this search: importing html site:http://www.access-programmers.co.uk/forums

I run Vista and Access 2007 and have a copy of Microsoft Visual Studio 2008 Express Edition. I can create and build Excel macros, but I am just a medical device engineer and not a programmer (yet!). Nonetheless, I want to learn how to use Access and some VB.

I have a smallish project. I have well over 10k HTML files, which are website pages that all share the same format, but the data is not laid out in HTML tables. I want to extract the data of interest from the HTML into a database.

This is the format of the core of the HTML data I want to extract from the files:

</SMALL></FONT></TD></TR></TABLE></CENTER>
<P ALIGN=CENTER><BIG><BIG><STRONG>This is title text</STRONG></BIG></BIG>
<P ALIGN=CENTER><STRONG>Presented By: <BIG>first name, last name</BIG> <<A HREF="mailto:person@gmail.com?subject=Here is some more text">person@gmail.com</A>><BR>Date: Wednesday, 11 June 2005, at 11:40 p.m.
</STRONG><BLOCKQUOTE><FONT COLOR="#FFFFFF">
<P>This could be thousands of words of text
<P>
</FONT></BLOCKQUOTE>
<CENTER><P><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=6 BGCOLOR="#000000"><TR><TD ALIGN=CENTER><FONT FACE="Arial"><SMALL>

There are six fields of data here that I want to capture and import:
  • This is title text
  • first name
  • last name
  • email
  • date
  • This could be thousands of words of text
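Most of these fields are plain strings, but the date is free text such as "Wednesday, 11 June 2005, at 11:40 p.m.", so it will probably need converting before it can go into a Date/Time field. Below is a minimal Python sketch, assuming every page uses exactly this English layout. `parse_post_date` is a hypothetical helper name, not part of Access or VBA.

```python
from datetime import datetime


def parse_post_date(text: str) -> datetime:
    """Convert date text such as 'Wednesday, 11 June 2005, at 11:40 p.m.'
    into a datetime suitable for a Date/Time column.

    Hypothetical helper; assumes English day/month names and the dotted
    'a.m.'/'p.m.' spelling seen in the sample page.
    """
    # In an English/C locale, strptime's %p recognises 'AM'/'PM',
    # so normalise the dotted form first.
    cleaned = text.strip().replace("a.m.", "AM").replace("p.m.", "PM")
    # %A matches the weekday name, but strptime does not check it against the date.
    return datetime.strptime(cleaned, "%A, %d %B %Y, at %I:%M %p")


if __name__ == "__main__":
    print(parse_post_date("Wednesday, 11 June 2005, at 11:40 p.m."))  # 2005-06-11 23:40:00
```

If some pages turn out to use a different layout, `strptime` raises `ValueError`, which makes those pages easy to find and handle separately.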


I tried importing a single file with Access's HTML import and only get this error: "Text field specification field separator matches decimal separator or text delimiter." followed by: "Error occurred trying to import file ... file not imported."


I need some guidance. At the moment it looks daunting, but I know I can do it. I sure don't want to do this data entry by hand. I looked at some of the threads based on this search:
string manipulation importing html site:http://www.access-programmers.co.uk/forums

I found unanswered questions, threads about importing into Excel, or answers that did not make sense to me.

I hope someone can give me some basic tips to get going in the right direction. I will need to RTFM, but I am hoping I can get my teeth into this problem enough to hear it squeal a little and not just laugh.

Thanks for considering this problem.
 
I've written some 'screen scrapers' that do essentially this job.
I use a table tPattern to store and manipulate the text to search for in the HTML document.
Code:
tPattern
PatternID
FieldName - the name of the field in your data table
Ordinal - the order of the data in the source HTML
PatternBefore - string that immediately precedes the data you want to extract
PatternAfter - string that immediately follows the data
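As an illustration, the tPattern rows for the page layout in the question might look like this, written as Python data rather than an Access table. The delimiter strings are copied from the sample HTML and assume every page uses exactly the same markup. `PatternRow`, `T_PATTERN` and the field names are hypothetical. The rows also assume that each search resumes where the previous field's data ended, so a PatternAfter can double as the start of the next PatternBefore.

```python
from typing import NamedTuple


class PatternRow(NamedTuple):
    """One record of tPattern (PatternID left out; Ordinal gives the order)."""
    field_name: str      # FieldName: target field in the data table
    ordinal: int         # Ordinal: order of the data in the source HTML
    pattern_before: str  # PatternBefore: text immediately preceding the data
    pattern_after: str   # PatternAfter: text immediately following the data


# Hypothetical rows for the sample page. LastName's ", " only works if the
# search resumes at the comma that ended FirstName (see lead-in).
T_PATTERN = [
    PatternRow("Title", 1, "<BIG><BIG><STRONG>", "</STRONG>"),
    PatternRow("FirstName", 2, "Presented By: <BIG>", ","),
    PatternRow("LastName", 3, ", ", "</BIG>"),
    PatternRow("Email", 4, '">', "</A>"),  # link text of the mailto: anchor
    PatternRow("PostDate", 5, "Date: ", "</STRONG>"),  # captured text ends in a line break; trim it
    PatternRow("Body", 6, '<BLOCKQUOTE><FONT COLOR="#FFFFFF">', "</FONT></BLOCKQUOTE>"),  # still contains <P> tags
]
```

Keeping the delimiters in a table rather than in code means a change in the site's markup only requires editing rows, not the extraction routine.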
Code:
Now for each record in tPattern, in Ordinal order...
  1) find the location of PatternBefore in the HTML, starting from the current search position,
     then find PatternAfter starting just past the end of PatternBefore
      - InStr() function
  2) extract the data between the end of PatternBefore and the start of PatternAfter
      - Mid() function
  3) save the extracted string to the field FieldName in your data table
  4) use the end of the extracted data as the new start for the next pattern search
Loop
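A rough Python translation of that loop may help for testing patterns outside Access: `str.find()` plays the role of InStr() and slicing plays the role of Mid(). This is a sketch, not the poster's code. The function name `extract_fields` is hypothetical, and the patterns are passed as `(FieldName, PatternBefore, PatternAfter)` tuples already sorted by Ordinal. Each search resumes where the previous field's data ended, and a pattern that is not found yields `None` for that field instead of an error.

```python
from typing import Dict, Optional, Sequence, Tuple


def extract_fields(
    html: str, patterns: Sequence[Tuple[str, str, str]]
) -> Dict[str, Optional[str]]:
    """Scrape one record out of `html`.

    `patterns` holds (FieldName, PatternBefore, PatternAfter) tuples, already
    sorted by Ordinal. Returns {FieldName: extracted text, or None if a
    pattern was not found}.
    """
    record: Dict[str, Optional[str]] = {}
    pos = 0  # current search start (0-based, unlike InStr's 1-based Start)
    for field_name, before, after in patterns:
        # 1) find PatternBefore, then PatternAfter just past it (InStr equivalents)
        b = html.find(before, pos)
        end = html.find(after, b + len(before)) if b != -1 else -1
        if end == -1:
            record[field_name] = None  # leave the field empty, keep scanning
            continue
        # 2) take the text between the two patterns (Mid equivalent)
        # 3) store it under FieldName, trimming surrounding whitespace
        record[field_name] = html[b + len(before):end].strip()
        # 4) resume the next search where this field's data ended
        pos = end
    return record


if __name__ == "__main__":
    page = "Presented By: <BIG>first name, last name</BIG>"
    rows = [("FirstName", "Presented By: <BIG>", ","), ("LastName", ", ", "</BIG>")]
    print(extract_fields(page, rows))  # {'FirstName': 'first name', 'LastName': 'last name'}
```

Returning `None` rather than raising means one malformed page among thousands does not stop a batch run; records with empty fields can be reviewed afterwards. In VBA the same steps map onto `InStr(start, string1, string2)` and `Mid(string, start, length)`, keeping in mind that VBA string positions start at 1.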
 
