3.12 GB Database - Is this amount of data beyond what Access can handle?

mtairhead

Registered User.
Joined
Oct 17, 2003
Messages
138
I have several CSV text files that add up to 3.12 GB. I would like all of this data to be placed within MS Access. My plan is to import each text file as a table and then design a union query.

I know that Access is a superb program, but am I stretching Microsoft's limits here? 3.12 GB is a lot of data, and each record contains what would be a memo field with potentially unlimited data within.

The only reason I ask is that I'm having trouble getting Access to even import one of these files (on average, they're about 65 MB each). I realize it's probably more of a hardware/system-resource issue, but how is Access going to handle all of this data, even if I can get it to import somehow?

I guess the questions I need answered are:

Am I beyond Access's abilities?
If so, what database program can I use instead?
If not, how do I import 3.12 GB of data into Access?
Will you pray for me and the stability of my computer?

Thanks,

Andrew
 
mtairhead said:
I know that Access is a superb program,
Yup
but am I stretching MSFT's limits,
Yup
Am I beyond Access's abilities?
Yup
what database program can I use instead?
SQL Server?
how do I import 3.12 GB of data into Access?
You don't.
Will you pray for me and the stability of my computer?
Are you kidding?

Okay now the serious bit

The limit on Access, I think, is 1 GB per table and 2 GB overall.

If it's only a few files, why not use a text-management tool as a vehicle to patch the files together?

I think in Unix there is a command to combine files.
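Len's "patch the files together" idea can be sketched outside Access; on Unix, plain `cat` does raw concatenation, but for CSV exports you usually want the header row only once. A minimal sketch, assuming every file shares the same header (file names here are hypothetical):

```python
# Sketch: concatenate several CSV exports into one file, keeping only
# the first file's header row. File layout is an assumption.
import glob

def combine_csv(pattern, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        for i, path in enumerate(sorted(glob.glob(pattern))):
            with open(path, encoding="utf-8") as f:
                header = f.readline()
                if i == 0:
                    out.write(header)  # keep the header once
                out.writelines(f)      # then append only data rows
```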

We need some experts here to solve the problem, cos I do not think Access is going to be your salvation.

Len B
 
If no single table is larger than 2 GB (actually somewhat less because of overhead), you can use separate databases to hold each table, then use a front-end database to link to the several back-end databases. Memo fields are limited to 64K, so if your memo fields potentially contain more characters, the excess data will be truncated. No one could make use of a recordset containing millions of rows, so I would never union multiple large recordsets. Create queries that summarize or extract what you need from each large table, and then union the smaller select queries.
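Pat's pattern, summarizing each large table first and unioning only the small results, can be sketched in SQL. sqlite3 stands in for Access here just to make it runnable, and the table and column names are invented for illustration:

```python
# Sketch of "summarize first, then UNION the small results".
# sqlite3 is a stand-in for Access; names are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pages_a (word TEXT, url TEXT);
    CREATE TABLE pages_b (word TEXT, url TEXT);
    INSERT INTO pages_a VALUES ('fred', 'url1.edu'), ('fred', 'url2.edu');
    INSERT INTO pages_b VALUES ('fred', 'url3.edu');
""")
# Each SELECT shrinks its big table to a summary before the union runs.
rows = con.execute("""
    SELECT word, COUNT(*) AS hits FROM pages_a GROUP BY word
    UNION ALL
    SELECT word, COUNT(*) FROM pages_b GROUP BY word
""").fetchall()
```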
 
Pat: Your solution sparked another idea.
Is there any reason why I cannot just link the text files to the database, each as a separate table? This would not increase the size of the database by any more than a few hundred KB per text file. What are the drawbacks?

Len (or anyone else, for that matter): Can you advise on some resources concerning SQL Server? While I have very little experience with SQL (aside from various queries within Access), I'm willing to learn. I will eventually be forced to begin using SQL Server, and the sooner I tackle that, the better.

Andrew
 
I suggest you rethink what you are attempting to do, to make sure you get what you intend.

You could use OLE within the database to provide the link to your text files.

You say you have several that add up to 3.12 GB. How many is "several"?

What would you actually achieve through this linking?
Does it actually achieve your objective?
The files remain unsearchable!


Len
 
The best way to explain what I'm doing is an online encyclopedia, but not as nice. (Yet.)

By using MS Access, Google, and a lot of my time, I have compiled a list of educational URLs. Basically, for almost every word in the English language, there are an average of 5-7 ".edu" addresses that relate in subject to that word.

These pages have been downloaded by my computer in comma-separated format to be imported into my database. Once imported, the database will search each record (so it will search each page) and sort out which URLs relate to which words, the idea being that some pages may relate to more than one English word. The end result will be like an encyclopedia, with less reliable information.

So, right now I have 60 .txt files, averaging about 60 MB, full of data concerning individual web pages. These need to be searched and sorted. Because I know how to use Access, I thought that would be the best step, but I'm open to any suggestions.
 
Okay

Now, a memo field is limited to 64K, and memo fields are not searchable, to the best of my knowledge.

So how are you going to handle this situation?

Best advice I can offer is a table

Word    URL
fred    url1.edu
fred    url2.edu
etc     etc.edu

This then gives a list of words with applicable URLs, and it is searchable.
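The flat table above can be produced mechanically from a word-to-URLs mapping. A tiny sketch, with made-up sample data:

```python
# Sketch of the flat Word/URL table: one row per (word, url) pair,
# so every word is searchable directly. Sample data is invented.
def build_rows(word_to_urls):
    """Flatten {word: [urls]} into a list of (word, url) rows."""
    return [(word, url)
            for word, urls in sorted(word_to_urls.items())
            for url in urls]

rows = build_rows({"fred": ["url1.edu", "url2.edu"]})
```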


It's lots of records, but the field sizes are small, so maybe you get within 2 GB.

How many words? Multiplied by, say, 8, that gives the number of records; 200,000 words, for example, would mean about 1.6 million records.

I am struggling with this one, cos methinks this is not difficult, but it is large, and Access may not be able to handle it purely because of file size.

Len
 
I am struggling with this one cos methinks this is not difficult but is large and Access may not be able to handle it purely because of file size

Exactly. This problem isn't difficult at all; I know how to do all of this, even a workaround, duct-tape technique to search a memo field. But I can't do any of it if Access is limiting me.

I'll let you know how your solution works out for me. I'm envisioning success...That's what basketball players do, right? Does that work with computers? :)

Thanks,

Andrew
 
Just bear in mind

Pessimists are never disappointed (as my wife tells me) but being an optimist myself I am frequently disappointed

Good Luck

Len B
 
OK, oops. I failed to mention one big problem. This might be the end of my project.

I think that many of these 60 comma-separated text files may be corrupt. Whenever I import/link a text file, I get this message:

"One or more rows of data in your file contain too many characters to import. The maximum characters per row is 65000."

I think this means that, in as little as one record per text file, the field where the actual web-page text is stored is probably too long. Would you agree with that? If a page happens to have more than 65,000 characters in it (or fewer, because I also have other characters in other fields on each row), the text file will likely not be easily linked/imported.

My duct-tape solution: I'm going to attempt to first import the data into MS Excel. Then I'm going to create a new field that will reference the "Web Page Text" field but only keep the first several thousand characters. This should get things down to size.

Unless there's a better solution, that's what I'm going with.
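That trim does not strictly need Excel; the same clipping can be done on the CSV directly. A sketch, where the long page-text column's index and the 60,000-character cap are my assumptions for illustration:

```python
# Duct-tape sketch: clip the over-long "page text" column so every row
# fits under Access's roughly 65,000-characters-per-row import limit.
# The column index and 60,000-character cap are assumptions.
import csv

csv.field_size_limit(10**8)  # allow very long web-page text fields

def truncate_column(in_path, out_path, col, max_chars=60000):
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            row[col] = row[col][:max_chars]  # clip just the long field
            writer.writerow(row)
```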

Thanks,

Andrew
 
Hmmmm

I think that if you import a CSV file into Excel, you will have each record on a new row.

And Excel has a limit of about 65,000 rows, I think.

L
 
Interesting results:

I did import into Excel, and I did end up with a record on each row. I decided to try something before continuing, though. Instead of creating a new field with just the first few thousand characters from the "Web Page Text" column, I saved the Excel file as-is and exited the program. I then linked an Access table to the Excel file instead of the text file. It worked, no problems. The only difference between the files is the visual representation.

I don't know why, and I may still have issues in the future, but I think the database should hold up.

Thank you for your help! I think I will eventually abandon Access for a more powerful program, but for now I'll be able to continue using it, just with a few duct-tape tactics.

Thanks,

Andrew
 
If I understand what you want to do, your problem isn't necessarily holding the keywords. It is holding the definitions. But there are many ways to skin that particular cat.

The problem with importing a CSV file is that, depending on some pretty ugly factors, the CSV is not particularly likely to be easy to text-parse. Unless you were extremely lucky when you downloaded, you didn't get something that Access would naturally recognize, which is why the import wizard is having such a hissy fit.

But maybe that isn't your real problem anyway. Your problem is storage format once you import whatever you are going to import.

For instance, by raw byte count, holding thousands of photos at 50 KB+ per photo will blow out a DB size limit quickly. Yet if you do it right, you never even know you have the problem, 'cause all you need is the subject keyword and a file spec to the item in question. Lots of DOS-based picture and clip-art programs used this exact trick. Heck, I use it myself: in my case I'm using an image control to show the picture, but all I have in the DB is the picture's fully qualified file spec and some subject keywords. Then, when I navigate using the form I've built, I just load one picture at a time in the Form_Current event.

Now, I'm not suggesting you use "picture" format, but maybe you can find a way to break up your humongous CSV files. Then store the spec for a new CSV file that is a subset of these big files you describe, and also store whatever keyword you would look up for this entry along with the spec. Even if it means a few multi-layer tables, your detail records only have to point, by your standards, to between 5 and 7 addresses, which means between 5 and 7 articles. If you could somehow store this so that only one file is used for each final detail item, you could open one external file at a time in your equivalent of the Form_Open event.

Now, the next kicker is that with all this stuff external, you cannot search the articles. But as memo fields, you would not have been able to do so anyway, so you are no worse off.

I'm not going to suggest how you do it, but if you could break these files up in the way I suggested, you could externally search the files when you parse them the first time in order to get them into reasonable chunks. Then build a table based on this search activity. I.e. store your cross-reference keywords when you store your primary keywords.
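One way to read that suggestion, sketched below: each big CSV row becomes its own small external text file, and the database keeps only (keyword, file spec) pointer rows. The (url, keywords, page_text) column layout is my assumption, not something from the thread:

```python
# Sketch: break a big CSV into one small file per page and keep only
# (keyword, file spec) pointer rows for the database.
# The (url, keywords, page_text) column layout is an assumption.
import csv, os

csv.field_size_limit(10**8)  # page text can be very long

def split_csv(in_path, out_dir):
    index = []  # (keyword, file spec) rows destined for the DB
    os.makedirs(out_dir, exist_ok=True)
    with open(in_path, newline="", encoding="utf-8") as f:
        for i, (url, keywords, text) in enumerate(csv.reader(f)):
            spec = os.path.join(out_dir, "page%d.txt" % i)
            with open(spec, "w", encoding="utf-8") as out:
                out.write(text)           # article body lives outside the DB
            for kw in keywords.split():
                index.append((kw, spec))  # the DB row just points at the file
    return index
```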

Your 3.12 GB total size is unreasonable for Access and, to be honest, even Oracle will take a while to search something that big. Not to mention that Oracle doesn't care much for long records over 65 KB either. So if you try Oracle to handle this, you'll still have the problem; you'll just have it spread across multiple utilities with more expensive licenses.

I think your REAL problem is that you designed your project wrong-end first, 'cause you decided what you wanted but didn't decide how to set it up within the constraints of your chosen platform. I.e. you came up with content before you came up with methods of usage. With due respect, I think you are getting exactly what anyone else would get in that situation. A major headache. 'cause now you are going nuts trying to retrofit what you have to what you really wanted.
 
While I don't think I've explained what I want to do correctly or well, I think you're right.

I started off with a small database of English words and their definitions. I then used MS Access to create a list of URLs to "spider". The list was pretty easy: I first changed all spaces between words in phrases to "+" signs, and then inserted them into the URL formula that Google uses to search exact phrases on .edu sites and return only five results. An example is:
http://www.google.com/search?as_q=&...ct=any&as_dt=i&as_sitesearch=.edu&safe=images

Once I had a URL for every English word, I had my "Spider" software load each URL and collect any external links - Of course these links were the results of the Google search.

I had Access combine all of the links into one table. Then I had my "Spider" actually visit the pages and download any text on the site. The result is what I have now: About 60 CSV files of web pages and their text.

That's where I am now, and here is what I would like to do:

I need to be able to create a separate page for every English word and its definition. I will then use a formula to find URLs related to that English word, obtained while "spidering" Google results. To find relevant URLs in the database, Access will need to search (via a query) the large memo field, because it contains the text of what's on the web page. There is no other way of identifying what's on the page besides the URL. (For those who are objecting: you can also use the title, keywords, and description to identify the topic of a page; however, you wouldn't believe how many people don't bother putting them in. Google doesn't even bother looking much at meta tags anymore.)

My solution is to combine the data into one query. By linking tables to a database and then designing a union query, I can do this without going over the 2 GB limit of the database. I have already designed a VB module that will search for pages relevant to each individual English word and definition. That module will create an .html page for each word.
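The matching step the VB module has to do can be sketched like this; a whole-word, case-insensitive regex is my stand-in for however the actual module searches the page text, and the sample pages are invented:

```python
# Sketch: scan each page's text for each English word and record hits.
# The whole-word, case-insensitive regex is an assumption about how
# the matching should behave.
import re

def match_pages(words, pages):
    """pages: {url: page_text}. Returns {word: [matching urls]}."""
    hits = {w: [] for w in words}
    for word in words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for url, text in pages.items():
            if pattern.search(text):
                hits[word].append(url)
    return hits
```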

Should I have gone about this a different way? The first version of the project actually works well; it's just a bit featureless. This new version needed better data (Thus Google). I already have the infrastructure together from a previous version.

Right now I'm plowing ahead and linking these tables to one database. I'm expecting a long wait, though....
 
The way I see it, you have three problems:

1. Storage format for your end-data that will conform to Access usage. That is probably manageable.

2. DB Format for your keywords and references to the end-data selected by a given keyword. This is where you need to be careful.

3. Once you have the prototype working, you have to gather the data using your automated method. This is where the greatest work lies ahead of you. If you have a few BIG files, Access can't handle it. If you have lots of smaller files, your file folders become terribly bloated, and Windows won't handle it; neither FAT16 nor FAT32 nor NTFS likes bloated directories/folders. This is where some sort of automated folder-selection scheme for file storage becomes important. And here I don't know which way to advise you, 'cause I don't know how you think in design terms.
 
After I posted the above, another factor came to mind.

Autonumbers can only reach 32-bit quantities. With 3.12 GB, you will have lots of numbers that you might generate (I don't know how many words you are contemplating), but in any table that has 5-7 references per word, you will chew up autonumbers like they were going out of style. So you might wish to consider an alternative format for your primary key. I wish I could advise you, but I'm not sure what limits you will reach.
 
Ever consider going to MySQL as the backend, and Access as the front end?
 
So you might wish to consider an alternative format for your primary key. Wish I could advise you, but I'm not sure what limits you will reach.

We'll find out about those limits.... I'm sure I'll hit them.

As for MySQL: no. If anyone could give me any resources on how I can accomplish this, I would love it. I don't know how to use any database utility except Access, though I'd be willing to learn.

Thanks,

Andrew
 
Truncated records when querying a memo field

I am not convinced you are on the wrong track. I use MS Access as a back end and PHP/MySQL to parse the data into a JavaScript-heavy web interface, and with a system like that there is no issue with "not being able to search memo fields". Just parse it out and use a PHP/ASP/Java/JavaScript search engine.

Anyway, I can tell you that I have overloaded and duct-taped Access databases up to 60 MB, and they really don't react well. In particular, I had a memo field where each record had about 400 pages of text in it. After a few hard-drive failures (no worries, all backed up), I lost my original config for everything. I have missed a step along the way, and as a result, when PHP parses the text through ODBC, it truncates the memo fields at about 53 lines. Argh!

So that no one thinks I am crazy for doing it the way I do: please understand that I work for a government and have to follow ridiculous rules, and this is what I can do while staying within them. Sigh.
 
All I can say is that you rock.
All is going well so far. I have been able to link all the data, and I am working on a VB sub that matches data with relevant keywords. So far, the most accurate conclusion I can give you is that it's slow, painfully slow. I can live with that, though. It's not my electricity I'm burning, nor my time I'm wasting.

Thanks

Andrew
 
