Exporting table to .txt file

Benjaminvh

I have a table (DATA) from which I am creating another table (TEMP). TEMP contains all of the fields of DATA, plus several calculated fields and considerably more rows (1 row of DATA yields about 25 rows in TEMP). This means I cannot create the entire TEMP table at once (30 million+ records).

I was thinking of:

1. Running through, say, 50,000 records of DATA to create roughly 1 million records in TEMP, then exporting this to a .txt file.
2. Repeating this procedure for each batch of 50,000 DATA records until EOF, each time appending the new records of TEMP to THE SAME text file (see the sketch below).
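In code, I picture the export step looking something like this - just a sketch, with a made-up file path and field names, assuming TEMP is emptied between batches:

Code:
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim f As Integer

Set db = CurrentDb
' TEMP has just been filled with the latest batch of ~1 million rows
Set rs = db.OpenRecordset("SELECT * FROM TEMP")

f = FreeFile
Open "C:\Exposure\TEMP.txt" For Append As #f  ' Append keeps adding to the same file

Do While Not rs.EOF
    ' write one comma-separated line per record (only two fields shown)
    Print #f, rs!PolicyCode & "," & rs!Duration
    rs.MoveNext
Loop

Close #f
rs.Close
db.Execute "DELETE * FROM TEMP"  ' empty TEMP ready for the next batch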

Once this is done, however, I need to actually make use of the records in the TEMP.txt file. Fortunately, extensive grouping takes place at this stage (the variables grouped by will depend on the user's inputs via combo boxes), so the grouped result would contain few enough records for Access to handle. The problem is: how do I get the records from the text file back into Access, with the necessary grouping along the way? I could perhaps import part of the text file, run a group-by make-table query, import more data, append, group again, and so on until EOF. Can data from a text file be imported in parts?
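The only way I can see of reading the file back in chunks is something like this (a rough sketch - the parsing and the group-by query are left out):

Code:
Dim f As Integer
Dim lineIn As String
Dim counter As Long

f = FreeFile
Open "C:\Exposure\TEMP.txt" For Input As #f

Do While Not EOF(f)
    Line Input #f, lineIn  ' read one record at a time
    ' ... Split() the line and append it to a small staging table ...
    counter = counter + 1
    If counter Mod 1000000 = 0 Then
        ' run the group-by/append query here, then empty the staging table
    End If
Loop

Close #f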

Is there a way to do this? Or perhaps a better way of solving my problem? I am not set on using the .txt format; it's just that Access can't handle the number of records I will create.

Thanks very much in advance
The guys on this forum are life-savers
 
I am intrigued to know how 1 record can generate another 30 records. What type of data are we talking about? I think this may be a normalisation issue.

David
 
Hi David

2 of the fields in the DATA table are "DATE FROM" and "DATE TO". The TEMP table, apart from some (complicated) calculations, contains DURATION (which for this purpose simply means the number of full months elapsed since DATE FROM) as one of its fields. That means that for a record in DATA with DATE FROM = 1/1/2004 and DATE TO = 31/12/2005, the corresponding records in TEMP will number 24 at least. I say at least because TEMP also includes YEAR, so every time the year end is 'crossed', that month of duration is split over the two years, creating 2 records instead of 1 (e.g. for a policy starting mid-month, the duration month running from 15/12 to 14/1 produces one record for each calendar year).

Is this what you meant?
 
Is the data some sort of payments schedule or billing data? How many fields are there in the TEMP table, and what is it storing? Does the data in the TEMP table become the child of a record in the DATA (parent) table?


Also, what mechanism is used to create the child records?

David
 
David

I'm working in insurance. The 'thing' I'm creating is an EXPOSURE TOOL, which basically (!!) calculates the amount of time that lives in certain categories (such as age, gender, smoker status, year, etc.) are exposed to risk. This is then used to determine mortality rates. This is what the DATE FROM and DATE TO give - the TOTAL exposure for that policyholder. For the purposes of my investigation, however, I need to split this exposure by DURATION (which, more correctly in this context, is the number of full months since the policy was taken out).

For your interest, the reason duration is necessary is that two otherwise identical lives would likely have different mortality (death probability) if one has just taken out a policy and the other took out the policy 10 years ago, since the former recently went for medical tests.

I can give you the code, but basically it loops through one record of DATA, starting at DATE FROM. At this point, duration is 0. Then it adds a month to DATE FROM, and duration becomes 1 - which means a new record is needed in TEMP. It continues doing this, adding a month each time and creating a new record for each new duration, until DATE TO is 'crossed'. Then it moves on to the next record and repeats (see the sketch below).
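Stripped of the calculations (and ignoring the year-end split), the loop is roughly this - a sketch only, with simplified table and field names:

Code:
Dim rs As DAO.Recordset
Dim rsTemp As DAO.Recordset
Dim d As Date
Dim duration As Long

Set rs = CurrentDb.OpenRecordset("DATA")
Set rsTemp = CurrentDb.OpenRecordset("TEMP")

Do While Not rs.EOF
    d = rs![DATE FROM]
    duration = 0
    Do While d <= rs![DATE TO]          ' until DATE TO is crossed
        rsTemp.AddNew
        rsTemp![DURATION] = duration
        rsTemp![YEAR] = Year(d)
        ' ... the calculated fields go here ...
        rsTemp.Update
        d = DateAdd("m", 1, d)          ' step forward one month
        duration = duration + 1
    Loop
    rs.MoveNext
Loop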

In this way, the TEMP table is constructed entirely from DATA, but is some 30 times bigger (since the average duration of the policies at this company is 2.5 years = 30 months). Maybe this is what you mean by "child table"?
 
Oh, and I forgot to mention: there are about 40 fields in the table (policy code, policy type, age, gender, etc.).
 
Benjamin

It's still unclear to me exactly what is going on.

Why do you need to generate a table outside Access, then reimport it?

---------
In any system, one way of doing things like this is to reduce the size of the dataset.

Instead of doing all your records together, split the data into different sections (see the sketch below):

e.g. by first letter of surname
or by month of birth
or apply your smoker filter and other filters, etc., to reduce the size of the dataset
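Something like this, assuming a Surname field (made up - use whatever field splits your data evenly):

Code:
Dim rs As DAO.Recordset

' process only one slice of DATA at a time, e.g. surnames starting with A
Set rs = CurrentDb.OpenRecordset( _
    "SELECT * FROM DATA WHERE Surname LIKE 'A*'")

' ... run the usual TEMP-building loop against rs, ...
' ... then repeat for 'B*', 'C*' and so on ...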

------------
I think you need to consider this in terms of the output... clearly 30 million records are too many to view manually, so what is the end result you are trying to produce? Just work towards that.
 
Let's skip the problem and assume that you have the data in the table as you wanted it. What are you doing with it?

Is it used for viewing purposes only?
Do you want to look at the last record only to see the final outcome?
Do you want to print it out?
What is the likelihood of wanting to view the data for each policy en bloc?

My way of thinking is that if I wanted to do any of the above for an individual policy, I would run the TEMP creation on the fly and return the data there and then. Creating 30m+ records just on the chance that the user wants to look at a specific policy is a bit OTT.

David
 
That's so irritating. I type out a massive reply and click "submit" and it tells me I must first log in. Fuck.

Well, here goes again (this time I'll save the message before I send, just in case!).

It is difficult to explain precisely what's going on here, since a lot of it is specialist knowledge. I'll give it a go.

For any incidence rate to be calculated, we need two things:

1. The number of events that occurred (e.g. illnesses, deaths)
2. The amount of time that lives were exposed to the risk of illness/death/etc etc = EXPOSURE

Together (events divided by exposure), these give the mortality/sickness/withdrawal/lapse/HIV rate.

At the moment I'm working on exposure, but will soon repeat the task for decrements (a fancy word covering all possible ways of leaving the group - death, withdrawal, claim, etc.). There will be no size issues then, since a policyholder on the books for 4 years would contribute at least 48 records to TEMP but, since he dies at most once, will contribute at most one line to the DECREMENT table.

RATING FACTORS are things by which one would expect experience to differ. By 'experience', I mean the actual mortality/withdrawal etc. experienced by the insurance company, as opposed to the assumptions used before the policy was sold. The idea is to compare the two and then update assumptions where necessary in future. Age, gender, smoker status and income are all rating factors, as are duration and perhaps calendar year. Literally anything that can split the total group of lives into two or more groups whose mortality rates may differ is a rating factor.

The TEMP table will contain all 60+ rating factors, meaning that grouping by all of them is useless, since it hardly reduces the size (85 ages, over 100 durations - the possibilities are massive).

Depending on what investigation you are doing, different rating factors will be important. Say we are doing a mortality investigation: age, gender, smoker status and duration are important, so we can ignore the other 56. Now when I group the 30+ million records of TEMP by only these 4 factors, the result is a far smaller table that can be viewed manually (see the sketch below).
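In Access SQL terms, something like this (a sketch - the field and table names are simplified):

Code:
Dim sql As String

' collapse TEMP to one row per combination of the four chosen factors
sql = "SELECT Age, Gender, SmokerStatus, Duration, " & _
      "Sum(Exposure) AS TotalExposure " & _
      "INTO MORTALITY_SUMMARY " & _
      "FROM TEMP " & _
      "GROUP BY Age, Gender, SmokerStatus, Duration"
CurrentDb.Execute sql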

The benefits of first creating the TEMP table and then working from it are big. Running the programme through the entire DATA table takes over 24 hours, probably due to inefficiencies in my code. If the TEMP table (which is created only once a year) exists, then importing, grouping by the 4 factors and summing exposure should be a far quicker job, which is important.

Any ideas?
Does that explain it?
 
In view of your comments, it may be worth your while to use SQL Server to hold your data and link the tables via ODBC. SQL Server will give you more flexibility and robustness, and it can handle that amount of data without any issues.
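Once an ODBC driver/DSN is set up, linking a table is a one-liner - a sketch, with made-up server and database names:

Code:
' link a SQL Server table into the current database via ODBC
DoCmd.TransferDatabase acLink, "ODBC Database", _
    "ODBC;Driver={SQL Server};Server=MYSERVER;" & _
    "Database=Exposure;Trusted_Connection=Yes;", _
    acTable, "dbo.TEMP", "TEMP"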
 
Yes, I think SQL Server may be the way to go. I'm not sure whether it is possible at work here, but it's definitely something I should look into...
 
Having read your latest post, my next question is: why is the TEMP table significant? I really am struggling to understand why it is important, or even necessary at all.

If you know when a member joined, what his DOB is, what his special factors are, etc., then surely you can calculate his death benefit/likelihood without needing a monthly table to show he continued to be a member.

Surely you can get that directly from his membership start/end values.

And even if some of the calculation factors change at a given point in time, you can store those changes in a normalised way, without requiring the massive temp table.

It seems to me that the TEMP table is spreadsheet design, not relational database design.

If you have 50,000 policy holders, then you should only need to compute 50,000 outcomes (see the sketch below) - the whole problem becomes much smaller, and much more computable, in real time.
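i.e. something along these lines, straight from the date range (a sketch with made-up field names - note that DateDiff counts month boundaries crossed, so the real thing would need adjusting for day-of-month):

Code:
Dim sql As String
Dim rs As DAO.Recordset

' one output row per policy, derived directly from DATE FROM / DATE TO
sql = "SELECT PolicyCode, Age, Gender, SmokerStatus, " & _
      "DateDiff(""m"", [DATE FROM], [DATE TO]) AS ExposureMonths " & _
      "FROM DATA"
Set rs = CurrentDb.OpenRecordset(sql)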

Any thoughts?
 
