Combining multiple records into one record

doma23
For example, this is what I have in a query:

Product  Type   Period  R_Date      No1  No2
Bar      Core   FY10    31.01.2011  27
Bar      Core   FY10    18.10.2010  35   15
Bar      Core   FY10    25.12.2010  21   20
PLL      Group  FY10    14.04.2010  14   16
PLL      Group  FY10    18.09.2010  14
PLL      Group  FY10    27.01.2011  15

Desired output:
Product  Type   Period  R_Date      No1  No2
Bar      Core   FY10    31.01.2011  27   20
PLL      Group  FY10    27.01.2011  15   16

As you can see, the procedure should take the latest data, and where a value is missing it should fall back to the most recent record where it is not missing.
For example, in the last output row it took 15 as No1 from the latest record, and it took 16 as No2 because that was the latest record where No2 was available.

Any ideas? I have none :(

Thanks!
 
Have a look at a Totals Query.

You should be able to Group By Product, Type and Period, and then use Max for R_Date, No1 and No2.
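For example, a minimal sketch of such a Totals query in SQL view, using tablename as a placeholder for your table ([Type] is bracketed because Type is a reserved word):

Code:
SELECT Product, [Type], Period,
       Max(R_Date) AS LastDate,
       Max(No1) AS MaxNo1,
       Max(No2) AS MaxNo2
FROM tablename
GROUP BY Product, [Type], Period;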
 
Hi, thanks for answering.

But that would only take the latest record (by date). If No2 is missing there, it wouldn't go further back to take the latest No2 that is not missing.
 
The real problem here is your data structure. Fields No1 and No2 should be held in a related table. Then the query would be very simple.

Are you at liberty to change the data structure?
 
Yes, I am at liberty to change the data structure, but the problem is that I've built quite a complex tool based on this table structure, and it would just take too much time to connect everything again if I were to change it.
I've also heard from another source that the problem is that the data is not normalized.

What would the normalized data look like, and how would you change the data structure?
 
It is always hard to make the change once the forms are built, because so much needs to be edited and retested. It also means you have to break the way you understand the data.

Without knowing a lot of detail it is difficult to offer precise advice. However, when a table holds multiple fields with the same kind of data, that is usually a good indication of a normalization problem. I do notice you are working from a query, though. Maybe your actual tables are OK?

The related table would have these fields (see the sketch after this list):
A single field holding the values for No1 and No2
A field to indicate whether the value belongs to No1 or No2 (if it matters)
A foreign key to the table where they are currently stored.
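A minimal sketch of that related table as Access DDL; the names (tblNos, ParentID, ValueNo, TheValue) are hypothetical. ParentID is the foreign key holding the PK of the original table, ValueNo records which original field (1 or 2) the value belongs to, and TheValue is the single field holding the values:

Code:
CREATE TABLE tblNos (
    NosID    AUTOINCREMENT PRIMARY KEY,
    ParentID LONG NOT NULL,
    ValueNo  LONG,
    TheValue DOUBLE
);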
 
If you want to avoid reprogramming all of your forms, you can change the structure and make a query which joins the two tables, then just replace the name of the table with the name of the query in your forms; that is very fast. But if I were in your shoes I would redo things, because otherwise it will confuse you when you come back to this work later for further development.
 
I couldn't agree more; it seems almost impossible to make that kind of change now.

Unfortunately, the query is the same as the table; the only difference is that the data from the table is filtered by Date in the query.
I think I now understand to some degree how I should have structured the database from the beginning.

And if you don't mind, I would like to go a little deeper into normalization to fully understand it, so that I don't make the same mistake twice.

So in one table the fields would be named "No1", "No2" ... up to ... "No40" (I have ca. 40 different figures for one record). There would be an additional column (field) which would represent the primary key - would this be formatted as AutoNumber, or something else?

Then there would be just one other table which would contain this information:

Product  Type   Period  R_Date
Bar      Core   FY10    31.01.2011
Bar      Core   FY10    18.10.2010
Bar      Core   FY10    25.12.2010
PLL      Group  FY10    14.04.2010
PLL      Group  FY10    18.09.2010
PLL      Group  FY10    27.01.2011

...together with the "Foreign key" field which would connect the two tables? How would I connect the "Primary key" and "Foreign key" fields - should they hold the same numbers?
Would it make more sense to separate this table further into "Product", "Type" and "Period" tables?

Thanks a lot!
 
So in one table the fields would be named "No1", "No2" ... up to ... "No40" (I have ca. 40 different figures for one record).

No. The No(x) values you currently have as separate fields would all be stored in a single field, with one record for each value.

If there is some intrinsic difference in the nature of those values that must be preserved, then another field would hold a number between one and forty to indicate what was originally defined by the separate fields. If the numbers are all really just the same measurement then you would not use this field.

Note how in this system there is no limit to how many values can be attributed to a single record in the other table, where previously you would have had to add another field if more values were required.

There would be an additional column (field) which would represent the primary key - would this be formatted as AutoNumber, or something else?

This field is the foreign key. It indicates the related record in what remains of the original table. It would just be a Long Integer datatype.

The AutoNumber is in the original table, like this:

PK  Product  Type   Period  R_Date
1   Bar      Core   FY10    31.01.2011
2   Bar      Core   FY10    18.10.2010

...together with the "Foreign key" field which would connect the two tables? How would I connect the "Primary key" and "Foreign key" fields - should they hold the same numbers?

Yes, the foreign key in the records of the Nos table is the matching PK in the original table.
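For example, a hedged sketch of a query joining the two, again using tablename as a placeholder for the original table and the hypothetical tblNos names from the earlier sketch:

Code:
SELECT T.Product, T.[Type], T.R_Date, N.ValueNo, N.TheValue
FROM tablename AS T INNER JOIN tblNos AS N
ON T.PK = N.ParentID;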

Would it make more sense, to separate this table further into "Product", "Type" and "Period" tables?

I don't think so. Product "Bar" and Type "Core" seem to indicate the nature of the record. However, I would consider representing Product and Type as integers and using a lookup table for the displayed values. The lookup table matches the integer with the name. With such short names it isn't essential, but Access does work considerably faster with numbers than it does with strings.
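A sketch of that lookup idea, with hypothetical names: a numeric TypeID stored in the main table, and a tblTypes lookup table with TypeID and TypeName records such as (1, "Core") and (2, "Group"):

Code:
SELECT T.R_Date, L.TypeName
FROM tablename AS T INNER JOIN tblTypes AS L
ON T.TypeID = L.TypeID;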

I expect that Period should not be stored at all. The Period of the record can most likely be determined from the date, and as such Period is a derived value.
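For example, if the fiscal year runs April to March (which the sample dates suggest, since FY10 spans 14.04.2010 to 31.01.2011), Period could be derived in a query rather than stored. A hedged sketch, shifting the date back three months so that January to March falls under the previous year's label:

Code:
SELECT R_Date,
       "FY" & Format(DateAdd("m", -3, R_Date), "yy") AS Period
FROM tablename;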
 
There is an intrinsic difference between numbers, so I would definitely need another field (let's call it Description).

The thing is that I have around 2,500 records, so if I make my table look like:
Identifier  Description  Value
I would have 2,500 * 40 = 100,000 records, and the "Description" field would be repeated every 40 records, as would the "Identifier" field.

So, if I understood it correctly, I would have:

Identifier  Desc    Value
1           Weight  1.25
1           Height  0.85
1           Depth   3.45
...(40 different item descriptions)... and then again:
2           Weight  1.65
2           Height  0.25
2           Depth   2.45
...and so on...

Does that sound right?

P.S. I work with confidential data and unfortunately I can't reveal details, otherwise it would be much easier for me to explain myself.
 
There is an intrinsic difference between numbers, so I would definitely need another field (let's call it Description).

Ah, if the fields are totally unrelated then that is entirely different. Names like that usually suggest denormalization.

For the sake of completing the discussion I will answer these questions anyway (least I can do after maybe leading you on a wild goose chase). Then I will take another look at the original problem, now that I have a better understanding.


The technique described is a very good one to know, and the best solution for databases where the user needs to add new Description types.

It also supports extremely simple generic querying without the need for dynamic query construction. The required records can be designated with parameters for the Desc field, applied to just a single simple query.

eg:
Code:
SELECT Identifier
FROM tablename
WHERE Description = Forms![formname].[somecombo]
AND TheValue = Forms![formname].[anothercombo];

The thing is that I have around 2,500 records, so if I make my table look like:
Identifier  Description  Value
I would have 2,500 * 40 = 100,000 records, and the "Description" field would be repeated every 40 records, as would the "Identifier" field.

So, if I understood it correctly, I would have:

Identifier  Desc    Value
1           Weight  1.25
1           Height  0.85
1           Depth   3.45
...(40 different item descriptions)... and then again:
2           Weight  1.65
2           Height  0.25
2           Depth   2.45

To reduce the data required for the Description, the values are represented by Long Integers. The corresponding Description names are held in another table, with fields for DescriptionID and DescriptionName and a record for each description.

These values are displayed on forms in a subform using Datasheet or Continuous Forms view. The Desc field is implemented as a combo box designed to display the name but record the number.
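A hedged sketch of those combo box settings, assuming the hypothetical lookup table tblDescriptions described above; the bound first column stores the ID, and giving it zero width means only the name is displayed:

Code:
RowSource:    SELECT DescriptionID, DescriptionName FROM tblDescriptions ORDER BY DescriptionName;
BoundColumn:  1
ColumnCount:  2
ColumnWidths: 0cm;3cm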

Note that 100,000 records is not an issue for Access. The limit is the total size of the database, which cannot exceed 2GB. A well-designed table structure will usually support in excess of ten million records.
 
I have looked again at the problem and given it some intense thought. I think the alternative structure might actually be the right one for your problem.

The problem is that the latest value for a particular field may be in any of the other records. Consequently, working through your original structure would require a subquery for each field, to determine the latest record with a value in that field on or before the date of the main record of interest, whenever that record does not include a measurement.

For each subquery, the table would be joined to an alias of itself on Product and Type.

I started on the code, but it is incredibly cumbersome, with joins at multiple levels even within the subqueries. I have left it in the post (showing just the No1 case) to give you an idea of the complexity, but I will almost guarantee it won't be right. Besides, by the time you do this for each Nos field, Access is probably going to roll over.

Code:
SELECT D.Product, D.[Type], E.[No1]
FROM
   (
    SELECT Product, [Type], Max(R_Date) AS LastDate
    FROM
        (
         SELECT A.Product, A.[Type], B.R_Date
         FROM tablename AS A INNER JOIN tablename AS B
         ON A.Product = B.Product AND A.[Type] = B.[Type]
         WHERE B.R_Date <= A.R_Date AND B.[No1] IS NOT NULL
        ) AS C
    GROUP BY Product, [Type]
    ) AS D
INNER JOIN tablename AS E
ON D.Product = E.Product AND D.[Type] = E.[Type] AND D.LastDate = E.R_Date

I will post again with other ways to tackle it.

Note that Type and Value are both reserved words and should be avoided as names.
 
Recordsets can do your job without changing the data structure but it would be quite a complex operation.

How much experience do you have with recordsets and VBA loops?
 
Another way would be to use a Union query to generate the data into the alternative structure.

Code:
SELECT Product, [Type], R_Date, 1 AS Nos, [No1] AS [Value] FROM tablename
UNION ALL
SELECT Product, [Type], R_Date, 2, [No2] FROM tablename
UNION ALL
etc
(This query can be easily generated in VBA without having to write out a huge query.)

Once you have this, join that query to an alias of itself on Product, [Type] and Nos.

Then it is relatively easy to find the last valid value prior to or equal to the date of the main record (see the sketch below).
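A hedged sketch of that step, assuming the Union query above has been saved as qryNos (a hypothetical name). For each record and each Nos position, a correlated subquery finds the date of the last non-null value on or before the record's own date:

Code:
SELECT A.Product, A.[Type], A.Nos, A.R_Date,
       (SELECT Max(B.R_Date)
        FROM qryNos AS B
        WHERE B.Product = A.Product
          AND B.[Type] = A.[Type]
          AND B.Nos = A.Nos
          AND B.[Value] IS NOT NULL
          AND B.R_Date <= A.R_Date) AS LastValidDate
FROM qryNos AS A;

Joining this result back to qryNos on Product, [Type], Nos and LastValidDate = R_Date would then return the value itself.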

If you can't manage that bit, it would be much easier and more reliable for me to actually create this query for you than to try and do air code.

Would it be possible to post a cut down database with just the relevant table and a few dummy records to protect the privacy? After trimming the records do a Compact and Repair and then zip the file.
 
I have some experience with recordsets and VBA loops; I just don't know where I would start. If I knew how to correctly set up the algorithm I could probably write it.
However, since the tool will be running over a network, heavy code would also raise the issue of speed.

I had also thought it might be solved with SQL unions, rearranging the existing data into a new query; I did something similar once.
Once I've cut down the database I'll upload it.

Cheers!
 
OK, I've attached the trimmed database. I left it open, as I deleted all the other code not needed for this task.

The current version only builds the table by taking the values from the latest record. As discussed before, it should take the latest values available from all records.

You can see what should be grouped by in QryFilter_step0.

Thanks!
 


I have had a look at your database, and it is quite clear that it really needed a lot more time spent on the original data model. It looks to me like a big flat file based on a spreadsheet.

It is quite impossible to determine which fields would define a unique record, so it is not possible to decide which fields need to be in the normalized output (effectively as a composite key) to identify the original record. This is essential information for when the Union query is self-joined.

If this project has any real future, you really do need to entirely reconsider that structure, or development will become an even bigger nightmare.

I have gotten as far as applying a script that generates the Union query automatically. It is a generic script and would actually be useful to you if you decided to redesign the database structure.

However, my script does not tolerate field names with special characters, including the ampersand and slash, so I removed these rather than modify my script. Special characters and spaces should never be used in object names; they just make for extra work, and I never use them.

Tell me about the fields that end in "Opt". They strongly suggest real denormalization, as the value is repeated in every record that has values. Their name suggests they are associated with the previous field in some way.

In particular, can these Opt fields be ignored when finding the latest relevant value in the field for a particular record? I ask because the Union query falls over as "too complex" when I add all the fields.

I have got it producing a viable Union without these. If they are required, the process will need to be repeated with smaller groups of fields.
 
In qryFilter_Latest you can see the relationship used when searching for the latest date.
I thought it would indicate what the unique records are. Anyway, a unique record is identified by the following fields:
[R_Quarter], [Period], [Odjel], [Drustvo], [Br] and [R_Date]

The project definitely has a future. What this Access tool is designed for is inputting data into the database using forms (plus searching/viewing data from this database and another one), with a rather "simple" output feature which basically transfers some queries to Excel.

The output feature should be able to filter the data by Date or Ref. Quarter, then take the latest data available (this is what we are trying to do), then calculate some averages in another query and export that to Excel.
Later, I should also build some basic reporting into the tool itself (not a must for now; this is also where I could run into the same problem again, but it depends on what will be required).

Field names and special characters are not a problem; I changed them when I trimmed the database, so they are without ampersands. Some of them have slashes, but that can be changed and shouldn't be a big issue.

The Opt fields are related to the next field; they just tell whether the data in the next field is calculated or stated.
Opt can be ignored: it's a text field, and it's not necessary when taking the latest data. Although I don't know why the Union query would fail because of them.
 
The Opt fields are related to the next field; they just tell whether the data in the next field is calculated or stated.

I have never encountered a field that is selectively calculated or stated. One does not normally store a calculated value, but I don't quite know what to think of that one. :confused:

However, if there are only two possible values here, calculated and not calculated (can there be a third possibility?), I would recommend using a Boolean (Yes/No) field type. These can be processed very quickly compared to text. There is no point in storing a long string, because it will slow down processing and increase storage requirements.
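If it helps, a hedged sketch of that change as Access DDL plus an update query; the table and field names (tablename, No1Opt, No1IsCalculated) and the stored text "calculated" are assumptions for illustration:

Code:
ALTER TABLE tablename ADD COLUMN No1IsCalculated YESNO;
UPDATE tablename SET No1IsCalculated = True WHERE No1Opt = "calculated";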

On the subject of derived fields I suspect you have some that you shouldn't.
R_Quarter can apparently be calculated from R_Date.

If I understand correctly, an entry in C_Date (completion date?) would indicate Co_Status. If the status can be derived, then it should not be stored. Even if there is a variety of possible values for that field, they should be stored as numbers with the text found in a lookup table.

Opt can be ignored: it's a text field, and it's not necessary when taking the latest data. Although I don't know why the Union query would fail because of them.

Just the sheer size of a Union query combining so many fields. Leaving them out just makes it smaller and will also greatly reduce the calculation time.

I really do think this project should be reviewed for potential improvements in the data structure. Sometimes projects are irrevocably complicated, but without knowing a lot more about the background it is impossible to tell.
 
I have never encountered a field that is selectively calculated or stated. One does not normally store a calculated value, but I don't quite know what to think of that one. :confused:

Well, what can I say - there is a first time for everything. :)
Just kidding... the field really can be either calculated or stated, and there is a strong reason behind it. In this case it's necessary to distinguish and store both kinds of values.
The tool itself has about 7-8k lines of code.

However, if there are only two possible values here, calculated and not calculated (can there be a third possibility?), I would recommend using a Boolean (Yes/No) field type. These can be processed very quickly compared to text. There is no point in storing a long string, because it will slow down processing and increase storage requirements.

Yes, you're right, this should be changed.

On the subject of derived fields I suspect you have some that you shouldn't.
R_Quarter can apparently be calculated from R_Date.

If I understand correctly, an entry in C_Date (completion date?) would indicate Co_Status. If the status can be derived, then it should not be stored. Even if there is a variety of possible values for that field, they should be stored as numbers with the text found in a lookup table.

R_Quarter could probably be calculated from R_Date.
C_Date does not indicate Co_Status.
However, these are really not big issues at the moment. We are working on fast computers, so these things shouldn't influence processing speed significantly. Also, a couple of hundred KB of extra data is really not a big deal.

Although, this is definitely something I will watch out for in the future.
For this particular tool there were deadlines, and I had (and still have) limited experience, so I just tried to focus on the most important stuff and on getting the job done.

Just the sheer size of a Union query combining so many fields. Leaving them out just makes it smaller and will also greatly reduce the calculation time.

Could you upload the zip file with the Union query, so I can try to implement it?

And thanks! I really appreciate all the effort.
I knew I was missing knowledge about normalization and data structure, and you've definitely given me some ideas and warnings about what to watch and where to be careful.
 
