Combine successive records in query based on common field value

KyleB
Registered User · Joined Sep 14, 2001 · Messages: 71
I am currently attempting to import a CSV data file that lists information in a top-down format, which I need to convert to a compacted flat-file form for further calculations and manipulation, but I cannot figure out how to do so. I believe I'm either missing something basic in how I'm building my select query, or I'm trying to make it do something it's not intended to do.

Examples attached as DumpedData.jpg before, and ConvertedData.jpg after conversion.

The common key is based on [Layout] & [Ring] & [Hole], with [Point Number] being the variable to combine on. I'm pulling the X, Y, and Z from the second entry and appending them to the first record, so I have coordinate start and stop points from which I can generate calculations. Basically, there should always be two lines in the data set with the first three fields the same and the last field being two sequential numbers. That's not always guaranteed, but one will always be larger than the other. There are some entries in the data set where the first three fields will be the same for 25 records in a row, but that's something I can deal with in a separate SELECT DISTINCT query.

I've found concatenation routines on here, but those won't work for me, as I need to maintain numeric integrity in individual fields, and data separation, in order to do calculations.

Any suggestions to point me in the right direction? Am I looking at some do/while loops in VBA, or is this something that I could somehow extract the leading half of the data with one query, the trailing half with a second query, and then union the final results back together in the order I need?
 

Attachments

  • DumpedData.JPG (37.4 KB)
  • ConvertedData.JPG (29.6 KB)
... or is this something that I could somehow extract the leading half of the data with one query, the trailing half with a second query, and then union the final results back together in the order I need?

That is something SQL is certainly capable of doing, though I'm not certain whether Access's implementation / flavor of SQL supports it. But yes, when selecting a column you may perform substring operations and return only that substring. So select the unique ID(s) to identify the record, select your substring(s), and then perform comparisons.
 
You're showing an incorrect representation of your raw data. In row 1 of ConvertedData.jpg, the coordinates tx to tz relate to Point Number 40, but you've put them under 39. What's the reasoning behind this? And why are you de-normalising?
 
That is not an incorrect representation, though. Point 39 and point 40 represent spatial coordinates on the state plane survey system for Alaska Zone 1, and show the beginning and end coordinates of a line segment. I'm attempting to coalesce the coordinates from the two lines within a query in order to do calculations, apply further data classification, and then store the data in a normalized form with proper identifiers supplied via parameter queries. I'm not de-normalizing; the data stays normalized. I'm simply trying to do some mathematical manipulation, but cannot get the associated data onto a single line within a query in order to do the needed calculations.

The data is dumped out of a program over which I have no control, however, and it dumps the coordinates in the order that an object is drawn. What I'm attempting to do is take the lines that go together (as they will have only two points per line) and merge them into a single line for the purposes of calculations.

From the coordinates I can calculate line length, bearing, and dip, and thereby use that information for more detailed reports. With the data on successive lines and no easily discernible connection between the line data, storing it in this format is fraught with the danger of disassociation. I need to take the raw data and manipulate it at the point in time it's dumped, so that the individual making the report can fill in the parameter values to uniquely identify the data for storage. But part of the process is to create formulas and queries to determine the above-mentioned results from two sets of coordinates that are not on the same line of a table and do not have a common identifier to pull them out via a query.
 
Why don't we start with the calculations and see how we can achieve that without needing to "merge" the rows.
 
Using the data in the jpg, I will need to start by doing my calculations between line one and line two, then between three and four to get the next set of results, etc. Previously I've always manipulated data on a line-by-line basis, much like an awk script would do. What I'm attempting here is a more complex do/while loop via VBA, I think, though I thought it would be simpler if I could use a select query to pull the data out of the table and put it onto one line, which is the question I was attempting to ask. Most data is composed of two-point pairs, but the possibility of more than two points in a line does exist, hence the need to look at the "hole" column.

What I'll need to calculate in the first stage is the length of the line, found as the square root of the difference between the x values squared, plus the difference between the y values squared, plus the difference between the z values squared: (distance = sqrt[(x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2]). People being what they are, I should probably take the absolute value of the difference between the points in case someone ham-hands it and draws a line backwards.

Second calculation: Azimuth, found with the arctan of the difference in x divided by the difference in y. (Azimuth1 = tan^-1[(x2-x1)/(y2-y1)]). This gives the result in Radians, so convert to degrees: (Azimuth2 = Azimuth1*180/3.14159)

Third calculation: Dip, as the degree variance above/below horizontal. Found similarly to the azimuth: arctan of the difference in z divided by the line length. (Dip1 = tan^-1[(z1-z2)/(distance)]) Then convert to degrees: Dip2 = Dip1*180/3.14159

Fourth calculation: Dip direction, as east or west. if azimuth < 180, then east, else west.
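For reference, the four calculations above can be sketched in Python. This is only an illustration of the formulas, not the poster's Access/VBA code; the function name and point layout are assumptions, and I've used atan2 for the azimuth rather than a plain arctan so the quadrant and a zero y-difference are handled.

```python
import math

def hole_metrics(p1, p2):
    """Length, azimuth, dip, and dip direction for the line from p1 to p2,
    where each point is an (x, y, z) tuple.  Illustrative sketch only."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dz = p2[2] - p1[2]

    # 1. Line length: 3D distance.  Squaring makes the sign of each
    #    difference irrelevant, so a line drawn "backwards" still works.
    distance = math.sqrt(dx ** 2 + dy ** 2 + dz ** 2)

    # 2. Azimuth: arctan of delta-x over delta-y, converted to degrees.
    #    atan2 avoids division by zero and returns the correct quadrant.
    azimuth = math.degrees(math.atan2(dx, dy)) % 360

    # 3. Dip: arctan of the z difference over the line length, in degrees.
    dip = math.degrees(math.atan((p1[2] - p2[2]) / distance)) if distance else 0.0

    # 4. Dip direction: east if azimuth < 180, else west.
    dip_direction = "east" if azimuth < 180 else "west"

    return distance, azimuth, dip, dip_direction
```

A horizontal 3-4-5 line, for example, comes back with length 5 and an azimuth a little under 37 degrees.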

The calculations I can generate, but only if I can find a way to reference the correct x, y, and z values that correspond to the beginning or end of a line; when the requisite pieces are not on the same line of data, I'm out of my depth.
 

Attachments

  • DumpedData.JPG (37.4 KB)
Ok, I think I get a clearer picture. Knowing what your calculations are helps for these types of problems. Am I right to assume that there can only be a maximum of two lines per hole? And that your calculations are on a per hole basis?

Or you're just getting the difference of the line above from the line below regardless of the hole?

Approximately how many rows of records would you be performing the calculation on?

Sorry only just getting back.
 
There is the possibility of having more than two points per hole, though they are usually only two.

Calculations are all on a per-hole basis for this particular step, though there are subsequent groups and summaries that would be generated downstream in the reporting step. Points will always be dumped sequentially, as that is how the program exports data, so it should be a calculation between two consecutive lines, or between the first and last line for a named hole. An odd number of points in a line could throw calculations off if not accounted for. Hole data could be looked at as only the first and last row entry for that named object, ignoring intermediate points for the purpose of these evaluations. (The lines are all straight, so intermediate points can be ignored without giving erroneous results.) Edit: The points that make up a line are always sequential, so if you have points 38, 39, and 40 and they're all Hole #6, then you can ignore 39 and look at points 38 and 40.

The calculations would be on upwards of 500 points at a time, so 1000 lines of data per job generally speaking.

Thanks for the response, the delay happens to me too.
 
Sounds like very manageable data considering the approach that I have in mind at the moment. Because you want to use a query, the first thing to do is to find a way of creating a running count per hole, but this isn't easily achieved in a query.

So you have the following options:

1. Get a running sum per hole, and if the number is even use a subquery to get the previous row's value and do your calculations.

2. Use a report to get the running sum per hole (which is very easy) and perform your calculations based on the logic explained in 1.

3. Use a recordset to perform the calculation and save the values to a temporary table or a table you will use solely for saving the calculations.

I'm in favour of options 2 or 3. Which one appeals to you?
 
Well, the problem with the running sum is that it fails if there's ever more than two points to a hole, and that will almost always occur for at least one entry in each job; that could be either an odd or an even number of points. Hence my preference to key off the combination of Ring + Hole + Point Number columns to make a unique ID for each line, then do my calcs between the first sequential Point Number and the last sequential Point Number for a given Hole.

My preference would be to use a recordset, and a temporary table. I've done this sort of thing before, based on flat files for very complex, and multi-tiered calculations, but I'm still in a quandary about how to re-distribute the data from two+ lines per entity, to a single line for the purposes of calculations.
 
Sorry I meant Running Count, not Running Sum. So it will look like this:

Code:
RunningCount  |  Hole
     1        |   39
     2        |   39
     3        |   39
     1        |   40
     2        |   40
From that we can get the previous value by performing calculations only on the even RunningCount numbers. Makes sense?
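For what it's worth, the running count itself can be sketched outside Access like this (Python, with a hypothetical `hole` field name; in Access the same numbering could come from a report's running sum or a recordset loop, as discussed above):

```python
from collections import defaultdict

def add_running_count(rows):
    """Attach a 1-based running count per hole to rows that arrive in
    dump order.  Each row is a dict; the 'hole' key is an assumed name."""
    counts = defaultdict(int)
    out = []
    for row in rows:
        counts[row["hole"]] += 1                      # next number for this hole
        out.append({**row, "running_count": counts[row["hole"]]})
    return out
```

Fed three rows for hole 39 and two for hole 40, it produces the 1, 2, 3 / 1, 2 numbering shown in the table above.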
 
Actually no, that doesn't make sense to me. Allowing that I might be misunderstanding, if I used your example above, you'd be having me do a calculation between the number 2's, as in between hole 39 and hole 40. But they're not correlated data, they're from two separate holes. I'm actually trying to do my calculations between running count 1 and 3 for hole 39, and between 1 and 2 for hole 40. The calculations I'm looking at are values per hole only. I will sum later, but right now I just want to know the data for each individual hole, regardless of the number of points in that hole.

For clarification, a "hole" as I'm talking about can be thought of simply as a straight line from Point A to Point B, with potential extra points in between, though the line bearing does not change, and the intermediate points are superfluous.

I've attached a screen shot example of some of the data I'm working with. This is for Ring 04, Holes 1-11, and each hole is labeled with green numbers to indicate the point sequence as they would dump out to the CSV file. So, for instance, on Ring 04, Hole 6, the CSV file would have three lines of coordinates, listing the x,y,z values of point 355, 356, and 357. I would do my calculations between point 355, and 357. And my unique Id in plain text format would be:
Ring04Hole6PointNumber355
Ring04Hole6PointNumber356
Ring04Hole6PointNumber357
And I would ignore #356 for calculation purposes, as it is irrelevant for my needs; I just need to figure out the length, bearing, and dip of the hole, and so I only need the beginning and end points for that.

Edit: fixed jpg for viewing clarity
 

Attachments

  • Ring04Data.jpg (92.7 KB)
  • RingView.jpg (51.6 KB)
That's what I'm implying. For each hole, the calculation will be done on only the even numbers. Can you now see the benefit of the RunningCount?
 
I'm sorry, but that still doesn't make sense to me. If I used the even count number on a data set such as the one I appended, then the calculation would be based off point number 356. However, that point is in the middle of the line, and if I attempted to do a distance calculation between the beginning point and that point, my resultant value would be only half the true length of the line. I would need to do the calculation between the first and last point numbers to get the line length, and those are both odd values (355 and 357); if I used a running count, they would still both be odd values (1 and 3).
 
Oh sorry, I missed the fact that you said you will ignore 356.

In that case, here's what I think can be done:

1. Copy the entire table and paste into a new temp table with the RunningCount column
2. Create a query that will return the Min of the RunningCount column per hole
3. Create a query that will return the Max of the RunningCount column per hole
4. Join queries 2 and 3 using a UNION.
5. Perform your calculations on the query in 4.
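The gist of steps 1-4 can be sketched like this (Python standing in for the temp table plus the Min/Max queries; the `ring`/`hole` field names are assumptions, and rows are assumed to arrive in dump order):

```python
def first_last_per_hole(rows):
    """Collapse each hole to its first and last dumped point, ignoring
    any intermediate points.  Each row is a dict with assumed 'ring'
    and 'hole' keys; rows must be in dump (point-number) order."""
    by_hole = {}
    for row in rows:
        key = (row["ring"], row["hole"])
        if key not in by_hole:
            by_hole[key] = (row, row)              # first point seen (the Min)
        else:
            by_hole[key] = (by_hole[key][0], row)  # latest point seen (the Max)
    return by_hole
```

From there the start and end coordinates for each hole sit side by side, ready for the length, azimuth, and dip formulas.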

How does that sound?
 
Hmm, yes that does sound viable, and was what I was working towards, but unable to determine a method for doing. Let me fiddle around with it and see what I can come up with. Thank you.
 
That got me pointed in the right direction, thank you. I didn't end up using a union query, but otherwise matched what you suggested. Making use of the running count, I was able to split my temporary data table into two queries and append the two back into another temporary table (I had an updateable-query issue that wouldn't let me combine them unless I put them into another table first, due to the use of Min/Max and right outer join problems) to continue with my calculations. It seems to be working 95% correctly now; I just need to clean up some formulas and proceed from there.
 
Pleased to hear you have something working.

The process I explained is actually how it should be coded. What I meant by "copy and paste" is that the numbering should be done in a query or using a recordset, and this new dataset saved into a new table.

I'm not sure why you needed two extra temp tables, though; I think it's redundant if you can use a UNION to merge them. Access doesn't handle temp tables very well, so one should aim to use as few of them as possible.

Good luck with the rest of your project and let us know if you need further help. ;)
 
