Split table in 4 equivalent parts by 3 criteria

iordache09

Hi all,
I have a table with the following structure:
ID (Integer), Value1 (Double), Value2 (Double)


I need to split this table into 4 equal parts:
- the count of IDs must be equal or close to equal for each part (e.g. 467 splits into 3 parts of 117 and one of 116)
- the cumulative Value1 and Value2 must be equal or close to equal for each part

Can you please help with a solution?

Thanks
 
I think that to guarantee exactly what you want you could write a 250-line function. Or, you could use this method and be pretty damn close:

If the ID is sequential (1, 2, 3...) and doesn't have gaps you can use the MOD operator (http://msdn.microsoft.com/en-us/library/se0w9esz.aspx) to put records into parts by their ID:

Part: ((ID - 1) MOD 4) + 1

Use that and it will assign parts like so:

IDs in Part 1: 1, 5, 9, 13, 17...
IDs in Part 2: 2, 6, 10, 14, 18...
IDs in Part 3: 3, 7, 11, 15, 19...
IDs in Part 4: 4, 8, 12, 16, 20...

This guarantees the first criterion (3 parts of 117 and 1 of 116), but not the second (all parts roughly equal in value). However, over 467 records the values should average out.
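A minimal sketch of this dealing scheme in Python (illustrative only; in Access the same expression would go straight into a query):

```python
# Hypothetical sketch of the MOD-based split: sequential IDs are dealt
# round-robin into 4 parts, so part sizes differ by at most one record.
def part_for(record_id, parts=4):
    # ((ID - 1) mod parts) + 1 puts ID 1 in Part 1, ID 2 in Part 2, ...
    return (record_id - 1) % parts + 1

ids = range(1, 468)  # 467 sequential IDs, as in the example above
sizes = {}
for i in ids:
    p = part_for(i)
    sizes[p] = sizes.get(p, 0) + 1
# three parts end up with 117 records and one with 116
```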
 
An off-the-wall approach:

Create a fourth column in a query/recordset:

ID, Value1, Value2, Value1+Value2

Sort descending on Value1+Value2, then sequentially assign records to Group1, Group2, Group3, Group4 in a loop.
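A rough Python sketch of this sort-then-deal idea (the record layout is an assumption; a real solution would walk the recordset instead):

```python
# Sort by Value1+Value2 descending, then deal records round-robin into
# n groups. Large and small values interleave, so group totals stay
# roughly balanced. Records are (id, value1, value2) tuples here.
def deal_into_groups(records, n_groups=4):
    ordered = sorted(records, key=lambda r: r[1] + r[2], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, rec in enumerate(ordered):
        groups[i % n_groups].append(rec)
    return groups
```

A common refinement is to assign each record to the group with the smallest running total instead of dealing strictly round-robin, which balances the sums more tightly.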
 
Thanks for your answer; let me give you more details to help you understand my situation.

My table looks like this, and I have approx. 8,000 records (use text-to-columns with " " as the separator):

ID UserID Category Value1 Value2 Group
351520 1801129226 B1 5,140.33 115.61 Group1
1430584 2850323070 B1 6,841.17 221.52 Group2
1412316 1690308070 B1 6,817.92 223.49 Group3
1253380 1780911434 B1 1,803.89 109.44 Group1
1337470 2460820090 B2 3,675.36 248.64 Group1
1552594 2591019080 B2 3,746.05 269.58 Group3
999728 2850323070 B2 3,700.43 255.50 Group2
586489 2590115511 B2 1,978.62 21.60 Group3
1362556 1549317360 B3 2,197.88 159.87 Group1
1147636 2690202900 B3 2,044.22 204.04 Group2
812250 7759108479 B3 2,163.30 127.90 Group3
1645765 7759108479 B3 1,843.00 190.15 Group1
1419227 2799813159 B3 2,003.27 115.93 Group2
1134840 1469917131 B3 2,011.01 193.99 Group3

I need to split the data into 3 groups (see the Group column); the groups must have an equal number of records and equal cumulative "Value 1" and "Value 2" totals.

The data must be split taking the category into consideration; the rules mentioned above apply within each category.
Each record belongs to a user ("UserID"). If one record from a user goes to a group, the others must go to the same group, without unbalancing the cumulative "Value 1" and "Value 2".

And in the future I might need to change the percentage for each group (40, 30, 30 or other combinations).

I currently use a set of queries to randomly select the first 40% of the values for each category and a union to put them together; after this I use a query to select the remaining 60% and split those records 50/50. See the code below:

SELECT DISTINCT Table.*
FROM
(SELECT TOP 40 PERCENT Table.UserID, Table.ID, Table.Category
FROM Table
ORDER BY rnd(INT(NOW*ID)-NOW*ID)) AS RandomRecords
LEFT JOIN Table ON RandomRecords.UserID = Table.UserID
WHERE ((Table.Category)="B2")

UNION ALL

SELECT DISTINCT Table.*
FROM
(SELECT TOP 40 PERCENT Table.UserID, Table.ID, Table.Category
FROM Table
ORDER BY rnd(INT(NOW*ID)-NOW*ID)) AS RandomRecords
LEFT JOIN Table ON RandomRecords.UserID = Table.UserID
WHERE ((Table.Category)="B3")

UNION ALL
(similar SELECT ... UNION ALL blocks repeated for categories B4 through B12)

Unfortunately there are categories with few records, so the cumulative "Value 1" and "Value 2" totals are not equal (not even close) between groups.

I'm open to SQL or VBA solutions.

Any help will be much appreciated.
Thx
 
So, in English, your requirements are the following:
  1. The items must be split into 4 groups
  2. All items with a given UserID must be in the same group at all times
    • Items per UserID can be arbitrarily large or small in quantity
  3. Each group must have the same number of records (but see point 6)
  4. Each group must have the same total value
    • The value for each item can be arbitrarily large or small
  5. If items are moved from one group to another, other items must be moved in such a way that 1-4 still apply.
  6. The code needs to be able to re-allocate items to conform with any arbitrary split that may arise, not just 25%/25%/25%/25%
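For what it's worth, one common heuristic for requirements 2-4 is a greedy bin-packing pass: pre-aggregate everything per UserID, sort the bundles largest-first, and drop each one into the group that is currently furthest behind. A Python sketch (the field names and the single combined "value" are my assumptions, and this ignores the per-category rule entirely):

```python
def split_by_user(bundles, n_groups=4):
    # bundles: one dict per UserID, pre-aggregated, e.g.
    # {"user": 123, "count": 3, "value": 456.7}  (hypothetical layout)
    # Place the largest bundles first; for each one, pick the group
    # with the fewest records so far, breaking ties by smallest total.
    order = sorted(bundles, key=lambda b: b["value"], reverse=True)
    totals = [0.0] * n_groups
    counts = [0] * n_groups
    assignment = {}
    for b in order:
        g = min(range(n_groups), key=lambda i: (counts[i], totals[i]))
        assignment[b["user"]] = g
        totals[g] += b["value"]
        counts[g] += b["count"]
    return assignment, totals, counts
```

This is only a heuristic, not an optimal partition, but it keeps users together and usually lands the counts and totals close to even.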
Good luck with that. FSM knows *I* don't have a clue how I'd approach this one right offhand in the free time I have available.
 
plog and I gave you 2 different approaches. Did you try either/both? Your last post suggests you are now giving more details for the issue.

Try the approaches that were suggested and come back with focused questions.

Note: Just saw that frothingslosh has responded. I don't think we should have to guess at what the OP wants. I think it's time for him/her to describe the issue/opportunity sufficiently well for a reader to get context.

When readers respond with options, the OP should at least attempt a solution and post specifics if there are issues/questions.
 
1. MOD approach:

I tried that already in Excel and it didn't solve my problem because Value1 and Value2 were very different between groups.

2. Value1+Value2 approach

Value1 is considerably larger than Value2.


Frothingslosh, that is a very accurate description of my problem. I know it is very complex :banghead:
 
I can tell you that as a starting point, you're going to want to treat all items assigned to each UserID as a single item for purposes of your sorting, due to your requirement that they remain together at all times.
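That aggregation step might look something like this in Python (a sketch only; the field names are illustrative, not from the actual table):

```python
from collections import defaultdict

# Collapse raw records into one bundle per user before any group
# assignment, so a user's records can never be split across groups.
def bundle_by_user(records):
    # records: iterable of (user_id, value1, value2)
    bundles = defaultdict(lambda: {"count": 0, "value1": 0.0, "value2": 0.0})
    for user_id, v1, v2 in records:
        b = bundles[user_id]
        b["count"] += 1
        b["value1"] += v1
        b["value2"] += v2
    return dict(bundles)
```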
 
Yes, but how can I use this method and keep the number of records equal across all groups (in each category), considering that a user can have records in different categories?

Aggregating the values at UserID level eliminates the possibility of having equal groups at category level.
 
That was why I said 'starting point' and 'treat as'.

This is honestly the kind of thing that requires a full-time developer, I'm afraid, not people answering questions during lulls at work or in a free hour after work.
 
FSM knows *I* don't have a clue...

Flying Spaghetti Monster? If so, consider that saying stolen; if not, I claim copyright.

I think I had it right when I said you are going to need a 250-line function to get this exactly the way you want. Most likely there won't be a perfect permutation for any given set of data, but since you've laid out logical rules, there will always be a desired permutation that is better than all the others for that data set.

I really think you need to decide how close you need to be to that desired set. The closer you want to be, the more difficult it will be to achieve. I think you can find a top-10% permutation with some work, but closing that to a top-5% permutation is going to take twice as much work, 2.5% will take twice as much as that, and so on.

This is a diminishing-returns problem; you really should decide how good is good enough.
 
Yeah, Flying Spaghetti Monster. I've been considering taking up Pastafarianism, if only because it'll piss off a couple folks in the Watercooler. ;)

Anyway, I've been turning his problem around in my head, and it's honestly the record-count issue that's getting in my way. Without that, I can definitely see a reasonably straightforward approach to do this.
 
