Split table in 4 equivalent parts by 3 criteria

iordache09

Hi all,
I have a table with the following structure:
ID (Integer), Value1 (Double), Value2 (Double)


I need to split this table into 4 equal parts:
- the count of IDs must be equal or close to equal for each part (e.g. 467 splits into 3 parts of 117 and one of 116)
- the cumulative Value1 and Value2 must be equal or close to equal for each part

Can you please help with a solution?

Thanks
 
I think that to guarantee exactly what you want you could write a 250-line function. Or, you could use this method and be pretty damn close:

If the ID is sequential (1, 2, 3...) and doesn't have gaps you can use the MOD operator (http://msdn.microsoft.com/en-us/library/se0w9esz.aspx) to put records into parts by their ID:

Part: ((ID - 1) MOD 4) + 1

Use that and it will assign parts like so:

IDs in Part 1: 1, 5, 9, 13, 17...
IDs in Part 2: 2, 6, 10, 14, 18...
IDs in Part 3: 3, 7, 11, 15, 19...
IDs in Part 4: 4, 8, 12, 16, 20...

This guarantees the first criterion (3 parts of 117 and 1 of 116), but not the second (all parts roughly equal in value). However, over 467 records the values should average out.
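A minimal sketch of this dealing scheme in Python (illustrative only; in Access the same expression would go straight into a query):

```python
# Hypothetical sketch of the MOD-based split: sequential IDs are dealt
# round-robin into 4 parts, so part sizes differ by at most one record.
def part_for(record_id, parts=4):
    # ((ID - 1) mod parts) + 1 puts ID 1 in Part 1, ID 2 in Part 2, ...
    return (record_id - 1) % parts + 1

ids = range(1, 468)  # 467 sequential IDs, as in the example above
sizes = {}
for i in ids:
    p = part_for(i)
    sizes[p] = sizes.get(p, 0) + 1
# three parts end up with 117 records and one with 116
```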
 
An off-the-wall approach:

Create a fourth column in a query/recordset:

ID, Value1, Value2, Value1+Value2

Sort descending on Value1+Value2, then sequentially assign records to Group1, Group2, Group3, Group4 in a loop.
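A rough Python sketch of this sort-then-deal idea (the record layout is an assumption; a real solution would walk the recordset instead):

```python
# Sort by Value1+Value2 descending, then deal records round-robin into
# n groups. Large and small values interleave, so group totals stay
# roughly balanced. Records are (id, value1, value2) tuples here.
def deal_into_groups(records, n_groups=4):
    ordered = sorted(records, key=lambda r: r[1] + r[2], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, rec in enumerate(ordered):
        groups[i % n_groups].append(rec)
    return groups
```

A common refinement is to assign each record to the group with the smallest running total instead of dealing strictly round-robin, which balances the sums more tightly.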
 
Thanks for your answer; let me give you more details to help you understand my situation.

My table looks like this, and I have approx. 8,000 records (use text-to-columns with " " as the separator):

ID UserID Category Value1 Value2 Group
351520 1801129226 B1 5,140.33 115.61 Group1
1430584 2850323070 B1 6,841.17 221.52 Group2
1412316 1690308070 B1 6,817.92 223.49 Group3
1253380 1780911434 B1 1,803.89 109.44 Group1
1337470 2460820090 B2 3,675.36 248.64 Group1
1552594 2591019080 B2 3,746.05 269.58 Group3
999728 2850323070 B2 3,700.43 255.50 Group2
586489 2590115511 B2 1,978.62 21.60 Group3
1362556 1549317360 B3 2,197.88 159.87 Group1
1147636 2690202900 B3 2,044.22 204.04 Group2
812250 7759108479 B3 2,163.30 127.90 Group3
1645765 7759108479 B3 1,843.00 190.15 Group1
1419227 2799813159 B3 2,003.27 115.93 Group2
1134840 1469917131 B3 2,011.01 193.99 Group3

I need to split the data into 3 groups (see the Group column); the groups must have an equal number of records and equal cumulative "Value 1" and "Value 2" totals.

The data must be split taking the category into consideration; the rules mentioned above apply within each category.
Each record belongs to a user ("UserID"). If one record from a user goes to a group, the others must go to the same group, without unbalancing the cumulative "Value 1" and "Value 2".

And in the future I might need to change the percentage for each group (40, 30, 30 or other combinations).

I currently use a set of queries to randomly select the first 40% of the values for each category and a union to put them together; after this I use a query to select the remaining 60% and split those records 50/50. See the code below:

SELECT DISTINCT Table.*
FROM
(SELECT TOP 40 PERCENT Table.UserID, Table.ID, Table.Category
FROM Table
ORDER BY rnd(INT(NOW*ID)-NOW*ID)) AS RandomRecords
LEFT JOIN Table ON RandomRecords.UserID = Table.UserID
WHERE ((Table.Category)="B2")

UNION ALL

SELECT DISTINCT Table.*
FROM
(SELECT TOP 40 PERCENT Table.UserID, Table.ID, Table.Category
FROM Table
ORDER BY rnd(INT(NOW*ID)-NOW*ID)) AS RandomRecords
LEFT JOIN Table ON RandomRecords.UserID = Table.UserID
WHERE ((Table.Category)="B3")

UNION ALL
(similar SELECT ... UNION ALL blocks repeated for categories B4 through B12)

Unfortunately there are categories with few records, so the cumulative "Value 1" and "Value 2" totals are not equal (not even close) between groups.

I'm open to SQL or VBA solutions.

Any help will be much appreciated.
Thx
 
So, in English, your requirements are the following:
  1. The items must be split into 4 groups
  2. All items with a given UserID must be in the same group at all times
    • Items per UserID can be arbitrarily large or small in quantity
  3. Each group must have the same number of records (but see point 6)
  4. Each group must have the same total value
    • The value for each item can be arbitrarily large or small
  5. If items are moved from one group to another, other items must be moved in such a way that 1-4 still apply.
  6. The code needs to be able to re-allocate items to conform with any arbitrary split that may arise, not just 25%/25%/25%/25%
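For what it's worth, one common heuristic for requirements 2-4 is a greedy bin-packing pass: pre-aggregate everything per UserID, sort the bundles largest-first, and drop each one into the group that is currently furthest behind. A Python sketch (the field names and the single combined "value" are my assumptions, and this ignores the per-category rule entirely):

```python
def split_by_user(bundles, n_groups=4):
    # bundles: one dict per UserID, pre-aggregated, e.g.
    # {"user": 123, "count": 3, "value": 456.7}  (hypothetical layout)
    # Place the largest bundles first; for each one, pick the group
    # with the fewest records so far, breaking ties by smallest total.
    order = sorted(bundles, key=lambda b: b["value"], reverse=True)
    totals = [0.0] * n_groups
    counts = [0] * n_groups
    assignment = {}
    for b in order:
        g = min(range(n_groups), key=lambda i: (counts[i], totals[i]))
        assignment[b["user"]] = g
        totals[g] += b["value"]
        counts[g] += b["count"]
    return assignment, totals, counts
```

This is only a heuristic, not an optimal partition, but it keeps users together and usually lands the counts and totals close to even.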
Good luck with that. FSM knows *I* don't have a clue how I'd approach this one right offhand in the free time I have available.
 
plog and I gave you 2 different approaches. Did you try either/both? Your last post suggests you are now giving more details for the issue.

Try the approaches that were suggested and come back with focused questions.

Note: Just saw that frothingslosh has responded. I don't think we should have to guess at what the OP wants. I think it's time for him/her to describe the issue/opportunity sufficiently well for a reader to get context.

When readers respond with options, the OP should at least attempt a solution and post specifics if there are issues/questions.
 
1. MOD approach:

I tried that already in Excel and it didn't solve my problem because Value1 and Value2 were very different between groups.

2. Value1+Value2 approach

Value1 is considerably larger than Value2.


Frothingslosh, that is a very accurate description of my problem. I know it is very complex :banghead:
 
I can tell you that as a starting point, you're going to want to treat all items assigned to each UserID as a single item for purposes of your sorting, due to your requirement that they remain together at all times.
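That aggregation step might look something like this in Python (a sketch only; the field names are illustrative, not from the actual table):

```python
from collections import defaultdict

# Collapse raw records into one bundle per user before any group
# assignment, so a user's records can never be split across groups.
def bundle_by_user(records):
    # records: iterable of (user_id, value1, value2)
    bundles = defaultdict(lambda: {"count": 0, "value1": 0.0, "value2": 0.0})
    for user_id, v1, v2 in records:
        b = bundles[user_id]
        b["count"] += 1
        b["value1"] += v1
        b["value2"] += v2
    return dict(bundles)
```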
 
Yes, but how can I use this method and keep the number of records equal across all groups (in each category), considering that a user can have records in different categories?

Aggregating the values at UserID level eliminates the possibility of having equal groups at category level.
 
That was why I said 'starting point' and 'treat as'.

This is honestly the kind of thing that requires a full-time developer, I'm afraid, not people answering questions during lulls at work or in a free hour after work.
 
FSM knows *I* don't have a clue...

Flying Spaghetti Monster? If so, consider that saying stolen; if not, I claim copyright.

I think I had it right when I said you are going to need a 250-line function to get this exactly the way you want. Most likely there won't be a perfect permutation for any given set of data, but since you've laid out logical rules, there will always be a desired permutation that is better than all the others for that data set.

I really think you need to decide how close you need to be to that desired set. The closer you want to be, the more difficult it will be to achieve. I think you can find a top-10% permutation with some work, but closing that to a top-5% permutation is going to take twice as much work, 2.5% will take twice as much as that, and so on.

This is a diminishing-returns problem; you really should decide how good is good enough.
 
Yeah, Flying Spaghetti Monster. I've been considering taking up Pastafarianism, if only because it'll piss off a couple folks in the Watercooler. ;)

Anyway, I've been turning his problem around in my head, and it's honestly the record-count issue that's getting in my way. Without that, I can definitely see a reasonably straightforward approach to do this.
 
