Solved Build departement query (1 Viewer)

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
So i have a table

Code:
ID    BATCH  DEPARTMENT   START       END
5     X      1            01-4-2020   02-4-2020
9     X      3            02-4-2020   05-4-2020
15    X      4            05-4-2020   07-4-2020
28    X      4            07-4-2020   11-4-2020
44    X      3            11-4-2020   15-4-2020
6     X      3            15-4-2020   18-4-2020
66    X      3            18-4-2020   22-4-2020
99    X      1            22-4-2020   25-4-2020

In actually I only have the start dates (and times) but I am able to transpose the end date(times) without much trouble.
What I am trying to reach is this:
Code:
BATCH  DEPARTMENT   START       END
X      1            01-4-2020   02-4-2020
X      3            02-4-2020   05-4-2020
X      4            05-4-2020   11-4-2020
X      3            11-4-2020   22-4-2020
X      1            22-4-2020   25-4-2020

Offcourse there are different batches and the number of lines per batch or per department are random as well as the order.
But I am trying to find the back to back time periods of the same period and make them as one.
Thus making from department 3, two records 2-4 thru 5-4 and 11-4 thru 22-4.


Cross posted to the access forum since it gets more readers and the problem seems universal enough for me to transpose the solution.
Cross post is here: https://www.access-programmers.co.uk/forums/threads/build-departement-query.310926/
However this problem is residing in a SQL server database, so I hope maybe there is a specific SQL server solution to this?



Anyone have a good solution to (dynamicaly) do this?
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
I have considered making a udf, they are expensive when run from queries considering the mass of data this needs to run on I have just about dismissed that option.

I considered writing a dozen or so joins even 50 to make make sure and just writing a big ass Case statement to weed thru the joins. But there just has to be a better way of doing this.

I tried using Lead(end) over (partition by Batch, Department order by start) which works fast, but doesnt work properly :(

Recursive to me seems to be more suited for a tree like problem, this isnt particularly such a tree.... more of a single branch you want to walk down

still hoping for new ideas :)
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
there was a thread a few weeks ago when the OP wanted to summarise records as you do. However the difference there was records were contiguous, i.e. without breaks as you have for dept 3.

The principle was to flag each record using a subquery to look at the 'next' record - if the start date did not match the end date it was flagged as the last in the sequence. It was easy to flag the first in the sequence (min start date). this could then be summarised in another query, but as I said, there was only one min - your dept 3 has 2. But I guess you could use another subquery to look at the previous record which has the same end date as the start date - and if null then flag as start.

Don't have the time right now to draw up an example but something to work on
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
OK got it - based on your example this works


SQL:
SELECT R.Batch,
    R.Dept,
    R.StartDT,
    (SELECT TOP 1 EndDT FROM (SELECT P.*
                                                   FROM tbl AS P LEFT JOIN tbl AS E ON P.EndDT = E.StartDT AND P.Dept = E.Dept AND P.Batch = E.Batch
                                                   WHERE E.ID Is Null)
     WHERE Batch=R.Batch AND Dept=R.Dept AND EndDT>=R.StartDT ORDER By EndDT) AS SQEndDT

FROM tbl AS R LEFT JOIN tbl AS S ON (R.Batch = S.Batch) AND (R.Dept = S.Dept) AND (R.StartDT = S.EndDT)

WHERE S.ID Is Null

result
Query1

BatchDeptStartDTSQEndDT
x
1​
01/04/2020​
02/04/2020​
x
3​
02/04/2020​
05/04/2020​
x
4​
05/04/2020​
11/04/2020​
x
3​
11/04/2020​
22/04/2020​
x
1​
22/04/2020​
25/04/2020​
 

bastanu

AWF VIP
Local time
Today, 04:58
Joined
Apr 13, 2010
Messages
1,401
Have a look, I posted it to your other thread:
Cheers,
Vlad
 

Attachments

  • Database1.zip
    20.5 KB · Views: 333

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
@bastanu Vlad

Thank you for trying, it is close, but no sigare.
In reality the sequence extents en repeats even more..
adding:
Code:
7     X      3            25-4-2020   28-4-2020
77    X      3            28-4-2020   01-5-2020
Breaks your solution :(

@CJ_London
Its saturday seems like I worked too many hours this week already.... Had a hard time reading that SQL.
Seems like it might work, though I strongly dislike inline subselects this may just work and looks way more compact vs using partition by.

May have to reconsider my dislike for inline subselects :), dont think I will get around to testing it today but will let you know when I do.
Thank you for taking time to look at this.
 
Last edited:

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
I prefer to not use subqueries myself but sometimes they are unavoidable. Since you are using tsql, it may be there is an alternative - I'm thinking cross apply but not in a position to try it.

you can pull this bit out into a separate query if you wanted

(SELECT P.*
FROM tbl AS P LEFT JOIN tbl AS E ON P.EndDT = E.StartDT AND P.Dept = E.Dept AND P.Batch = E.Batch
WHERE E.ID Is Null)
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
from the results of your example data, the dates are contiguous - if this reflects reality, the other way to do it is to look for records where start date = 'current' end date but the product and/or department have changed
 

bastanu

AWF VIP
Local time
Today, 04:58
Joined
Apr 13, 2010
Messages
1,401
Sorry, but not sure I get it now,are you saying these two records should be combined into a single one even if there is a gap?
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
@bastanu
Fixed the dates, so yeah there should be 2 groups of department 3 seperated by departement 1.
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
@CJ_London
Code:
ID    BATCH  DEPARTMENT   START
5     X      1            01-4-2020
9     X      3            02-4-2020
15    X      4            05-4-2020
28    X      4            07-4-2020
44    X      3            11-4-2020
6     X      3            15-4-2020
66    X      3            18-4-2020
99    X      1            22-4-2020
7     X      3            25-4-2020
77    X      3            28-4-2020
This is my actual souce data with more or less random times of the day added to the dates. I use Lead(start,1) over (partition by BATCH order by START) to get the end dates, so yeah the end dates always = the next start date, essentially they are one and the same.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
in that case perhaps you could do your 'summary' at that point (partition over batch and department)

note since you are creating your End field, End is a reserved word and should not be used as a field name
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
@CJ_London
I had to rewrite your query to
Code:
SELECT R.Batch,
    R.Dept,
    R.StartDT,
    (SELECT max (EndDT) FROM (SELECT P.*
                              FROM tbl AS P LEFT JOIN tbl AS E ON P.EndDT = E.StartDT AND P.Dept = E.Dept AND P.Batch = E.Batch
                              WHERE E.ID Is Null) as x
     WHERE Batch=R.Batch AND Dept=R.Dept AND EndDT>=R.StartDT) AS SQEndDT
FROM tbl AS R LEFT JOIN tbl AS S ON (R.Batch = S.Batch) AND (R.Dept = S.Dept) AND (R.StartDT = S.EndDT)
WHERE S.ID Is Null
since sql server doesnt like order by clauses in subselect or inline select.

Unfortunately my result returns
Code:
BATCH  DEPARTMENT   START       END
X      1            01-4-2020   [B][U]25-4-2020[/U][/B]
X      3            02-4-2020   [B][U]22-4-2020[/U][/B]
X      4            05-4-2020   11-4-2020
X      3            11-4-2020   22-4-2020
X      1            22-4-2020   25-4-2020
[/code
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
in that case perhaps you could do your 'summary' at that point (partition over batch and department)

note since you are creating your End field, End is a reserved word and should not be used as a field name
Column names where thrown online, are not actual column names, should have thought a bit more about them.

Partition over batch and department I tried, it breaks in the same way both your codes do... "joining" the seperated periods of Department 3.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
no - won't work with max (or min). see this link for some examples and workarounds for using order by in sub queries

it breaks in the same way both your codes do.
my example works in Access!

as stated before, this code can be a separate query or temporary table (it is the list of end dates)
(SELECT P.*
FROM tbl AS P LEFT JOIN tbl AS E ON P.EndDT = E.StartDT AND P.Dept = E.Dept AND P.Batch = E.Batch
WHERE E.ID Is Null)

would have thought you can use row number to get the first row
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
just a quick question about your post#12 - what happens to the end date for the last record e.g. ID 77 - where does it get the end date of first May? Is 77 not the last record? Or perhaps you ignore the last record? Assuming this is the case, this sql generates the final result you require, just from the start date

SELECT X.Batch, X.Dept, Min(X.StartDT) AS MinOfStartDT, X.qEnd
FROM (SELECT R.Batch, R.Dept, R.StartDT, Min(S.StartDT) AS qEnd
FROM tbl AS R INNER JOIN tbl AS S ON ((R.Dept <> S.Dept) OR (R.Batch <> S.Batch)) AND (S.StartDT>[r].[startdt])
GROUP BY R.Batch, R.Dept, R.StartDT) AS X
GROUP BY X.Batch, X.Dept, X.qEnd
ORDER BY Min(X.StartDT)

Query2

BatchDeptMinOfStartDTqEnd
x
1​
01/04/2020​
02/04/2020​
x
3​
02/04/2020​
05/04/2020​
x
4​
05/04/2020​
11/04/2020​
x
3​
11/04/2020​
22/04/2020​
x
1​
22/04/2020​
25/04/2020​

I had to add a dummy record same, batch, different dept with a start date of 25/4/20 in order that the last dept1 record would appear - and this does not use ordering except as a final step - you may not need it

Note the non standard joins. You could use a cartesian query and put the joins in as criteria but my guess is that would be slower for large data volumes - but here it is anyway

SELECT X.Batch, X.Dept, Min(X.StartDT) AS MinOfStartDT, X.qEnd
FROM (SELECT R.Batch, R.Dept, R.StartDT, Min(S.StartDT) AS qEnd FROM tbl AS R, tbl AS S WHERE (((R.Dept)<>S.Dept) And ((S.StartDT)>r.startdt)) Or (((S.StartDT)>r.startdt) And ((R.Batch)<>S.Batch)) GROUP BY R.Batch, R.Dept, R.StartDT) AS X
GROUP BY X.Batch, X.Dept, X.qEnd
ORDER BY Min(X.StartDT)
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
Good catch about record 77, it does indead get its end date from a meta record of Batch X.
Batch X has both a Startdate (= startdate on ID 5) and Enddate (which doesnt exist in this recordset).
This enddate eventually is the end of the batch thus the enddate of the last record (77)
Without the meta record, yes you need an additional record to get the enddate.


Bang! Eureka! There you go, I was stuck in conventional mode....
Your non equi join in your last post made me think in reverse....
Find those records where a new department starts, i.e. not next record = this record but Previous record <> this record.
Offcourse this solution is SQL server only solution and doesnt work in Access.

Plus I was trying to make things simpler by ommiting the META record and starting from the fabricated ENDDT on my records. Which ultimately was self defeating... For anyone that cares, this is my solution :)
Code:
With tbl_src as (
select 5 ID  ,'X' BATCH,      1 DEPT , cast('2020-04-01' as date) STARTDT union all
select 9     ,'X',      3            , cast('2020-04-02' as date) union all
select 15    ,'X',      4            , cast('2020-04-03' as date) union all
select 28    ,'X',      4            , cast('2020-04-04' as date) union all
select 44    ,'X',      3            , cast('2020-04-05' as date) union all
select 6     ,'X',      3            , cast('2020-04-06' as date) union all
select 66    ,'X',      3            , cast('2020-04-07' as date) union all
select 99    ,'X',      1            , cast('2020-04-08' as date) union all
select 7     ,'X',      3            , cast('2020-04-09' as date) union all
select 77    ,'X',      3            , cast('2020-04-11' as date)
)
, META as (
Select 'X' MBATCH, cast('2020-04-01' as date) MSTARTDT , cast('2020-04-15' as date) MSTOPDT
)
, TBL as (
select tbl_src.*
     , case when DEPT <> coalesce( lag(DEPT,1) over (partition by BATCH order by STARTDT), -1) then 1
                    else 0 end  as NEWDEPT
     , coalesce(Lead(STARTDT,1) over (partition by BATCH order by STARTDT), META.MSTOPDT) ENDDT
        , ROW_NUMBER()    over (partition by BATCH order by STARTDT) RIJNR
from tbl_src
JOIN META   ON tbl_src.BATCH = META.MBATCH
)
Select TBL.*
     , coalesce(Lead(STARTDT,1) over (partition by BATCH order by STARTDT), MSTOPDT) DEPT_ENDDT
from        TBL  
join   META on TBL.BATCH = META.MBATCH
where  NEWDEPT = 1

Which returns me:
Code:
ID     BATCH  DEPT STARTDT    NEWDEPT  ENDDT       RIJNR    DEPT_ENDDT
5      X      1    2020-04-01 1        2020-04-02  1        2020-04-02
9      X      3    2020-04-02 1        2020-04-03  2        2020-04-03
15     X      4    2020-04-03 1        2020-04-04  3        2020-04-05
44     X      3    2020-04-05 1        2020-04-06  5        2020-04-08
99     X      1    2020-04-08 1        2020-04-09  8        2020-04-09
7      X      3    2020-04-09 1        2020-04-11  9        2020-04-15
 

namliam

The Mailman - AWF VIP
Local time
Today, 12:58
Joined
Aug 11, 2003
Messages
11,696
SELECT X.Batch, X.Dept, Min(X.StartDT) AS MinOfStartDT, X.qEnd
FROM (SELECT R.Batch, R.Dept, R.StartDT, Min(S.StartDT) AS qEnd
FROM tbl AS R INNER JOIN tbl AS S ON ((R.Dept <> S.Dept) OR (R.Batch <> S.Batch)) AND (S.StartDT>[r].[startdt])
GROUP BY R.Batch, R.Dept, R.StartDT) AS X
GROUP BY X.Batch, X.Dept, X.qEnd
ORDER BY Min(X.StartDT)
Your query works nicely as well... just have to change the INNER join to LEFT
Code:
SELECT X.Batch, X.Dept, Min(X.StartDT) AS MinOfStartDT, X.qEnd
FROM (SELECT R.Batch, R.Dept, R.StartDT, Min(S.StartDT) AS qEnd
      FROM tbl AS R 
      LEFT JOIN tbl AS S ON R.Batch = S.Batch AND R.Dept <> S.Dept AND S.StartDT>[r].[startdt]
      GROUP BY R.Batch, R.Dept, R.StartDT) AS X
GROUP BY X.Batch, X.Dept, X.qEnd
ORDER BY Min(X.StartDT)
Then join in the META table to get the final enddate

Big difference between your and my solution is you generate 30 records as the intermediate result, which in the real world will culminate to millions of records, dont think the memory will like that :(
Where as mine keeps the record count down but has a higher CPU requirement, I will test both for performance when I get the chance :)

I will have to rewrite this to a production query :)
 

CJ_London

Super Moderator
Staff member
Local time
Today, 11:58
Joined
Feb 19, 2013
Messages
16,553
That doesn't seem to give the same result as originally expected
 
Last edited:

Users who are viewing this thread

Top Bottom