Removing outliers (1 Viewer)

Twincam

Registered User.
Local time
Today, 07:09
Joined
Aug 21, 2014
Messages
34
I have just spent an hour developing this, it does what I need, but can it be improved upon?

Basically I have a set of manual readings (valval) for a given item (keykey)

We sometimes get some mad readings, but if the readings are a sensible distance apart (say 10) they can be trusted

I want the highest, trusted, reading. i.e. the highest reading with a sensible reading behind it.

A 1
A 2
A 7
A 9 (difference = 2 so trusted)
A 44

B 1345
B 1347
B 1348 (difference = 1 so trusted)
B 11346
B 13449

Here's my working code - there are a lot of readings and it churns - if it could be speeded up we could run it more often :)

Please don't feel you have to write code, just a hint if you can think of a better way.

The code calcs thismax prevmax and prevprevmax, then decides whether to pass on prev and prevprev, or this and prev, depending on the differences. It does that twice so can deal with two overlarge readings.

SELECT tblData.KeyKey,
max(iif(O5.PrevPrevMax - O5.PrevMax < 10, O5.PrevMax, O5.ThisMax)) as ThisMax,
max(iif(O5.PrevPrevMax - O5.PrevMax < 10, O5.PrevPrevMax, O5.PrevMax)) as PrevMax
FROM tblData
LEFT JOIN (

SELECT tblData.KeyKey,
max(tblData.ValVal) AS ThisMax ,
max(O4.ThisMax) as PrevMax,
max(O4.PrevMax) as PrevPrevMax
FROM tblData
LEFT JOIN (

SELECT tblData.KeyKey,
max(iif(O3.PrevPrevMax - O3.PrevMax < 10, O3.PrevMax, O3.ThisMax)) as ThisMax,
max(iif(O3.PrevPrevMax - O3.PrevMax < 10, O3.PrevPrevMax, O3.PrevMax)) as PrevMax
FROM tblData
LEFT JOIN (

SELECT tblData.KeyKey,
max(tblData.ValVal) AS ThisMax ,
max(O2.ThisMax) as PrevMax,
max(O2.PrevMax) as PrevPrevMax
FROM tblData
LEFT JOIN (

SELECT tblData.KeyKey,
max(tblData.ValVal) AS ThisMax ,
max(O1.ThisMax) as PrevMax
FROM tblData
LEFT JOIN (
SELECT KeyKey, max(ValVal) as ThisMax
FROM tblData
GROUP BY KeyKey
) AS O1

ON tblData.KeyKey = O1.KeyKey
WHERE tblData.ValVal < O1.ThisMax
GROUP BY tblData.KeyKey
) AS O2

ON tblData.KeyKey = O2.KeyKey
WHERE tblData.ValVal < O2.ThisMax
GROUP BY tblData.KeyKey
) AS O3

ON tblData.KeyKey = O3.KeyKey
WHERE tblData.ValVal < O3.ThisMax
GROUP BY tblData.KeyKey
) AS O4

ON tblData
.KeyKey = O4.KeyKey
WHERE tblData.ValVal < O4.ThisMax
GROUP BY tblData.KeyKey
) AS O5

ON tblData.KeyKey = O5.KeyKey
WHERE tblData.ValVal < O5.ThisMax
GROUP BY tblData.KeyKey
 

plog

Banishment Pending
Local time
Today, 01:09
Joined
May 11, 2011
Messages
11,653
You've got too much code for me to mentally step through. So let me give you some examples to see what they should produce and if your code produces them:

C, 3
C,15

D, 16
D, 17
D, 19
D, 20
D, 21
D, 22
D, 24
D, 142
D, 151


I think no matter the code you use, its going to take a while to run--you have to do a lot of comparisons to get the difference between every record and the next one. I do think you can simplify your code from all those queries to 2. You would create a sub-query to calculate the difference between every record and the next, then in the main query you would just get the MAX value where the difference was less than 10.
 

Twincam

Registered User.
Local time
Today, 07:09
Joined
Aug 21, 2014
Messages
34
That's good thinking, I was doing it step by step, but if I work out ALL the differences at once....
 

Twincam

Registered User.
Local time
Today, 07:09
Joined
Aug 21, 2014
Messages
34
Down to three SELECTS - much better. I couldn't get it to work as two.

SELECT KeyKey, max(F2.ValVal)
FROM (

SELECT KeyKey, F2.ValVal, max(F2.Prev) as Prev, F2.ValVal - max(F2.Prev) AS Diff
FROM(

SELECT tblData.KeyKey, tblData.ValVal AS ValVal, F1.ValVal as Prev
FROM tblData

LEFT JOIN (
SELECT *
FROM tblData
) AS F1
ON tblData.KeyKey = F1.KeyKey
WHERE tblData.ValVal > F1.ValVal

) AS F2
GROUP BY KeyKey, F2.ValVal
HAVING F2.ValVal - max(F2.Prev) < 10

)
GROUP BY KeyKey
 

Users who are viewing this thread

Top Bottom