charliemopps
So I have a DB that I made to track problems on a network. I don't want to get into exactly what it does because it would get overly complicated, so I'm just going to use an analogy.
Let's pretend that I own several hundred fruit stands and I want to track whether I'm getting bad fruit from my vendors. There are two ways I could end up with bad fruit.
The first is pretty basic... a bad shipment, where entire boxes arrive at a site totally bad. This was pretty simple to find. The fruit-stand workers report how many bad pieces of fruit they find of each kind, so I get info like this (but there are hundreds of sites and about a dozen kinds of fruit):
Code:
New York Bananas |0 bad |3 avg bad over past 4 weeks|
New York Apples  |0 bad |4 avg bad over past 4 weeks|
Chicago Bananas  |28 bad|3 avg bad over past 4 weeks|
Chicago Apples   |3 bad |2 avg bad over past 4 weeks|
St Louis Bananas |2 bad |5 avg bad over past 4 weeks|
St Louis Apples  |1 bad |0 avg bad over past 4 weeks|
Now, I've easily made a query that rules out all but the site with the problem. Clearly I got some bad bananas in Chicago. The problem is, my vendor is smart. I called and complained about the bad batch, and he realized that to sneak bad fruit past me, all he has to do is slip a bad banana into each box. Now no particular site trips the filter, but each site has gone up by two bananas.
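For reference, here's roughly what that per-site filter boils down to. This is just a Python sketch rather than my actual query; the row layout and the threshold value are placeholders I made up for the example, using the sample numbers from the table above.
Code:
# Rough sketch of the per-site check (Python stand-in, not my real query).
# Rows are (site, fruit, bad_today, avg_bad_past_4_weeks); THRESHOLD is a
# made-up placeholder for "how far above its own average a site has to be".
rows = [
    ("New York", "Bananas", 0, 3),
    ("New York", "Apples", 0, 4),
    ("Chicago", "Bananas", 28, 3),
    ("Chicago", "Apples", 3, 2),
    ("St Louis", "Bananas", 2, 5),
    ("St Louis", "Apples", 1, 0),
]

THRESHOLD = 10

def single_site_problems(rows, threshold=THRESHOLD):
    """Return the rows where one site is way above its own recent average."""
    return [(site, fruit, bad, avg)
            for site, fruit, bad, avg in rows
            if bad > avg + threshold]

print(single_site_problems(rows))
# [('Chicago', 'Bananas', 28, 3)] -- the bad box in Chicago stands out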
I have successfully created a query to identify that kind of across-the-board creep by simply checking the sum of bad bananas against their average. The problem is that if a single site receives an entire bad box of bananas, it will set this query off as well, making me think there is a "country-wide" problem as well as a single-site problem.
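Here's the same kind of sketch for that country-wide check, again in Python with a made-up 1.5x factor. With the sample numbers above, Chicago's single bad box is already enough to trip it, which is exactly the false alarm I mean.
Code:
# Sketch of the country-wide check (again a Python stand-in, not my real
# query). Rows are (site, bad_bananas_today, avg_bad_past_4_weeks); the
# 1.5x factor is made up for the example.
rows = [("New York", 0, 3), ("Chicago", 28, 3), ("St Louis", 2, 5)]

def country_wide_alarm(rows, factor=1.5):
    """Flag a problem if today's national total is well above the usual total."""
    total_bad = sum(bad for _site, bad, _avg in rows)
    total_avg = sum(avg for _site, _bad, avg in rows)
    return total_bad > factor * total_avg

print(country_wide_alarm(rows))  # True -- but only because of Chicago's one bad box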
Does anyone know how I would go about this statistically to find both the larger, country-wide problem and the single-site problem? Is there a way to throw out data we would consider far above the norm?
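(To show the kind of thing I'm imagining, here's a sketch of "throwing out data far above the norm": drop any site already flagged as a single-site outlier, then rerun the country-wide total on what's left. The cutoff and factor are made up, and I have no idea whether this is statistically sound, which is why I'm asking.)
Code:
# Sketch of the idea I'm asking about: throw out sites far above the norm,
# then redo the country-wide check on the rest. Rows are
# (site, bad_bananas_today, avg_bad_past_4_weeks); cutoff and factor are made up.
rows = [("New York", 0, 3), ("Chicago", 28, 3), ("St Louis", 2, 5)]

def split_problems(rows, cutoff=10, factor=1.5):
    """Separate single-site outliers from a genuine country-wide rise."""
    outliers = [r for r in rows if r[1] > r[2] + cutoff]
    rest = [r for r in rows if r not in outliers]
    total_bad = sum(bad for _s, bad, _a in rest)
    total_avg = sum(avg for _s, _b, avg in rest)
    country_wide = total_bad > factor * total_avg
    return outliers, country_wide

print(split_problems(rows))
# ([('Chicago', 28, 3)], False) -- Chicago is a single-site problem, and once
# it's excluded there's no sign of a country-wide one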
Thanks!