Java: How to detect outliers in an ArrayList
Source: http://stackoverflow.com/questions/18805178/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors: Stack Overflow
Asked by Ashton
I'm trying to come up with some code that will search through my ArrayList and detect any values outside the common range of "good values."
Example: 100 105 102 13 104 22 101
How could I write code to detect that (in this case) 13 and 22 don't fall within the "good values" of around 100?
Answered by Jigar Joshi
- find the mean value of your list
- create a Map that maps each number to its distance from the mean
- sort the values by their distance from the mean
- and separate out the last n numbers, making sure the cut-off is fair with respect to distance
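The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name, the method names, and the choice to let the caller pick n are all assumptions. Note that a Map keyed by value collapses duplicate numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MeanDistanceRanking {

    // Maps each value to its absolute distance from the mean,
    // ordered from closest to farthest (duplicate values collapse into one key).
    static Map<Integer, Double> rankByDistanceFromMean(List<Integer> values) {
        double mean = values.stream().mapToInt(Integer::intValue).average().orElse(0.0);

        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble(v -> Math.abs(v - mean)));

        Map<Integer, Double> distances = new LinkedHashMap<>();
        for (Integer v : sorted) {
            distances.put(v, Math.abs(v - mean));
        }
        return distances;
    }

    // Hypothetical helper: returns the n values farthest from the mean.
    static List<Integer> farthest(List<Integer> values, int n) {
        List<Integer> ranked = new ArrayList<>(rankByDistanceFromMean(values).keySet());
        int from = Math.max(0, ranked.size() - n);
        return new ArrayList<>(ranked.subList(from, ranked.size()));
    }
}
```

With the question's sample and n = 2, farthest(...) picks out 22 and 13. Choosing n still has to be done by inspecting the distances.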
Answered by Joni
There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside of this range is deemed an outlier.
Other criteria include Grubbs' test and Dixon's Q test, which may give better results than Chauvenet's, for example if the sample comes from a skewed distribution.
Answered by Łukasz Rzeszotarski
This is a very simple implementation that collects the numbers that are not in the range:
List<Integer> notInRangeNumbers = new ArrayList<Integer>();
for (Integer number : numbers) {
    // call with a predefined factor value, here example value = 5
    if (!isInRange(number, 5)) {
        notInRangeNumbers.add(number);
    }
}
Additionally, inside the isInRange method you have to define what you mean by 'good values'. Below is an example implementation.
private boolean isInRange(Integer number, int aroundFactor) {
    // TODO: implement your own 'in range' condition;
    // this example accepts values within aroundFactor of 100
    return number <= 100 + aroundFactor && number >= 100 - aroundFactor;
}
Answered by mesutpiskin
Use this algorithm. It uses the average and the standard deviation; the factor 2 (in 2 * standardDeviation) is an adjustable value. Note that the code is C#, not Java.
public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
{
    if (allNumbers.Count == 0)
        return null;
    List<int> normalNumbers = new List<int>();
    List<int> outLierNumbers = new List<int>();
    double avg = allNumbers.Average();
    // population standard deviation
    double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
    foreach (int number in allNumbers)
    {
        // values more than 2 standard deviations from the mean count as outliers
        if ((Math.Abs(number - avg)) > (2 * standardDeviation))
            outLierNumbers.Add(number);
        else
            normalNumbers.Add(number);
    }
    // only the non-outlier values are returned
    return normalNumbers;
}
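Since the question is about Java, a rough Java counterpart of the same idea might look like this. It is a sketch under assumptions: the class and method names are made up, the factor is passed in as a parameter, and it returns the outliers rather than the normal values.

```java
import java.util.ArrayList;
import java.util.List;

class StdDevOutlierSketch {

    // Returns the values that lie more than `factor` population standard
    // deviations away from the mean. An empty input yields an empty list.
    static List<Integer> findOutliers(List<Integer> allNumbers, double factor) {
        List<Integer> outliers = new ArrayList<>();
        if (allNumbers.isEmpty()) {
            return outliers;
        }
        double avg = allNumbers.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        double variance = allNumbers.stream()
                .mapToDouble(v -> Math.pow(v - avg, 2))
                .average()
                .orElse(0.0);
        double standardDeviation = Math.sqrt(variance);

        for (int number : allNumbers) {
            if (Math.abs(number - avg) > factor * standardDeviation) {
                outliers.add(number);
            }
        }
        return outliers;
    }
}
```

On the question's sample (100 105 102 13 104 22 101) a factor of 2 flags nothing, because the two low values inflate the standard deviation; a smaller factor or repeated passes would be needed there.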
Answered by Travis
An implementation of Grubbs' test can be found at MathUtil.java. It finds a single outlier, which you can remove from your list; repeat until you have removed all outliers.
It depends on commons-math, so if you're using Gradle:
dependencies {
    // 'compile' was removed in Gradle 7; use 'implementation' there instead
    compile 'org.apache.commons:commons-math:2.2'
}
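The linked MathUtil.java is not reproduced here. As a rough illustration of what Grubbs' test looks at, the sketch below computes only the test statistic G = max |x_i - mean| / s, where s is the sample standard deviation. The class and method names are assumptions, and the critical value that G must be compared against (derived from the t-distribution) is deliberately left out.

```java
class GrubbsStatisticSketch {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Index of the value farthest from the mean, i.e. the outlier candidate.
    static int mostExtremeIndex(double[] xs) {
        double m = mean(xs);
        int best = 0;
        for (int i = 1; i < xs.length; i++) {
            if (Math.abs(xs[i] - m) > Math.abs(xs[best] - m)) {
                best = i;
            }
        }
        return best;
    }

    // Grubbs' statistic G for the most extreme value.
    static double statistic(double[] xs) {
        double m = mean(xs);
        return Math.abs(xs[mostExtremeIndex(xs)] - m) / sampleStdDev(xs);
    }
}
```

The candidate is treated as an outlier only if G exceeds the critical value for the sample size and the chosen significance level.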
Answered by sklimkovitch
package test;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<Double> data = new ArrayList<Double>();
        data.add((double) 20);
        data.add((double) 65);
        data.add((double) 72);
        data.add((double) 75);
        data.add((double) 77);
        data.add((double) 78);
        data.add((double) 80);
        data.add((double) 81);
        data.add((double) 82);
        data.add((double) 83);
        // getOutliers expects sorted input
        Collections.sort(data);
        System.out.println(getOutliers(data));
    }

    public static List<Double> getOutliers(List<Double> input) {
        List<Double> output = new ArrayList<Double>();
        List<Double> data1 = new ArrayList<Double>();
        List<Double> data2 = new ArrayList<Double>();
        if (input.size() % 2 == 0) {
            data1 = input.subList(0, input.size() / 2);
            data2 = input.subList(input.size() / 2, input.size());
        } else {
            // odd size: the middle element belongs to neither half
            data1 = input.subList(0, input.size() / 2);
            data2 = input.subList(input.size() / 2 + 1, input.size());
        }
        double q1 = getMedian(data1);
        double q3 = getMedian(data2);
        double iqr = q3 - q1;
        double lowerFence = q1 - 1.5 * iqr;
        double upperFence = q3 + 1.5 * iqr;
        for (int i = 0; i < input.size(); i++) {
            if (input.get(i) < lowerFence || input.get(i) > upperFence)
                output.add(input.get(i));
        }
        return output;
    }

    private static double getMedian(List<Double> data) {
        if (data.size() % 2 == 0)
            return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
        else
            return data.get(data.size() / 2);
    }
}
Output: [20.0]
Explanation:
- Sort the list of numbers from low to high
- Split the list into two halves at the middle and put them into two separate lists (call them "left" and "right"); with an odd number of elements, the middle element is left out of both
- Find the median of each of those two lists
- Q1 is the median of the left half, and Q3 is the median of the right half
- Apply the formulas:
- IQR = Q3 - Q1
- LowerFence = Q1 - 1.5*IQR
- UpperFence = Q3 + 1.5*IQR
- More info about this formula: http://www.mathwords.com/o/outlier.htm
- Loop through all of the original elements, and add any that are below the lower fence or above the upper fence to the "output" ArrayList
- This new "output" ArrayList contains the outliers
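As a worked example, here are the fences for the ten sample values used in the code above (20, 65, 72, 75, 77, 78, 80, 81, 82, 83), following the same split into a left and a right half:

```latex
\begin{aligned}
Q_1 &= \operatorname{median}(20, 65, 72, 75, 77) = 72 \\
Q_3 &= \operatorname{median}(78, 80, 81, 82, 83) = 81 \\
\mathrm{IQR} &= Q_3 - Q_1 = 81 - 72 = 9 \\
\mathrm{LowerFence} &= Q_1 - 1.5 \cdot \mathrm{IQR} = 72 - 13.5 = 58.5 \\
\mathrm{UpperFence} &= Q_3 + 1.5 \cdot \mathrm{IQR} = 81 + 13.5 = 94.5
\end{aligned}
```

Only 20 falls outside the interval [58.5, 94.5], which matches the printed output.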
Answered by Valiyev
As Joni already pointed out, you can eliminate outliers with the help of the standard deviation and the mean. Here is my code, which you can use for your own purposes.
import java.util.ArrayList;
import java.util.List;

public class OutlierElimination {
    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        values.add(100);
        values.add(105);
        values.add(102);
        values.add(13);
        values.add(104);
        values.add(22);
        values.add(101);
        System.out.println("Before: " + values);
        System.out.println("After: " + eliminateOutliers(values, 1.5f));
    }

    protected static double getMean(List<Integer> values) {
        // accumulate in a double so the division is not truncated
        double sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum / values.size();
    }

    public static double getVariance(List<Integer> values) {
        double mean = getMean(values);
        // double accumulator: an int would silently drop the fractional part
        double temp = 0;
        for (int a : values) {
            temp += (a - mean) * (a - mean);
        }
        return temp / (values.size() - 1);
    }

    public static double getStdDev(List<Integer> values) {
        return Math.sqrt(getVariance(values));
    }

    public static List<Integer> eliminateOutliers(List<Integer> values, float scaleOfElimination) {
        double mean = getMean(values);
        double stdDev = getStdDev(values);
        final List<Integer> newList = new ArrayList<>();
        for (int value : values) {
            boolean isLessThanLowerBound = value < mean - stdDev * scaleOfElimination;
            boolean isGreaterThanUpperBound = value > mean + stdDev * scaleOfElimination;
            boolean isOutOfBounds = isLessThanLowerBound || isGreaterThanUpperBound;
            if (!isOutOfBounds) {
                newList.add(value);
            }
        }
        int countOfOutliers = values.size() - newList.size();
        if (countOfOutliers == 0) {
            // nothing was removed in this pass, so the list is stable
            return values;
        }
        return eliminateOutliers(newList, scaleOfElimination);
    }
}
- The eliminateOutliers() method does all the work
- It is a recursive method that builds a new, filtered list on every call and stops once a pass removes nothing
- The scaleOfElimination argument defines how aggressively outliers are removed: I normally use 1.5f to 2f; the greater the value, the fewer outliers are removed
The output of the code:
Before: [100, 105, 102, 13, 104, 22, 101]
After: [100, 105, 102, 104, 101]
Answered by Emil Woźniak
I'm very grateful to Valiyev; his solution helped me a lot. I want to share my small SRP-style refactoring of his work.
Please note that I use List.of() to store Dixon's critical values, so Java 9 or later is required.
import java.util.List;
import java.util.stream.Collectors;

public class DixonTest {
    // Dixon's critical values, indexed by sample size minus 3 (sizes 3 to 9)
    protected List<Double> criticalValues =
            List.of(0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437);
    private double scaleOfElimination;
    private double mean;
    private double stdDev;

    private double getMean(final List<Double> input) {
        double sum = input.stream()
                .mapToDouble(value -> value)
                .sum();
        return (sum / input.size());
    }

    private double getVariance(List<Double> input) {
        double mean = getMean(input);
        double temp = input.stream()
                .mapToDouble(a -> a)
                .map(a -> (a - mean) * (a - mean))
                .sum();
        return temp / (input.size() - 1);
    }

    private double getStdDev(List<Double> input) {
        return Math.sqrt(getVariance(input));
    }

    protected List<Double> eliminateOutliers(List<Double> input) {
        int n = input.size() - 3;
        // the table above only covers sample sizes 3 to 9
        if (n < 0 || n >= criticalValues.size()) {
            throw new IllegalArgumentException("input size must be between 3 and 9");
        }
        scaleOfElimination = criticalValues.get(n);
        mean = getMean(input);
        stdDev = getStdDev(input);
        // keep only the values that lie within the bounds
        return input.stream()
                .filter(this::isWithinBounds)
                .collect(Collectors.toList());
    }

    private boolean isWithinBounds(Double value) {
        return !(isLessThanLowerBound(value)
                || isGreaterThanUpperBound(value));
    }

    private boolean isGreaterThanUpperBound(Double value) {
        return value > mean + stdDev * scaleOfElimination;
    }

    private boolean isLessThanLowerBound(Double value) {
        return value < mean - stdDev * scaleOfElimination;
    }
}
I hope it helps someone else.
Best regards
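For comparison, Dixon's Q test as it is usually described does not use the standard deviation at all: it divides the gap between the most extreme value and its nearest neighbour by the full range of the sample, and compares that ratio with a critical value. A minimal sketch follows, with assumed class and method names, reusing the seven critical values listed in the code above (assumed here to be the 90% confidence table for sample sizes 3 to 9).

```java
import java.util.Arrays;
import java.util.OptionalDouble;

class DixonQSketch {

    // Critical values for n = 3..9, copied from the DixonTest code;
    // assumed to correspond to the 90% confidence level.
    private static final double[] CRITICAL =
            {0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437};

    // Returns the single suspected outlier, or empty if neither end of the
    // sorted sample exceeds the critical value.
    static OptionalDouble findOutlier(double[] data) {
        int n = data.length;
        if (n < 3 || n > 9) {
            throw new IllegalArgumentException("this sketch supports 3 to 9 values");
        }
        double[] x = data.clone();
        Arrays.sort(x);

        double range = x[n - 1] - x[0];
        if (range == 0) {
            return OptionalDouble.empty(); // all values are identical
        }
        // Q = gap to the nearest neighbour / range, for each end of the sample
        double qLow = (x[1] - x[0]) / range;
        double qHigh = (x[n - 1] - x[n - 2]) / range;
        double critical = CRITICAL[n - 3];

        if (qLow >= qHigh) {
            return qLow > critical ? OptionalDouble.of(x[0]) : OptionalDouble.empty();
        }
        return qHigh > critical ? OptionalDouble.of(x[n - 1]) : OptionalDouble.empty();
    }
}
```

The test is designed for a single suspect value. On the question's full sample (100 105 102 13 104 22 101) it reports nothing, because 13 and 22 sit close to each other and mask one another; with 22 removed, 13 is reported.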