Calculating the Median with MySQL
Original question: http://stackoverflow.com/questions/5401158/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Tim
I'm having trouble calculating the median of a list of values (not the average).
I found this article, "Simple way to calculate median with MySQL".
It references the following query, which I don't fully understand:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
If I have a time column and I want to calculate the median value, what do the x and y columns refer to?
Accepted answer by Krab
val is your time column; x and y are two references to the data table (you can write data AS x, data AS y).
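To see why the HAVING clause in the question picks the median, it can help to replay the self-join outside the database. The following is a minimal Python sketch, not part of the original answer: the function names are hypothetical, and it assumes distinct, non-NULL values. For each x, SIGN(1-SIGN(y.val-x.val)) is 1 when y.val <= x.val and 0 otherwise, so the SUM counts how many rows are at or below x.

```python
def sign(n):
    """Mimic SQL SIGN(): returns -1, 0 or 1."""
    return (n > 0) - (n < 0)


def median_by_self_join(values):
    """Replay `FROM data x, data y GROUP BY x.val HAVING ...` in plain Python.

    Assumption: the values are distinct, so each GROUP BY group holds exactly
    len(values) (x, y) pairs and COUNT(*) equals len(values).
    """
    matches = []
    for x in values:  # one group per distinct x.val
        at_or_below = sum(sign(1 - sign(y - x)) for y in values)
        count_star = len(values)
        if at_or_below == (count_star + 1) / 2:  # the HAVING condition
            matches.append(x)
    return matches


print(median_by_self_join([30, 10, 20]))      # odd number of rows
print(median_by_self_join([10, 20, 30, 40]))  # even number of rows
```

With an odd number of rows the sketch returns the single middle value. With an even number of rows, (COUNT(*)+1)/2 is not a whole number, so no group satisfies the condition and the list comes back empty.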
EDIT: To avoid computing your sums twice, you can store the intermediate results.
CREATE TEMPORARY TABLE average_user_total_time
(SELECT SUM(time) AS time_taken
FROM scores
WHERE created_at >= '2010-10-10'
and created_at <= '2010-11-11'
GROUP BY user_id);
You can then compute the median over these values, which now live in a named table.
EDIT: A temporary table won't work here, because the median query references the table twice (as x and y) and MySQL cannot refer to a TEMPORARY table more than once in the same query. You could try a regular table with the MEMORY storage engine, or simply repeat the subquery that computes the values in both places in your query. Apart from this, I don't see another solution. That doesn't mean there isn't a better way; maybe somebody else will come up with an idea.
Answer by Reggie Edwards
I propose a faster way.
Get the row count and store it in a user variable:
SELECT CEIL(COUNT(*)/2) INTO @middlevalue FROM data;
Then take the middle value in a sorted subquery. LIMIT does not accept a user variable directly, so pass it through a prepared statement:
PREPARE stmt FROM 'SELECT MAX(val) FROM (SELECT val FROM data ORDER BY val LIMIT ?) x';
EXECUTE stmt USING @middlevalue;
DEALLOCATE PREPARE stmt;
I tested this with a 5x10e6 dataset of random numbers and it found the median in under 10 seconds.
This will find an arbitrary percentile if you replace COUNT(*)/2 with COUNT(*)*n, where n is the percentile expressed as a fraction (.5 for the median, .75 for the 75th percentile, etc).
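The "sort, keep the first CEIL(COUNT(*)*n) rows, take the MAX" idea can be sketched in Python to check what it returns on small inputs. This is only an illustration under assumptions: the function name is hypothetical, and the guard against a zero row limit is an addition that the SQL above does not have.

```python
import math


def percentile_by_limit(values, n=0.5):
    """Sort, keep the first CEIL(COUNT(*) * n) rows, and return their MAX."""
    if not values:
        raise ValueError("need at least one value")
    limit = math.ceil(len(values) * n)  # the row count passed to LIMIT
    limit = max(limit, 1)               # added guard: LIMIT 0 would return no rows
    return max(sorted(values)[:limit])


data = [7, 1, 9, 3, 10, 0, 11]
print(percentile_by_limit(data))        # median of an odd-sized list
print(percentile_by_limit(data, 0.75))  # 75th percentile
```

Note that for an even number of rows this approach returns the lower of the two middle values rather than their average.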
Answer by Escualo
First, understand what the median is: it is the middle value in the sorted list of values.
Once you understand that, the approach has two steps:
- sort the values in either order
- pick the middle value (if the number of values is even, take the average of the two middle values)
Example:
Median of 0 1 3 7 9 10: 5 (because (7+3)/2=5)
Median of 0 1 3 7 9 10 11: 7 (because 7 is the middle value)
So, to sort dates you need a numerical value; you can take their timestamps (seconds elapsed since the epoch) and apply the definition of the median.
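The two steps above, plus the timestamp trick for dates, can be sketched as follows. This is a minimal Python illustration rather than MySQL code; the function name and the sample dates are made up for the example.

```python
from datetime import datetime, timezone


def median(values):
    """Middle value of the sorted list; average of the two middles if even."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of an empty list is undefined")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([0, 1, 3, 7, 9, 10]))      # (3 + 7) / 2
print(median([0, 1, 3, 7, 9, 10, 11]))  # the middle value, 7

# Dates: convert to seconds since the epoch, take the median, convert back.
dates = [
    datetime(2010, 11, 11, tzinfo=timezone.utc),
    datetime(2010, 10, 10, tzinfo=timezone.utc),
    datetime(2010, 10, 20, tzinfo=timezone.utc),
]
middle = median([d.timestamp() for d in dates])
print(datetime.fromtimestamp(middle, tz=timezone.utc))
```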
Answer by subrat mishra
Finding the median in MySQL using GROUP_CONCAT
Query:
SELECT
IF(count%2=1,
SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1),
(SUBSTRING_INDEX(substring_index(data_str,",",pos),",",-1)
+ SUBSTRING_INDEX(substring_index(data_str,",",pos+1),",",-1))/2)
as median
FROM (SELECT group_concat(val order by val) data_str,
CEILING(count(*)/2) pos,
count(*) as count from data)temp;
Explanation:
Sorting is done with ORDER BY inside the GROUP_CONCAT function.
The position (pos) and the total number of elements (count) are identified. Using CEILING to compute the position lets us use the SUBSTRING_INDEX function in the steps below.
Based on count, we decide whether the number of values is even or odd.
- Odd number of values: directly pick the element at pos using SUBSTRING_INDEX.
- Even number of values: find the elements at pos and pos+1, add them, and divide by 2 to get the median.
Finally, the median is calculated.
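The string manipulation in this query is easier to follow when traced step by step. Below is a rough Python sketch of the same logic; substring_index is a hypothetical stand-in that imitates MySQL's SUBSTRING_INDEX for a non-empty delimiter, and the sketch assumes a non-empty list of numeric values.

```python
import math


def substring_index(text, delim, count):
    """Imitate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count >= 0:
        return delim.join(parts[:count])  # everything before the count-th delimiter
    return delim.join(parts[count:])      # everything after the count-th delimiter from the right


def median_via_concat(values):
    """Follow the GROUP_CONCAT query step by step (non-empty input assumed)."""
    # group_concat(val order by val)
    data_str = ",".join(str(v) for v in sorted(values))
    count = len(values)
    pos = math.ceil(count / 2)

    def element(k):
        # k-th element, 1-based: keep the first k items, then take the last one
        return float(substring_index(substring_index(data_str, ",", k), ",", -1))

    if count % 2 == 1:
        return element(pos)
    return (element(pos) + element(pos + 1)) / 2


print(median_via_concat([3, 1, 2]))
print(median_via_concat([4, 1, 3, 2]))
```

One practical caveat for the SQL version: MySQL truncates GROUP_CONCAT results at group_concat_max_len (1024 bytes by default), so larger tables would need that setting raised.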
Answer by Nicolas De Bin
If you have a table R with a column named A, and you want the median of A, you can do as follows:
SELECT A FROM R R1
WHERE ( SELECT COUNT(A) FROM R R2 WHERE R2.A < R1.A ) = ( SELECT COUNT(A) FROM R R3 WHERE R3.A > R1.A )
Note: This only works if there are no duplicate values in A. Also, NULL values are not allowed.
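The condition in this query (as many smaller values as larger ones) can be checked on small inputs with a short Python sketch. The function name is hypothetical and the sketch assumes distinct, non-NULL values, as the note above requires.

```python
def median_by_balance(values):
    """Return the values that have as many smaller values as larger ones."""
    return [
        a for a in values
        if sum(1 for b in values if b < a) == sum(1 for c in values if c > a)
    ]


print(median_by_balance([5, 1, 3]))     # odd number of distinct values
print(median_by_balance([1, 2, 3, 4]))  # even number of distinct values
```

With an even number of distinct values, no element has equally many values on both sides, so the sketch returns an empty list.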
Answer by jayu chawla
The simplest way my friend and I have found... ENJOY!!
SELECT count(*) INTO @c from station;
select ROUND((@c+1)/2) into @final;
SELECT round(lat_n,4) from station a where @final-1=(select count(lat_n) from station b where b.lat_n > a.lat_n);
Answer by Amitrajit Bose
Here is a solution that is easy to understand. Just replace Your_Column and Your_Table as per your requirement.
SET @r = 0;
SELECT AVG(Your_Column)
FROM (SELECT (@r := @r + 1) AS r, Your_Column FROM Your_Table ORDER BY Your_Column) Temp
WHERE
r = (SELECT CEIL(COUNT(*) / 2) FROM Your_Table) OR
r = (SELECT FLOOR((COUNT(*) / 2) + 1) FROM Your_Table)
Originally adapted from this thread.
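The row-numbering idea (number the sorted rows, then average the row or rows in the middle) can be sketched in Python as below. The function name is hypothetical; returning None for an empty input is a choice made here to mirror AVG over zero rows.

```python
import math


def median_by_row_number(values):
    """Number the sorted rows from 1 and average the middle row(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return None  # AVG over zero rows yields NULL in SQL
    # The two row numbers from the WHERE clause; they coincide when n is odd.
    wanted = {math.ceil(n / 2), math.floor(n / 2 + 1)}
    picked = [v for r, v in enumerate(ordered, start=1) if r in wanted]
    return sum(picked) / len(picked)


print(median_by_row_number([5, 1, 3]))
print(median_by_row_number([4, 1, 3, 2]))
```

Unlike the approaches that return a single existing row, this one averages the two middle values when the number of rows is even.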