MySQL SQL 查询中的四分位数
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31209608/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Quartiles in SQL query
提问by Hamma
I have a very simple table like that:
我有一个非常简单的表格:
CREATE TABLE IF NOT EXISTS LuxLog (
Sensor TINYINT,
Lux INT,
PRIMARY KEY(Sensor)
)
It contains thousands of logs from different sensors.
它包含来自不同传感器的数千个日志。
I would like to have Q1 and Q3 for all sensors.
我希望所有传感器都有 Q1 和 Q3。
I can do one query for every data, but it would be better for me to have one query for all sensors (getting Q1 and Q3 back from one query)
我可以对每个数据进行一次查询,但最好对所有传感器进行一次查询(从一次查询中获取 Q1 和 Q3)
I though it would be a fairly simple operation, as quartiles are broadly used and one of the main statistical variables in frequency calculation. The truth is that I found loads of overcomplicated solutions, while I was hoping to find something neat and simple.
我认为这将是一个相当简单的操作,因为四分位数被广泛使用并且是频率计算中的主要统计变量之一。事实是,我发现了大量过于复杂的解决方案,而我希望找到一些简洁明了的解决方案。
Anyone can give me a hint?
任何人都可以给我一个提示?
Edit: This is a piece of code that I found online, but it is not working for me:
编辑:这是我在网上找到的一段代码,但它对我不起作用:
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT( -- 1) make a sorted list of values
Lux
ORDER BY Lux
SEPARATOR ','
)
, ',' -- 2) cut at the comma
, 75/100 * COUNT(*) -- at the position beyond the 90% portion
)
, ',' -- 3) cut at the comma
, -1 -- right after the desired list entry
) AS `75th Percentile`
FROM LuxLog
WHERE Sensor=12
AND Lux<>0
I am getting 1 as return value, while it should be a number that can be divided by 10 (10,20,30.....1000)
我得到 1 作为返回值,而它应该是一个可以被 10 整除的数字(10,20,30.....1000)
回答by Pholochtairze
See SqlFiddle : http://sqlfiddle.com/#!9/accca6/2/6Note : for the sqlfiddle I've generated 100 rows, each integer between 1 and 100 has a row, but it is a random order (done in excel).
请参阅 SqlFiddle:http: //sqlfiddle.com/#!9/accca6/2/6注意:对于 sqlfiddle,我生成了 100 行,1 到 100 之间的每个整数都有一行,但它是一个随机顺序(完成在excel中)。
Here is the code :
这是代码:
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
EDIT :
编辑 :
SET @current_sensor := 101;
SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Underlying reasoning is as follows : For quartile 1 we want to get 25% from the top so we want to know how much rows there are, that's :
基本推理如下:对于四分位数 1,我们希望从顶部获得 25%,因此我们想知道有多少行,即:
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
Now that we know the number of rows, we want to know what is 25% of that, it is this line :
既然我们知道了行数,我们想知道其中的 25% 是多少,就是这一行:
SET @quartile := (ROUND(@number_of_rows*0.25));
Then to find a quartile we want to order the LuxLog table by Lux, then to get the row number "@quartile", in order to do that we set the OFFSET to @quartile to say that we want to start our select from the row number @quartile and we say limit 1 to say that we want to retrieve only one row. That's :
然后要找到一个四分位数,我们要按 Lux 对 LuxLog 表进行排序,然后获取行号“@quartile”,为此我们将 OFFSET 设置为 @quartile 表示我们要从行开始选择number @quartile 并且我们说 limit 1 表示我们只想检索一行。那是 :
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
We do (almost) the same for the other quartile, but rather than starting from the top (from higher values to lower) we start from the bottom (it explains the ASC).
我们对另一个四分位数(几乎)做同样的事情,但不是从顶部开始(从较高的值到较低的值),而是从底部开始(它解释了 ASC)。
But for now we just have strings stored in the variables @sql_q1 and @sql_q3, so the concatenate them, we union the results of the queries, we prepare the query and execute it.
但是现在我们只有字符串存储在变量@sql_q1 和@sql_q3 中,所以连接它们,我们合并查询的结果,准备查询并执行它。
回答by Hamma
Well to use NTILE is very simple but it is a Postgres Function. You basically just do something like this:
使用 NTILE 非常简单,但它是一个 Postgres 函数。你基本上只是做这样的事情:
SELECT value_you_are_NTILING,
NTILE(4) OVER (ORDER BY value_you_are_NTILING DESC) AS tiles
FROM
(SELECT math_that_gives_you_the_value_you_are_NTILING_here AS value_you_are_NTILING FROM tablename);
Here is a simple example I made for you on SQLFiddle: http://sqlfiddle.com/#!15/7f05a/1
这是我在 SQLFiddle 上为您制作的一个简单示例:http://sqlfiddle.com/#!15/7f05a/1
In MySQL you would use RANK... Here is the SQLFiddle for that: http://www.sqlfiddle.com/#!2/d5587/1(this comes from the Question linked below)
在 MySQL 中,您将使用 RANK...这是 SQLFiddle:http://www.sqlfiddle.com/#!2/d5587 /1(这来自下面链接的问题)
This use of MySQL RANK() comes from the Stackoverflow answered here: Rank function in MySQL
MySQL RANK() 的这种使用来自于 Stackoverflow 的回答: MySQL 中的 Rank function
Look for the answer by Salman A.
寻找 Salman A 的答案。
回答by John
Something like this should do it:
像这样的事情应该这样做:
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
Here's the complete example:
这是完整的示例:
use example;
drop table if exists luxlog;
CREATE TABLE LuxLog (
Sensor TINYINT,
Lux INT,
position int,
PRIMARY KEY(Position)
);
insert into luxlog values (0, 1, 10);
insert into luxlog values (0, 2, 20);
insert into luxlog values (0, 3, 30);
insert into luxlog values (0, 4, 40);
insert into luxlog values (0, 5, 50);
insert into luxlog values (0, 6, 60);
insert into luxlog values (0, 7, 70);
insert into luxlog values (0, 8, 80);
select count(*)*.25 from luxlog;
select count(*)*.50 from luxlog;
select
ll.*,
a.position,
b.position,
if(
a.position is not null, 1,
if (b.position is not null, 2, 0)
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux >= (select count(*)*0.00 from luxlog) and a.lux < (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux >= (select count(*)*0.25 from luxlog) and b.lux < (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux >= (select count(*)*0.50 from luxlog) and c.lux < (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux >= (select count(*)*0.75 from luxlog) and d.lux < (select count(*)*1.00 from luxlog)
;
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
回答by John
Or you could use rank like this:
或者你可以像这样使用排名:
select
ll.*,
@curRank := @curRank + 1 as rank,
if (@curRank <= (select count(*)*0.25 from luxlog), 1,
if (@curRank <= (select count(*)*0.50 from luxlog), 2,
if (@curRank <= (select count(*)*0.75 from luxlog), 3, 4))
) as quartile
from
luxlog ll,
(SELECT @curRank := 0) r
;
And this will give just one record for each quartile:
这将为每个四分位数提供一个记录:
select
x.quartile, group_concat(position)
from (
select
ll.*,
@curRank := @curRank + 1 as rank,
if (@curRank > 0 and @curRank <= (select count(*)*0.25 from luxlog), 1,
if (@curRank > 0 and @curRank <= (select count(*)*0.50 from luxlog), 2,
if (@curRank > 0 and @curRank <= (select count(*)*0.75 from luxlog), 3, 4))
) as quartile
from
luxlog ll,
(SELECT @curRank := 0) r
) x
group by quartile
+ ------------- + --------------------------- +
| quartile | group_concat(position) |
+ ------------- + --------------------------- +
| 1 | 10,20 |
| 2 | 30,40 |
| 3 | 50,60 |
| 4 | 70,80 |
+ ------------- + --------------------------- +
4 rows
EDIT: The sqlFiddle example (http://sqlfiddle.com/#!9/a14a4/17) looks like this after this is removed
编辑: sqlFiddle 示例(http://sqlfiddle.com/#!9/a14a4/17)在删除后看起来像这样
/*SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;*/
回答by icats
Here's a query I came up with for calculating quartiles; it runs in ~0.04s w/ ~5000 table rows. I included the min/max values as I am ultimately using this data to build the four quartile ranges:
这是我为计算四分位数而提出的查询;它在 ~0.04sw/ ~5000 表行中运行。我包括了最小值/最大值,因为我最终使用这些数据来构建四个四分位数范围:
SELECT percentile_table.percentile, avg(ColumnName) AS percentile_values
FROM
(SELECT @rownum := @rownum + 1 AS `row_number`,
d.ColumnName
FROM PercentileTestTable d,
(SELECT @rownum := 0) r
WHERE ColumnName IS NOT NULL
ORDER BY d.ColumnName
) AS t1,
(SELECT count(*) AS total_rows
FROM PercentileTestTable d
WHERE ColumnName IS NOT NULL
) AS t2,
(SELECT 0 AS percentile
UNION ALL
SELECT 0.25
UNION ALL
SELECT 0.5
UNION ALL
SELECT 0.75
UNION ALL
SELECT 1
) AS percentile_table
WHERE
(percentile_table.percentile != 0
AND percentile_table.percentile != 1
AND t1.row_number IN
(
floor(( total_rows + 1 ) * percentile_table.percentile),
floor(( total_rows + 2 ) * percentile_table.percentile)
)
) OR (
percentile_table.percentile = 0
AND t1.row_number = 1
) OR (
percentile_table.percentile = 1
AND t1.row_number = total_rows
)
GROUP BY percentile_table.percentile;
Fiddle here: http://sqlfiddle.com/#!9/58c0e2/1
在这里小提琴:http://sqlfiddle.com/#!9/58c0e2/1
There are certainly performance issues; I'd love if anyone has feedback on how to improve this.
肯定存在性能问题;如果有人对如何改进这一点有反馈,我很乐意。
Sample data list:
示例数据列表:
3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
Sample query output:
示例查询输出:
| percentile | percentile_values |
|------------|-------------------|
| 0 | 3 |
| 0.25 | 4 |
| 0.5 | 10.5 |
| 0.75 | 15 |
| 1 | 18 |
回答by dartaloufe
I use this solution with a MYSQL function :
我将此解决方案与 MYSQL 函数一起使用:
xis the centile you want
x是你想要的百分位数
array_valuesyour group_concat values order and separated by ,
array_values您的 group_concat 值顺序并由 , 分隔
DROP FUNCTION IF EXISTS centile;
delimiter $$
CREATE FUNCTION `centile`(x Text, array_values TEXT) RETURNS text
BEGIN
Declare DIFF_RANK TEXT;
Declare RANG_FLOOR INT;
Declare COUNT INT;
Declare VALEUR_SUP TEXT;
Declare VALEUR_INF TEXT;
SET COUNT = LENGTH(array_values) - LENGTH(REPLACE(array_values, ',', '')) + 1;
SET RANG_FLOOR = FLOOR(ROUND((x) * (COUNT-1),2));
SET DIFF_RANK = ((x) * (COUNT-1)) - FLOOR(ROUND((x) * (COUNT-1),2));
SET VALEUR_SUP = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+2),',',-1) AS DECIMAL);
SET VALEUR_INF = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+1),',',-1) AS DECIMAL);
/****
https://fr.wikipedia.org/wiki/Quantile
x_j+1 + g (x_j+2 - x_j+1)
***/
RETURN Round((VALEUR_INF + (DIFF_RANK* (VALEUR_SUP-VALEUR_INF) ) ),2);
END$$
Example :
例子 :
Select centile(3/4,GROUP_CONCAT(lux ORDER BY lux SEPARATOR ',')) as quartile_3
FROM LuxLog
WHERE Sensor=12 AND Lux<>0