MySQL 获取直方图数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1764881/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Getting data for histogram plot
提问by Legend
Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:
有没有办法在 MySQL 中指定 bin 大小?现在,我正在尝试以下 SQL 查询:
select total, count(total) from faults GROUP BY total;
The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?
正在生成的数据足够好,但行太多了。我需要的是一种将数据分组到预定义箱中的方法。我可以用脚本语言做到这一点,但有没有办法直接在 SQL 中做到这一点?
Example:
例子:
+-------+--------------+
| total | count(total) |
+-------+--------------+
| 30 | 1 |
| 31 | 2 |
| 33 | 1 |
| 34 | 3 |
| 35 | 2 |
| 36 | 6 |
| 37 | 3 |
| 38 | 2 |
| 41 | 1 |
| 42 | 5 |
| 43 | 1 |
| 44 | 7 |
| 45 | 4 |
| 46 | 3 |
| 47 | 2 |
| 49 | 3 |
| 50 | 2 |
| 51 | 3 |
| 52 | 4 |
| 53 | 2 |
| 54 | 1 |
| 55 | 3 |
| 56 | 4 |
| 57 | 4 |
| 58 | 2 |
| 59 | 2 |
| 60 | 4 |
| 61 | 1 |
| 63 | 2 |
| 64 | 5 |
| 65 | 2 |
| 66 | 3 |
| 67 | 5 |
| 68 | 5 |
------------------------
What I am looking for:
我在找什么:
+------------+---------------+
| total | count(total) |
+------------+---------------+
| 30 - 40 | 23 |
| 40 - 50 | 15 |
| 50 - 60 | 51 |
| 60 - 70 | 45 |
------------------------------
I guess this cannot be achieved in a straight forward manner but a reference to any related stored procedure would be fine as well.
我想这不能以直接的方式实现,但对任何相关存储过程的引用也可以。
回答by Jaro
This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values.
There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again since it's just so easy to modify for each use case, and so short and concise. This is how you do it:
SELECT ROUND(numeric_value, -2) AS bucket, COUNT(*) AS COUNT, RPAD('', LN(COUNT(*)), '*') AS bar FROM my_table GROUP BY bucket;
Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I've made the bars to be in logarithmic scale, so that they don't grow too much when you have large values.
numeric_value should be offset in the ROUNDing operation, based on the rounding increment, in order to ensure the first bucket contains as many elements as the following buckets.
e.g. with ROUND(numeric_value,-1), numeric_value in range [0,4] (5 elements) will be placed in first bucket, while [5,14] (10 elements) in second, [15,24] in third, unless numeric_value is offset appropriately via ROUND(numeric_value - 5, -1).
This is an example of such query on some random data that looks pretty sweet. Good enough for a quick evaluation of the data.
+--------+----------+-----------------+ | bucket | count | bar | +--------+----------+-----------------+ | -500 | 1 | | | -400 | 2 | * | | -300 | 2 | * | | -200 | 9 | ** | | -100 | 52 | **** | | 0 | 5310766 | *************** | | 100 | 20779 | ********** | | 200 | 1865 | ******** | | 300 | 527 | ****** | | 400 | 170 | ***** | | 500 | 79 | **** | | 600 | 63 | **** | | 700 | 35 | **** | | 800 | 14 | *** | | 900 | 15 | *** | | 1000 | 6 | ** | | 1100 | 7 | ** | | 1200 | 8 | ** | | 1300 | 5 | ** | | 1400 | 2 | * | | 1500 | 4 | * | +--------+----------+-----------------+
Some notes: Ranges that have no match will not appear in the count - you will not have a zero in the count column. Also, I'm using the ROUND function here. You can just as easily replace it with TRUNCATE if you feel it makes more sense to you.
这是一篇关于在 MySQL 中为数值创建直方图的超级快速和肮脏的方法的帖子。
使用 CASE 语句和其他类型的复杂逻辑,还有多种其他方法可以创建更好、更灵活的直方图。这种方法一次又一次地赢得了我的青睐,因为它非常容易针对每个用例进行修改,而且非常简短。这是你如何做到的:
SELECT ROUND(numeric_value, -2) AS bucket, COUNT(*) AS COUNT, RPAD('', LN(COUNT(*)), '*') AS bar FROM my_table GROUP BY bucket;
只需将 numeric_value 更改为您的任何列,更改舍入增量,就是这样。我已将条形设为对数刻度,这样当您有较大的值时,它们就不会增长太多。
numeric_value 应该在 ROUNDing 操作中根据舍入增量进行偏移,以确保第一个桶包含与后面的桶一样多的元素。
例如,对于 ROUND(numeric_value,-1),范围 [0,4](5 个元素)中的 numeric_value 将被放置在第一个桶中,而 [5,14](10 个元素)在第二个桶中,[15,24] 在第三个桶中,除非 numeric_value 通过 ROUND(numeric_value - 5, -1) 适当偏移。
这是对一些看起来很不错的随机数据进行此类查询的示例。足以快速评估数据。
+--------+----------+-----------------+ | bucket | count | bar | +--------+----------+-----------------+ | -500 | 1 | | | -400 | 2 | * | | -300 | 2 | * | | -200 | 9 | ** | | -100 | 52 | **** | | 0 | 5310766 | *************** | | 100 | 20779 | ********** | | 200 | 1865 | ******** | | 300 | 527 | ****** | | 400 | 170 | ***** | | 500 | 79 | **** | | 600 | 63 | **** | | 700 | 35 | **** | | 800 | 14 | *** | | 900 | 15 | *** | | 1000 | 6 | ** | | 1100 | 7 | ** | | 1200 | 8 | ** | | 1300 | 5 | ** | | 1400 | 2 | * | | 1500 | 4 | * | +--------+----------+-----------------+
一些注意事项:不匹配的范围不会出现在计数中——计数列中不会出现零。另外,我在这里使用了 ROUND 函数。如果你觉得它对你更有意义,你可以很容易地用 TRUNCATE 替换它。
I found it here http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html
我在这里找到它http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html
回答by Bernardo Siu
Mike DelGaudio's answer is the way I do it, but with a slight change:
Mike DelGaudio 的回答是我这样做的方式,但略有变化:
select floor(mycol/10)*10 as bin_floor, count(*)
from mytable
group by 1
order by 1
The advantage? You can make the bins as large or as small as you want. Bins of size 100? floor(mycol/100)*100
. Bins of size 5? floor(mycol/5)*5
.
优势?您可以根据需要使垃圾箱变大或变小。大小为 100 的垃圾箱?floor(mycol/100)*100
. 大小为 5 的垃圾箱?floor(mycol/5)*5
.
Bernardo.
贝尔纳多。
回答by Ofri Raviv
SELECT b.*,count(*) as total FROM bins b
left outer join table1 a on a.value between b.min_value and b.max_value
group by b.min_value
The table bins contains columns min_value and max_value which define the bins. note that the operator "join... on x BETWEEN y and z" is inclusive.
表 bins 包含定义 bins 的 min_value 和 max_value 列。请注意,运算符“join... on x BETWEEN y and z”包含在内。
table1 is the name of the data table
table1 是数据表的名称
回答by David West
Ofri Raviv's answer is very close but incorrect. The count(*)
will be 1
even if there are zero results in a histogram interval. The query needs to be modified to use a conditional sum
:
Ofri Raviv 的回答非常接近但不正确。该count(*)
会是1
即使有任何结果的直方图区间。需要修改查询以使用条件sum
:
SELECT b.*, SUM(a.value IS NOT NULL) AS total FROM bins b
LEFT JOIN a ON a.value BETWEEN b.min_value AND b.max_value
GROUP BY b.min_value;
回答by sammy
select "30-34" as TotalRange,count(total) as Count from table_name
where total between 30 and 34
union (
select "35-39" as TotalRange,count(total) as Count from table_name
where total between 35 and 39)
union (
select "40-44" as TotalRange,count(total) as Count from table_name
where total between 40 and 44)
union (
select "45-49" as TotalRange,count(total) as Count from table_name
where total between 45 and 49)
etc ....
As long as there are not too many intervals, this is a pretty good solution.
只要间隔不是太多,这是一个很好的解决方案。
回答by Dologan
I made a procedure that can be used to automatically generate a temporary table for bins according to a specified number or size, for later use with Ofri Raviv's solution.
我制作了一个程序,可用于根据指定的数量或大小自动生成 bin 的临时表,供以后与 Ofri Raviv 的解决方案一起使用。
CREATE PROCEDURE makebins(numbins INT, binsize FLOAT) # binsize may be NULL for auto-size
BEGIN
SELECT FLOOR(MIN(colval)) INTO @binmin FROM yourtable;
SELECT CEIL(MAX(colval)) INTO @binmax FROM yourtable;
IF binsize IS NULL
THEN SET binsize = CEIL((@binmax-@binmin)/numbins); # CEIL here may prevent the potential creation a very small extra bin due to rounding errors, but no good where floats are needed.
END IF;
SET @currlim = @binmin;
WHILE @currlim + binsize < @binmax DO
INSERT INTO bins VALUES (@currlim, @currlim+binsize);
SET @currlim = @currlim + binsize;
END WHILE;
INSERT INTO bins VALUES (@currlim, @maxbin);
END;
DROP TABLE IF EXISTS bins; # be careful if you have a bins table of your own.
CREATE TEMPORARY TABLE bins (
minval INT, maxval INT, # or FLOAT, if needed
KEY (minval), KEY (maxval) );# keys could perhaps help if using a lot of bins; normally negligible
CALL makebins(20, NULL); # Using 20 bins of automatic size here.
SELECT bins.*, count(*) AS total FROM bins
LEFT JOIN yourtable ON yourtable.value BETWEEN bins.minval AND bins.maxval
GROUP BY bins.minval
This will generate the histogram count only for the bins that are populated. David West ought to be right in his correction, but for some reason, unpopulated bins do not appear in the result for me (despite the use of a LEFT JOIN — I do not understand why).
这将仅为填充的 bin 生成直方图计数。David West 的更正应该是正确的,但由于某种原因,未填充的 bin 没有出现在我的结果中(尽管使用了 LEFT JOIN - 我不明白为什么)。
回答by Renaud
That should work. Not so elegant but still:
那应该工作。不那么优雅,但仍然:
select count(mycol - (mycol mod 10)) as freq, mycol - (mycol mod 10) as label
from mytable
group by mycol - (mycol mod 10)
order by mycol - (mycol mod 10) ASC
via Mike DelGaudio
通过迈克·德尔高迪奥
回答by Zebra
select case when total >= 30 and total <= 40 THEN "30-40"
else when total >= 40 and total <= 50 then "40-50"
else "50-60" END as Total , count(total)
group by Total
回答by user824276
Equal width binning into a given count of bins:
等宽分箱到给定的分箱数量:
WITH bins AS(
SELECT min(col) AS min_value
, ((max(col)-min(col)) / 10.0) + 0.0000001 AS bin_width
FROM cars
)
SELECT tab.*,
floor((col-bins.min_value) / bins.bin_width ) AS bin
FROM tab, bins;
Note that the 0.0000001 is there to make sure that the records with the value equal to max(col) do not make it's own bin just by itself. Also, the additive constant is there to make sure the query does not fail on division by zero when all the values in the column are identical.
请注意, 0.0000001 是为了确保值等于 max(col) 的记录不会使其本身成为自己的 bin。此外,当列中的所有值都相同时,附加常量可确保查询不会因除以零而失败。
Also note that the count of bins (10 in the example) should be written with a decimal mark to avoid integer division (the unadjusted bin_width can be decimal).
另请注意,bin 的计数(示例中为 10)应使用小数点标记以避免整数除法(未调整的 bin_width 可以是十进制)。
回答by zub0r
In addition to great answer https://stackoverflow.com/a/10363145/916682, you can use phpmyadmin chart tool for a nice result:
除了很好的答案https://stackoverflow.com/a/10363145/916682,您还可以使用 phpmyadmin 图表工具获得不错的结果: