Python: plotting a histogram from pre-counted data in Matplotlib
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/19212508/
Plotting a histogram from pre-counted data in Matplotlib
Asked by Josh Rosen
I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data
data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
Given this data, I can use
pylab.hist(data, bins=[...])
to plot a histogram.
In my case, the data has been pre-counted and is represented as a dictionary:
counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc, as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:
from itertools import chain, repeat
data = list(chain.from_iterable(repeat(value, count)
                                for (value, count) in counted_data.items()))
This is inefficient when counted_data contains counts for millions of data points.
Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?
Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?
Accepted answer by Josh Rosen
I used pyplot.hist's weights option to weight each key by its value, producing the histogram that I wanted:
pylab.hist(list(counted_data.keys()), weights=list(counted_data.values()), bins=range(50))
This allows me to rely on hist to re-bin my data.
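The weights approach can be checked without plotting anything. The sketch below assumes NumPy is available and uses np.histogram to compare the weighted histogram of the keys against the histogram of the expanded raw data. The bin edges are chosen only for illustration and are not from the original question.

```python
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

# Illustrative bin edges (an assumption, not from the original question).
edges = [0, 3, 6, 9, 12]

# Weighted histogram: each key adds its count to the bin it falls in.
keys = list(counted_data.keys())
counts = list(counted_data.values())
weighted, _ = np.histogram(keys, bins=edges, weights=counts)

# Reference: expand the counts back into raw data and histogram that.
raw = np.repeat(keys, counts)
expanded, _ = np.histogram(raw, bins=edges)

print(weighted)
print(expanded)
```

Both arrays should hold the same per-bin totals, which is why the expansion step can be skipped when the counts are already known.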
Answer by tacaswell
You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):
val, weight = zip(*[(k, v) for k,v in counted_data.items()])
plt.hist(val, weights=weight)
Assuming you only have integers as the keys, you can also use bar directly:
# Iterating a dict yields its keys, so min/max work on it directly
min_bin = min(counted_data)
max_bin = max(counted_data)
bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)
# Fill in the count for each key; keys that are absent stay at zero
for k, v in counted_data.items():
    vals[k - min_bin] = v
plt.bar(bins, vals, ...)
where ... is whatever arguments you want to pass to bar (doc)
If you want to re-bin your data, see "Histogram with separate list denoting frequency".
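For the "roll-up" part of the question, per-item counts can also be summed into bins without any plotting library. A minimal sketch in plain Python follows. The function name rebin_counts and the bin edges are hypothetical, and the sketch treats every bin as a half-open interval [left, right), which differs slightly from np.histogram, where the last bin also includes its right edge.

```python
from bisect import bisect_right


def rebin_counts(counted_data, edges):
    """Sum per-value counts into bins defined by sorted edges.

    Bin i covers the half-open interval [edges[i], edges[i + 1]).
    Values outside [edges[0], edges[-1]) are ignored.
    """
    binned = [0] * (len(edges) - 1)
    for value, count in counted_data.items():
        # Index of the last edge that is <= value.
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(binned):
            binned[i] += count
    return binned


counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
edges = [0, 3, 6, 9, 12]  # illustrative edges, not from the original post
binned = rebin_counts(counted_data, edges)
print(binned)
```

The resulting list has one entry per bin, so it can be handed to bar together with the left edges and the bin widths.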
Answer by R. Yang
The "bins" array should be one element longer than the "counts" array, because it holds the bin edges. Here's a way to fully reconstruct the histogram:
import numpy as np
import matplotlib.pyplot as plt
bins = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).astype(float)
counts = np.array([5, 3, 4, 5, 6, 1, 3, 7]).astype(float)
centroids = (bins[1:] + bins[:-1]) / 2
counts_, bins_, _ = plt.hist(centroids, bins=len(counts),
                             weights=counts, range=(min(bins), max(bins)))
plt.show()
assert np.allclose(bins_, bins)
assert np.allclose(counts_, counts)
Answer by youssef mhiri
You can also use seaborn to plot the histogram:
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(list(counted_data.keys()), hist_kws={"weights":list(counted_data.values())})