Python: plotting a histogram from pre-counted data in Matplotlib

Question

Asked by Josh Rosen

I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data


data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]

Given this data, I can use


pylab.hist(data, bins=[...])

to plot a histogram.


In my case, the data has been pre-counted and is represented as a dictionary:


counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc., as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:

from itertools import chain, repeat

data = list(chain.from_iterable(repeat(value, count)
            for (value, count) in counted_data.items()))

This is inefficient when counted_data contains counts for millions of data points.

Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?


Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?


Answer 1

Accepted answer by Josh Rosen

I used pyplot.hist's weights option to weight each key by its value, producing the histogram that I wanted:

pylab.hist(list(counted_data.keys()), weights=list(counted_data.values()), bins=range(50))

This allows me to rely on hist to re-bin my data.
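If only the binned counts are needed (the "roll-up" asked about in the question), the same weights idea should work without plotting anything, since np.histogram accepts a weights argument as well. The following is a minimal sketch, assuming NumPy is available; the width-2 bin edges are an arbitrary choice for illustration and are not part of the original answer.

```python
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

values = np.array(list(counted_data.keys()))
counts = np.array(list(counted_data.values()))

# Illustrative bin edges (an assumption): width-2 bins from 0 to 12.
edges = np.arange(0, 14, 2)

# Each distinct value contributes its count to the bin it falls in.
binned, _ = np.histogram(values, bins=edges, weights=counts)

# Cross-check against the expanded raw data; only practical for small inputs.
raw = np.repeat(values, counts)
expected, _ = np.histogram(raw, bins=edges)

print(binned)
```

The resulting array can then be passed to a bar plot together with the edges, so the expansion into raw data is never needed for the real input.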

Answer 2

Answer by tacaswell

You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):

val, weight = zip(*[(k, v) for k,v in counted_data.items()])
plt.hist(val, weights=weight)

Assuming you only have integers as the keys, you can also use bar directly:

# Built-in min/max work on dict keys directly in both Python 2 and 3.
min_bin = min(counted_data)
max_bin = max(counted_data)

bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)

for k,v in counted_data.items():
    vals[k - min_bin] = v

plt.bar(bins, vals, ...)

where ... is whatever arguments you want to pass to bar (doc).
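For reference, the bar approach above can be written as a complete script. This is only a sketch under a few assumptions: the keys are integers, the non-interactive "Agg" backend is selected so it runs without a display, and width=1.0 is a styling choice rather than something the answer specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

sorted_keys = sorted(counted_data)
keys = np.array(sorted_keys)
min_bin, max_bin = sorted_keys[0], sorted_keys[-1]

# One slot per integer in [min_bin, max_bin]; values with no count stay at zero.
bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(len(bins))
vals[keys - min_bin] = [counted_data[k] for k in sorted_keys]

fig, ax = plt.subplots()
# width=1.0 makes adjacent bars touch, as in a histogram (a styling assumption).
bars = ax.bar(bins, vals, width=1.0, align="center")
ax.set_xlabel("value")
ax.set_ylabel("count")
```

Note that this allocates one slot per integer between the smallest and largest key, so it suits dense integer keys better than sparse ones.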

If you want to re-bin your data, see "Histogram with separate list denoting frequency".

Answer 3

Answer by R. Yang

The "bins" array should be one element longer than "counts", because it holds the bin edges. Here's a way to fully reconstruct the histogram:

import numpy as np
import matplotlib.pyplot as plt
bins = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).astype(float)
counts = np.array([5, 3, 4, 5, 6, 1, 3, 7]).astype(float)
centroids = (bins[1:] + bins[:-1]) / 2
counts_, bins_, _ = plt.hist(centroids, bins=len(counts),
                             weights=counts, range=(min(bins), max(bins)))
plt.show()
assert np.allclose(bins_, bins)
assert np.allclose(counts_, counts)
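When the data is already binned like this, the bars can also be drawn directly from the edges, without going through hist. The sketch below is one possible alternative, not the answerer's method; it reuses the bins and counts arrays from the example above and assumes the "Agg" backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

bins = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)  # 9 bin edges
counts = np.array([5, 3, 4, 5, 6, 1, 3, 7], dtype=float)   # 8 bin counts

# n bins are delimited by n + 1 edges.
assert len(bins) == len(counts) + 1

fig, ax = plt.subplots()
# align="edge" starts each bar at its left edge; np.diff gives each bin's width,
# so unequal bin widths would also be drawn correctly.
bars = ax.bar(bins[:-1], counts, width=np.diff(bins), align="edge")
```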

Answer 4

Answer by youssef mhiri

You can also use seaborn to plot the histogram:

import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot(list(counted_data.keys()), hist_kws={"weights":list(counted_data.values())})
