Python Matplotlib xticks not lining up with histogram
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/27083051/
Asked by Paymahn Moghadasian
I'm generating some histograms with matplotlib and I'm having some trouble figuring out how to get the xticks of a histogram to align with the bars.
Here's a sample of the code I use to generate the histogram:
from matplotlib import pyplot as py
py.hist(histogram_data, 49, alpha=0.75)
py.title(column_name)
py.xticks(range(49))
py.show()
I know that all of the values in the histogram_data array are in [0,1,...,48], which, assuming I did the math right, means there are 49 unique values. I'd like to show a histogram of each of those values. Here's a picture of what's generated.


How can I set up the graph such that all of the xticks are aligned to the left, middle or right of each of the bars?
Accepted answer by Joe Kington
Short answer: Use plt.hist(data, bins=range(50)) instead to get left-aligned bins, plt.hist(data, bins=np.arange(50)-0.5) to get center-aligned bins, etc.
Also, if performance matters, because you want counts of unique integers, there are a couple of slightly more efficient methods (np.bincount) that I'll show at the end.
Problem Statement
As a stand-alone example of what you're seeing, consider the following:
import matplotlib.pyplot as plt
import numpy as np
# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)
plt.hist(data, bins=10)
plt.xticks(range(10))
plt.show()


As you've noticed, the bins aren't aligned with integer intervals. This is basically because you asked for 10 bins between 0 and 9, which isn't quite the same as asking for bins for the 10 unique values.
The number of bins you want isn't exactly the same as the number of unique values. What you actually should do in this case is manually specify the bin edges.
To explain what's going on, let's skip matplotlib.pyplot.hist and just use the underlying numpy.histogram function.
For example, let's say you have the values [0, 1, 2, 3]. Your first instinct would be to do:
In [1]: import numpy as np
In [2]: np.histogram([0, 1, 2, 3], bins=4)
Out[2]: (array([1, 1, 1, 1]), array([ 0. , 0.75, 1.5 , 2.25, 3. ]))
The first array returned is the counts and the second is the bin edges (in other words, where bar edges would be in your plot).
Notice that we get the counts we'd expect, but because we asked for 4 bins between the min and max of the data, the bin edges aren't on integer values.
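To see where the non-integer edges come from: when bins is an integer, np.histogram splits the range from the smallest to the largest value into that many equal-width bins. The sketch below rebuilds those edges by hand with np.linspace; the variable names are only illustrative.

```python
import numpy as np

values = [0, 1, 2, 3]
n_bins = 4

counts, edges = np.histogram(values, bins=n_bins)

# n_bins equal-width bins need n_bins + 1 edges spanning [min, max].
expected_edges = np.linspace(min(values), max(values), n_bins + 1)

print(edges)           # edges chosen by np.histogram
print(expected_edges)  # the same edges built by hand
print(np.diff(edges))  # each bin is (max - min) / n_bins = 0.75 wide
```

With 4 bins squeezed into a span of 3, no bin can be exactly one unit wide, which is why the bars drift away from the integer ticks.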
Next, you might try:
In [3]: np.histogram([0, 1, 2, 3], bins=3)
Out[3]: (array([1, 1, 2]), array([ 0., 1., 2., 3.]))
Note that the bin edges (the second array) are what you were expecting, but the counts aren't. That's because the last bin behaves differently than the others, as noted in the documentation for numpy.histogram:
Notes
-----
All but the last (righthand-most) bin is half-open. In other words, if
`bins` is::
[1, 2, 3, 4]
then the first bin is ``[1, 2)`` (including 1, but excluding 2) and the
second ``[2, 3)``. The last bin, however, is ``[3, 4]``, which *includes*
4.
Therefore, what you actually should do is specify exactly what bin edges you want, and either include one beyond your last data point or shift the bin edges to the 0.5 intervals. For example:
In [4]: np.histogram([0, 1, 2, 3], bins=range(5))
Out[4]: (array([1, 1, 1, 1]), array([0, 1, 2, 3, 4]))
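Both options (one extra edge past the last value, or edges shifted onto the 0.5 marks) can be wrapped in a small function. This is only a sketch: integer_bin_edges and its align parameter are hypothetical names, not part of NumPy or matplotlib, and it assumes the data are integers.

```python
import numpy as np

def integer_bin_edges(data, align="left"):
    """Return bin edges giving one bin per integer from data.min() to data.max().

    Hypothetical helper for illustration; `align` is "left" or "center".
    """
    data = np.asarray(data)
    lo, hi = int(data.min()), int(data.max())
    # One edge more than the number of integer values, so the last value
    # gets its own bin instead of sharing the closed final bin.
    edges = np.arange(lo, hi + 2)
    if align == "center":
        return edges - 0.5
    if align == "left":
        return edges
    raise ValueError("align must be 'left' or 'center'")

values = [0, 1, 2, 3]
left_counts, left_edges = np.histogram(values, bins=integer_bin_edges(values))
center_counts, center_edges = np.histogram(
    values, bins=integer_bin_edges(values, align="center")
)
print(left_counts, left_edges)
print(center_counts, center_edges)
```

Either set of edges gives one count per value; only the position of the bars relative to the ticks differs.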
Bin Alignment
Now let's apply this to the first example and see what it looks like:
import matplotlib.pyplot as plt
import numpy as np
# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)
plt.hist(data, bins=range(11)) # <- The only difference
plt.xticks(range(10))
plt.show()


Okay, great! However, we now effectively have left-aligned bins. What if we wanted center-aligned bins to better reflect the fact that these are unique values?
The quick way is to just shift the bin edges:
import matplotlib.pyplot as plt
import numpy as np
# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)
bins = np.arange(11) - 0.5
plt.hist(data, bins)
plt.xticks(range(10))
plt.xlim([-1, 10])
plt.show()


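Similarly, for right-aligned bins, shift the edges by roughly -1. Because every bin except the last includes its left edge but not its right one, a shift of exactly -1 would still leave each integer in the bar that starts at its tick; a shift just short of 1, such as np.arange(11) - 0.99, puts each value in the bar that ends at its tick.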
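As a quick numerical check of the left-aligned and center-aligned edges used above, the sketch below confirms that shifting the edges moves the bars without changing the counts, and that the centered bins have their midpoints on the integer ticks. The fixed seed is only there to make the run repeatable.

```python
import numpy as np

np.random.seed(0)
data = np.random.randint(0, 10, 1000)  # integers 0..9

left_edges = np.arange(11)          # bar for k spans [k, k + 1)
center_edges = np.arange(11) - 0.5  # bar for k spans [k - 0.5, k + 0.5)

left_counts, _ = np.histogram(data, bins=left_edges)
center_counts, _ = np.histogram(data, bins=center_edges)

# Shifting the edges moves the bars, not the counts.
print(np.array_equal(left_counts, center_counts))

# Midpoints of the centered bins land on the integer ticks 0..9.
centers = (center_edges[:-1] + center_edges[1:]) / 2
print(centers)
```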
Another approach
For the particular case of unique integer values, there's another, more efficient approach we can take.
If you're dealing with unique integer counts starting with 0, you're better off using numpy.bincount than using numpy.histogram.
For example:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randint(0, 10, 1000)
counts = np.bincount(data)
# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=1, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])
plt.show()


There are two big advantages to this approach. One is speed. numpy.histogram (and therefore plt.hist) basically runs the data through numpy.digitize and then numpy.bincount. Because you're dealing with unique integer values, there's no need to take the numpy.digitize step.
However, the bigger advantage is more control over display. If you'd prefer thinner rectangles, just use a smaller width:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randint(0, 10, 1000)
counts = np.bincount(data)
# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=0.8, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])
plt.show()
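For non-negative integers, np.bincount and a one-bin-per-integer np.histogram should agree, which the sketch below checks. It also passes minlength, an optional np.bincount argument not used in the examples above, so that the counts array always has 10 entries and matches range(10) in ax.bar even if the largest values happen to be missing from a sample.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the run is repeatable
data = np.random.randint(0, 10, 1000)

# minlength pads with zeros so there is always one count per value 0..9,
# even if the largest values happen not to occur in the sample.
counts = np.bincount(data, minlength=10)

# Same counts via the histogram route, with one bin per integer.
hist_counts, _ = np.histogram(data, bins=np.arange(11))

print(counts)
print(np.array_equal(counts, hist_counts))
```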


Answer by int
Using the OO interface to configure ticks has the advantage of centering the labels while preserving the xticks. Also, it works with any plotting function and doesn't depend on np.bincount() or ax.bar().
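import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np
data = np.random.randint(0, 10, 1000)
mybins = range(11)
fig, ax = plt.subplots()
ax.hist(data, bins=mybins, rwidth=0.8)
ax.set_xticks(mybins)
# Put one minor tick midway between each pair of major ticks and label it
ax.xaxis.set_minor_locator(tkr.AutoMinorLocator(n=2))
ax.xaxis.set_minor_formatter(tkr.FixedFormatter(mybins))
# Hide the major tick labels but keep the major tick marks at the bin edges
ax.xaxis.set_major_formatter(tkr.NullFormatter())
for tick in ax.xaxis.get_minor_ticks():
    tick.tick1line.set_markersize(0)
plt.show()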

(source: pbrd.co)

Answer by Geek Actualizado
If you comment out bins.append(sorted(set(labels))[-1]):
bins = [i_bin - 0.5 for i_bin in set(labels)]
# bins.append(sorted(set(labels))[-1])
plt.hist(labels, bins)
plt.show()
If not:
bins = [i_bin - 0.5 for i_bin in set(labels)]
bins.append(sorted(set(labels))[-1])
plt.hist(labels, bins)
plt.show()
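The difference between the two variants can be seen in the counts alone. The sketch below uses np.histogram (which plt.hist relies on) and a small made-up labels list, since the answer does not show its data; it also sorts the unique labels, because bin edges must be in increasing order.

```python
import numpy as np

labels = [0, 1, 2, 2, 3]  # hypothetical sample data
unique_labels = sorted(set(labels))

# One edge half a unit below each label: [-0.5, 0.5, 1.5, 2.5]
edges_without = [label - 0.5 for label in unique_labels]
# The same edges plus the largest label as a final right edge
edges_with = edges_without + [unique_labels[-1]]

counts_without, _ = np.histogram(labels, bins=edges_without)
counts_with, _ = np.histogram(labels, bins=edges_with)

print(counts_without)  # the label 3 lies beyond the last edge and is dropped
print(counts_with)     # the extra edge adds a bin that catches it
```

Note that the final bin in the second variant runs only from 2.5 to 3, so its bar is half as wide as the others; appending the largest label plus 0.5 instead would give equal-width bars.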

