Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/43005462/

Time: 2020-08-19 22:24:59  Source: igfitidea

Pandas bar plot with binned range

Tags: python, pandas, histogram, bar-chart

Asked by Arnold Klein

Is there a way to create a bar plot from continuous data binned into predefined intervals? For example,

In[1]: df
Out[1]: 
0      0.729630
1      0.699620
2      0.710526
3      0.000000
4      0.831325
5      0.945312
6      0.665428
7      0.871845
8      0.848148
9      0.262500
10     0.694030
11     0.503759
12     0.985437
13     0.576271
14     0.819742
15     0.957627
16     0.814394
17     0.944649
18     0.911111
19     0.113333
20     0.585821
21     0.930131
22     0.347222
23     0.000000
24     0.987805
25     0.950570
26     0.341317
27     0.192771
28     0.320988
29     0.513834

231    0.342541
232    0.866279
233    0.900000
234    0.615385
235    0.880597
236    0.620690
237    0.984375
238    0.171429
239    0.792683
240    0.344828
241    0.288889
242    0.961686
243    0.094402
244    0.960526
245    1.000000
246    0.166667
247    0.373494
248    0.000000
249    0.839416
250    0.862745
251    0.589873
252    0.983871
253    0.751938
254    0.000000
255    0.594937
256    0.259615
257    0.459916
258    0.935065
259    0.969231
260    0.755814

and instead of a simple histogram:

df.hist()

[Figure: usual histogram of df]

I need to create a bar plot in which each bar counts the number of instances within a predefined range. For example, the plot should have three bars showing the number of points that fall into [0, 0.35], [0.35, 0.7], and [0.7, 1.0].

EDIT

Many thanks for your answers. A follow-up question: how can I order the bins? For example, I get the following result:

In[349]: out.value_counts()
Out[349]:  
[0, 0.001]      104
(0.001, 0.1]     61
(0.1, 0.2]       32
(0.2, 0.3]       20
(0.3, 0.4]       18
(0.7, 0.8]        6
(0.4, 0.5]        6
(0.5, 0.6]        5
(0.6, 0.7]        4
(0.9, 1]          3
(0.8, 0.9]        2
(1, 1.001]        0

As you can see, some bins are out of order (for example, (0.7, 0.8] comes before (0.4, 0.5]), because value_counts() sorts by count by default. How can I sort the result by 'categories', that is, by my bins?

EDIT 2

I just found how to solve it, simply with 'reindex()':

In[355]: out.value_counts().reindex(out.cat.categories)
Out[355]: 
[0, 0.001]      104
(0.001, 0.1]     61
(0.1, 0.2]       32
(0.2, 0.3]       20
(0.3, 0.4]       18
(0.4, 0.5]        6
(0.5, 0.6]        5
(0.6, 0.7]        4
(0.7, 0.8]        6
(0.8, 0.9]        2
(0.9, 1]          3
(1, 1.001]        0
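
The ordering issue above can be reproduced on a small scale. The following is a minimal sketch, assuming a handful of made-up values and bin edges (neither comes from the question's data): value_counts() sorts by count by default, while sort=False or sort_index() should keep the bins in category order, much like the reindex() approach.

```python
import pandas as pd

# Hypothetical sample values and bin edges, not the data from the question
s = pd.Series([0.0, 0.15, 0.45, 0.55, 0.6, 0.75, 0.8, 0.85, 0.9, 0.95])
bins = [0, 0.1, 0.5, 0.8, 1.0]
out = pd.cut(s, bins=bins, include_lowest=True)

by_count = out.value_counts()                    # sorted by count, descending
by_bin = out.value_counts(sort=False)            # kept in category (bin) order
by_sort_index = out.value_counts().sort_index()  # re-sorted into bin order

print(by_count)
print(by_bin)
```

Passing sort=False avoids the extra reordering step altogether, which is why it is a convenient choice right before plotting.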

Answered by Nickil Maveli

You can use pd.cut to partition the values into bins corresponding to each interval, then get each interval's total count with pd.value_counts. After that, plot a bar graph and replace the X-axis tick labels with the name of the category each tick belongs to.

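import pandas as pd
import matplotlib.pyplot as plt
# s is the Series of values to bin
out = pd.cut(s, bins=[0, 0.35, 0.7, 1], include_lowest=True)
ax = out.value_counts(sort=False).plot.bar(rot=0, color="b", figsize=(6,4))
# str(c) also works when the categories are Interval objects (newer pandas)
ax.set_xticklabels([str(c)[1:-1].replace(",", " to") for c in out.cat.categories])
plt.show()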

If you want the Y-axis to show relative percentages, normalize the frequency counts and multiply the result by 100.

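out = pd.cut(s, bins=[0, 0.35, 0.7, 1], include_lowest=True)
# normalize=True gives fractions; multiply by 100 for percentages
out_norm = out.value_counts(sort=False, normalize=True).mul(100)
ax = out_norm.plot.bar(rot=0, color="b", figsize=(6,4))
# str(c) also works when the categories are Interval objects (newer pandas)
ax.set_xticklabels([str(c)[1:-1].replace(",", " to") for c in out.cat.categories])
plt.ylabel("pct")
plt.show()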
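
As an alternative to rewriting the tick labels after plotting, pd.cut also accepts a labels argument. The sketch below assumes made-up sample values and a "low to high" label format chosen only for illustration; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample values standing in for the question's data
s = pd.Series([0.0, 0.2, 0.35, 0.5, 0.7, 0.71, 0.9, 1.0])
bins = [0, 0.35, 0.7, 1]

# Build "low to high" labels from consecutive bin edges (illustrative format)
labels = [f"{lo} to {hi}" for lo, hi in zip(bins[:-1], bins[1:])]

out = pd.cut(s, bins=bins, labels=labels, include_lowest=True)
counts = out.value_counts(sort=False)                     # absolute counts
pct = out.value_counts(sort=False, normalize=True).mul(100)  # percentages

print(counts)
print(pct)
# counts.plot.bar(rot=0) would then show these labels on the X-axis
```

With the labels set at binning time, the index of the counts already carries the readable names, so no string slicing of the interval representation is needed.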

Answered by ImportanceOfBeingErnest

You may consider using matplotlib to plot the histogram. Unlike pandas' hist function, matplotlib.pyplot.hist accepts an array as input for the bins.

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import pandas as pd

x = np.random.rand(120)
df = pd.DataFrame({"x":x})

bins= [0,0.35,0.7,1]
plt.hist(df.values, bins=bins, edgecolor="k")
plt.xticks(bins)

plt.show()
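
Note that plt.hist draws bars whose widths follow the bin widths. If equally wide, categorical-style bars are preferred, one possible variation is to compute the counts with np.histogram and draw them with a plain bar chart. This is only a sketch: the random data, the label format, and the Agg backend (chosen so it runs without a display) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(120)  # hypothetical data in [0, 1)

bins = [0, 0.35, 0.7, 1]
counts, edges = np.histogram(x, bins=bins)  # one count per bin

# Illustrative "low to high" labels built from the bin edges
labels = [f"{lo} to {hi}" for lo, hi in zip(bins[:-1], bins[1:])]

fig, ax = plt.subplots(figsize=(6, 4))
positions = range(len(counts))
ax.bar(positions, counts, edgecolor="k")  # equally wide bars, one per bin
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_ylabel("count")
# In an interactive session, call plt.show() to display the figure
```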

Answered by Vaishali

You can use pd.cut together with groupby to count the values in each bin:

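bins = [0, 0.35, 0.7, 1]
# include_lowest=True keeps values equal to 0 in the first bin
binned = pd.cut(df['val'], bins=bins, include_lowest=True)
# count the values of column 'val' that fall into each bin
counts = df.groupby(binned).val.count()
counts.plot(kind='bar')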
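
One detail worth checking with this approach: by default pd.cut uses intervals that are open on the left, so values exactly equal to the lowest edge are not assigned to any bin. The sketch below uses a small made-up DataFrame with an assumed column name 'val' to show the difference that include_lowest=True makes.

```python
import pandas as pd

# Hypothetical data; the column name 'val' is an assumption
df = pd.DataFrame({"val": [0.0, 0.0, 0.2, 0.5, 0.9, 1.0]})
bins = [0, 0.35, 0.7, 1]

default_cut = pd.cut(df["val"], bins=bins)                        # first bin is (0, 0.35]
inclusive_cut = pd.cut(df["val"], bins=bins, include_lowest=True)  # first bin also holds 0

# Zeros fall outside every bin with the default settings
n_dropped = int(default_cut.isna().sum())

# observed=False keeps empty bins in the result as well
counts = df.groupby(inclusive_cut, observed=False)["val"].count()

print(n_dropped)
print(counts)
```

Since the data in the question contains several zeros, this setting affects the height of the first bar.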
