Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/27157522/

Time: 2020-09-13 22:43:05  Source: igfitidea

pandas plot histogram data frame index

matplotlib, pandas, plot

Asked by DigitalMusicology

I have the following data frame (df) in pandas:

       NetPrice  Units  Royalty
Price                       
3.65    9.13    171    57.60
3.69    9.23     13     4.54
3.70    9.25    129    43.95
3.80    9.49    122    42.76
3.90    9.74    105    38.30
3.94    9.86    158    57.35
3.98    9.95     37    13.45
4.17   10.42     69    27.32
4.82   12.04    176    77.93
4.84   24.22    132    59.02
5.16   12.91    128    60.81
5.22   13.05    129    62.00

I am trying to create a histogram on the index ("Price") with "Units" on the y-axis. I started with the following:

plt.hist(df.index)

This gives me a histogram plotting the price. How can I add the Units to the y-axis? Right now it is just a "scale".

Thank you!

Answered by JD Long

Because your data is already partially aggregated, you can't use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart. You just need to put your data in buckets first. My favorite way to put data in buckets is with the pandas cut() method.

Let's set up some example data, since you didn't provide any that's easy to use:

import numpy as np
import pandas as pd

np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price' : np.random.normal(5,2,size=n),
                   'Units' : np.random.randint(100, size=n)})

Let's put the prices into 10 evenly spaced buckets:

df['bucket'] = pd.cut(df.Price, 10)
print(df.head())

      Price  Units           bucket
0  8.248691     98    (7.307, 8.71]
1  3.776487      8  (3.0999, 4.502]
2  3.943656     89  (3.0999, 4.502]
3  2.854063     27  (1.697, 3.0999]
4  6.730815     29   (5.905, 7.307]

So now we have a field that contains the bucket range. If you want to give those buckets other names, you can read about that in the excellent Pandas documentation. Now we can use the Pandas groupby() method and sum() to add up the units:

newdf = df[['bucket','Units']].groupby('bucket').sum()
print(newdf)
                  Units
bucket                 
(-1.122, 0.295]     492
(0.295, 1.697]     1663
(1.697, 3.0999]    5003
(3.0999, 4.502]   11084
(4.502, 5.905]    15144
(5.905, 7.307]    11053
(7.307, 8.71]      4424
(8.71, 10.112]     1008
(10.112, 11.515]     77
(11.515, 12.917]    122
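
On the point about giving the buckets other names: a minimal sketch, assuming the labels argument of cut() is what is wanted. The names bin_00 to bin_09 are made up for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

# Same example data as in the answer.
np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})

# Hypothetical bucket names, one per bin (cut() needs exactly as many labels as bins).
names = ['bin_%02d' % i for i in range(10)]
df['bucket'] = pd.cut(df['Price'], bins=10, labels=names)

# observed=False keeps every bucket in the result, even an empty one.
named_totals = df.groupby('bucket', observed=False)['Units'].sum()
print(named_totals)
```

The totals are the same as with the default interval labels; only the index values change.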

That looks like a winner... now let's plot it:

newdf.plot(kind='bar')
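
The same steps can be applied to the frame from the question, where Price is the index rather than a column. The sketch below is not part of the original answer: the bin edges are an arbitrary choice, and the weighted np.histogram call is a different technique, included only to cross-check the bucket sums.

```python
import numpy as np
import pandas as pd

# The question's data, with Price as the index.
orig = pd.DataFrame(
    {'Units': [171, 13, 129, 122, 105, 158, 37, 69, 176, 132, 128, 129]},
    index=pd.Index([3.65, 3.69, 3.70, 3.80, 3.90, 3.94, 3.98,
                    4.17, 4.82, 4.84, 5.16, 5.22], name='Price'))

# Explicit bin edges (an assumption): 3.5, 4.0, 4.5, 5.0, 5.5
edges = np.linspace(3.5, 5.5, 5)

# Approach from the answer: bucket the index with cut(), then sum Units per bucket.
buckets = pd.cut(orig.index, bins=edges)
by_bucket = orig.groupby(buckets, observed=False)['Units'].sum()
print(by_bucket)

# Cross-check: a histogram of Price in which each row counts Units times.
totals, _ = np.histogram(orig.index.to_numpy(), bins=edges,
                         weights=orig['Units'].to_numpy())
print(totals)  # 735, 69, 308, 257
```

Calling by_bucket.plot(kind='bar') draws the chart as before. matplotlib's plt.hist also takes a weights argument, so plt.hist(orig.index, bins=edges, weights=orig['Units']) should give the same bar heights without the intermediate frame.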
