pandas: plotting a histogram on a DataFrame index

Question

Asked by DigitalMusicology

I have the following data frame (df) in pandas:

       NetPrice  Units  Royalty
Price                       
3.65    9.13    171    57.60
3.69    9.23     13     4.54
3.70    9.25    129    43.95
3.80    9.49    122    42.76
3.90    9.74    105    38.30
3.94    9.86    158    57.35
3.98    9.95     37    13.45
4.17   10.42     69    27.32
4.82   12.04    176    77.93
4.84   24.22    132    59.02
5.16   12.91    128    60.81
5.22   13.05    129    62.00

I am trying to create a histogram on the index ("Price") with "Units" on the y-axis. I started with the following:

plt.hist(df.index)

This gives me a histogram plotting the price. How can I add the Units to the y-axis? Right now it is just a "scale".

Thank you!

Answer 1

Answered by JD Long

Because your data is already partially aggregated, you can't use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart; you just need to put your data in buckets first. My favorite way to put data in buckets is the pandas cut() function.

Let's set up some example data, since the data you posted isn't in an easy-to-use form:

import numpy as np
import pandas as pd

np.random.seed(1)  # fixed seed so the example is reproducible
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})

Let's put the prices into 10 evenly spaced buckets:

df['bucket'] = pd.cut(df.Price, 10)
print(df.head())

      Price  Units           bucket
0  8.248691     98    (7.307, 8.71]
1  3.776487      8  (3.0999, 4.502]
2  3.943656     89  (3.0999, 4.502]
3  2.854063     27  (1.697, 3.0999]
4  6.730815     29   (5.905, 7.307]

So now we have a field that contains the bucket range. If you want to give those buckets other names, you can read about that in the excellent pandas documentation. Now we can use the pandas groupby() method and sum() to add up the units:

newdf = df[['bucket','Units']].groupby('bucket').sum()
print(newdf)
                  Units
bucket                 
(-1.122, 0.295]     492
(0.295, 1.697]     1663
(1.697, 3.0999]    5003
(3.0999, 4.502]   11084
(4.502, 5.905]    15144
(5.905, 7.307]    11053
(7.307, 8.71]      4424
(8.71, 10.112]     1008
(10.112, 11.515]     77
(11.515, 12.917]    122
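
As an aside on renaming buckets: a minimal sketch, assuming you want readable names instead of interval ranges. pd.cut() takes a labels argument with one name per bucket. The prices and the names "low", "mid" and "high" below are made up for illustration and are not part of the example data above.

```python
import pandas as pd

# Hypothetical prices, chosen only to illustrate the labels argument.
prices = pd.Series([0.5, 2.0, 3.5, 5.0, 6.5, 8.0], name="Price")

# Hypothetical bucket names; there must be exactly one label per bucket.
names = ["low", "mid", "high"]

# Three equal-width buckets, named instead of shown as interval ranges.
bucket = pd.cut(prices, bins=3, labels=names)
print(bucket)
```

The rest of this answer keeps the default interval labels.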

That looks like a winner... now let's plot it:

newdf.plot(kind='bar')
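
The same steps can be applied to the data from the question. This is only a sketch under a few assumptions: Price is kept as an ordinary column rather than the index, the NetPrice and Royalty columns are left out because they are not needed for the plot, and the choice of 4 buckets is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display; drop for interactive use
import pandas as pd

# Price and Units values copied from the question's data frame.
df = pd.DataFrame({
    "Price": [3.65, 3.69, 3.70, 3.80, 3.90, 3.94, 3.98, 4.17, 4.82, 4.84, 5.16, 5.22],
    "Units": [171, 13, 129, 122, 105, 158, 37, 69, 176, 132, 128, 129],
})

# Assumption: 4 equal-width buckets; pick whatever granularity suits the data.
df["bucket"] = pd.cut(df["Price"], bins=4)

# Sum the units in each bucket; observed=False keeps empty buckets as zero-height bars.
units_per_bucket = df.groupby("bucket", observed=False)["Units"].sum()

# Bar chart with price buckets on the x-axis and total units on the y-axis.
ax = units_per_bucket.plot(kind="bar")
ax.set_xlabel("Price bucket")
ax.set_ylabel("Units")
```

To see the chart, call ax.figure.savefig() with a file name of your choice, or remove the backend line and call matplotlib.pyplot.show() in an interactive session.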
