Python: Plotting histograms from grouped data in a pandas DataFrame

Question

Asked by dreme

I need some guidance in working out how to plot a block of histograms from grouped data in a pandas dataframe. Here's an example to illustrate my question:

from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')

In my ignorance I tried this code command:

df.groupby('Letter').hist()

which failed with the error message "TypeError: cannot concatenate 'str' and 'float' objects"

Any help most appreciated.

Answer 1

Accepted answer by dreme

I'm on a roll: I just found an even simpler way to do it, using the by keyword of the hist method:

df['N'].hist(by=df['Letter'])

That's a very handy little shortcut for quickly scanning your grouped data!

For future visitors, this call produces the following chart: [image not included]

Answer 2

Answered by Paul

One solution is to use matplotlib's histogram function directly on each group. Iterating over the groupby object yields (group name, data frame) pairs, so you can loop over it and create one histogram per group.

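from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt

x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')

# Iterating over a groupby yields (group name, sub-DataFrame) pairs
for name, group in grouped:
    plt.figure()           # start a new figure for this group
    plt.hist(group['N'])   # histogram of the N column only
    plt.title(name)
    plt.show()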
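
If separate figures are hard to compare, a variation on the loop above is to draw every group on one set of axes with common bin edges. This is only a sketch: the seed, the choice of 20 bins, the transparency value, and the output file name are arbitrary assumptions, and the non-interactive Agg backend is selected just so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use a GUI backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is repeatable
letters = ['A'] * 300 + ['B'] * 400 + ['C'] * 300
df = pd.DataFrame({'Letter': letters, 'N': rng.standard_normal(1000)})

# Compute the bin edges once from all values so the bars of every group line up
bin_edges = np.histogram_bin_edges(df['N'], bins=20)

fig, ax = plt.subplots()
for name, group in df.groupby('Letter'):
    # Semi-transparent bars keep overlapping groups visible
    ax.hist(group['N'], bins=bin_edges, alpha=0.5, label=name)

ax.set_xlabel('N')
ax.set_ylabel('count')
ax.legend(title='Letter')
fig.savefig('grouped_hist_overlay.png')  # hypothetical output file name
```

Because the groups have different sizes (300, 400, 300), the bar heights are not directly comparable as proportions; passing density=True to ax.hist would normalize each group separately.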

Answer 3

Answered by cwharland

Your call is failing because the groupby data frame you end up with has a hierarchical index and two columns (Letter and N), so .hist() tries to make a histogram of both columns, hence the str error.

This is the default behavior of pandas plotting functions (one plot per column) so if you reshape your data frame so that each letter is a column you will get exactly what you want.

df.reset_index().pivot(index='index', columns='Letter', values='N').hist()

The reset_index() call just moves the current index into a column called index. pivot then takes your data frame, collects all of the N values for each Letter, and makes them a column. The resulting data frame has 1000 rows (one per original row, with NaN wherever a row belongs to a different letter) and three columns (A, B, C). hist() then produces one histogram per column, and you can format the plots as needed.
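
To see what the reshaped frame looks like before plotting, a small check such as the following can help. It is a sketch that rebuilds the example data with an arbitrary fixed seed; the variable name wide is made up for illustration. The pivot keeps one row per original row, and each row holds a value in exactly one of the three letter columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is repeatable
letters = ['A'] * 300 + ['B'] * 400 + ['C'] * 300
df = pd.DataFrame({'Letter': letters, 'N': rng.standard_normal(1000)})

# Move the row index into a column, then spread N into one column per letter
wide = df.reset_index().pivot(index='index', columns='Letter', values='N')

print(wide.shape)    # (1000, 3)
print(wide.count())  # non-NaN values per column: A 300, B 400, C 300
```

The NaN entries are ignored when hist() draws each column, so every histogram contains only the values of its own letter.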

Answer 4

Answered by dirkjot

With a recent version of pandas, you can do df.N.hist(by=df.Letter)

Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.
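
One possible way to get matching axes is the sharex and sharey parameters of DataFrame.hist. The sketch below is not from the original answers: the seed, the 1x3 layout, the figure size, and the output file name are arbitrary assumptions, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use a GUI backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is repeatable
letters = ['A'] * 300 + ['B'] * 400 + ['C'] * 300
df = pd.DataFrame({'Letter': letters, 'N': rng.standard_normal(1000)})

# One subplot per letter; sharex/sharey make all subplots use the same axis limits
axes = df.hist(column='N', by='Letter', sharex=True, sharey=True,
               layout=(1, 3), figsize=(9, 3))
axes = np.ravel(axes)  # flatten in case a 2-D array of axes is returned

fig = axes[0].get_figure()
fig.savefig('grouped_hist_shared_axes.png')  # hypothetical output file name
```

Sharing the axes does not force identical bin edges across the subplots; for that, an explicit sequence of edges can be passed as bins.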
