Plotting histograms from grouped data in a pandas DataFrame
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/19584029/
Asked by dreme
I need some guidance in working out how to plot a block of histograms from grouped data in a pandas dataframe. Here's an example to illustrate my question:
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')
In my ignorance I tried this command:
df.groupby('Letter').hist()
which failed with the error message "TypeError: cannot concatenate 'str' and 'float' objects"
Any help most appreciated.
Accepted answer by dreme
I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:
df['N'].hist(by=df['Letter'])
That's a very handy little shortcut for quickly scanning your grouped data!
For future visitors, the product of this call is a chart with one histogram panel per letter (image not reproduced here).
Answer by Paul
One solution is to use the matplotlib histogram directly on each group. Iterating over the groupby object yields (key, group) pairs, where each group is a dataframe, so you can create a histogram for each one in a loop.
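from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt

x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter': x, 'N': y})
grouped = df.groupby('Letter')

# Iterating over a groupby yields (key, sub-DataFrame) pairs
for name, group in grouped:
    plt.figure()           # one new figure per group
    plt.hist(group['N'])
    plt.title(name)        # label the figure with its group key
    plt.show()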
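As a variation on the loop above, here is a minimal sketch that overlays all groups on a single Axes instead of opening one figure per group. The shared bin edges, the alpha value, the seeded random generator and the Agg backend (used only so the snippet runs without a display) are my own assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Letter': ['A'] * 300 + ['B'] * 400 + ['C'] * 300,
    'N': rng.standard_normal(1000),
})

# Common bin edges so the three histograms are directly comparable (30 bins)
bins = np.linspace(df['N'].min(), df['N'].max(), 31)

fig, ax = plt.subplots()
for name, group in df.groupby('Letter'):
    # Semi-transparent bars so overlapping groups stay visible
    ax.hist(group['N'], bins=bins, alpha=0.5, label=name)

ax.set_xlabel('N')
ax.set_ylabel('count')
ax.legend(title='Letter')
# Call plt.show() or fig.savefig(...) to view the result
```

Because every group is drawn on the same Axes with the same bins, the x and y scales are identical by construction.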
Answer by cwharland
Your function is failing because the groupby dataframe you end up with has a hierarchical index and two columns (Letter and N), so when you call .hist() it tries to make a histogram of both columns, hence the str error.
This is the default behavior of pandas plotting functions (one plot per column) so if you reshape your data frame so that each letter is a column you will get exactly what you want.
df.reset_index().pivot(index='index', columns='Letter', values='N').hist()
The reset_index() is just to shove the current index into a column called index. Then pivot will take your data frame, collect all of the values of N for each Letter and make them a column. The resulting data frame has 1000 rows, one per original row (missing values are filled with NaN), and three columns (A, B, C). hist() will then produce one histogram per column, ignoring the NaN values, and you can format the plots as needed.
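To see what the pivoted frame actually looks like, here is a small sketch that builds it and prints its shape and the number of non-missing values per column. The variable name wide and the seeded generator are my own choices for illustration. The keyword form of pivot is used because recent pandas versions no longer accept positional arguments there.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Letter': ['A'] * 300 + ['B'] * 400 + ['C'] * 300,
    'N': rng.standard_normal(1000),
})

# Move the row index into a column called 'index', then spread Letter into columns
wide = df.reset_index().pivot(index='index', columns='Letter', values='N')

print(wide.shape)          # (1000, 3): one row per original row, one column per letter
print(wide.notna().sum())  # non-missing values per column: A 300, B 400, C 300
```

Each row holds a value in exactly one of the three columns and NaN in the other two, which is why the per-column histograms still only contain that letter's data.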
Answer by dirkjot
With a recent version of pandas, you can do
df.N.hist(by=df.Letter)
Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.
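One possible way to get matching axes, sketched here under my own assumptions rather than taken from the answers, is the sharex and sharey parameters of DataFrame.hist, which make the subplots share their x and y ranges. The bin count, the seeded generator and the Agg backend (used only so the snippet runs without a display) are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Letter': ['A'] * 300 + ['B'] * 400 + ['C'] * 300,
    'N': rng.standard_normal(1000),
})

# One subplot per letter, all sharing the same x and y axis ranges
axes = df.hist(column='N', by='Letter', sharex=True, sharey=True, bins=20)
# Call matplotlib.pyplot.show() to view the result
```

Note that the bins are still computed per group here, so bar widths can differ between panels even though the axis ranges match.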