Multiple histograms in Python Pandas
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/25539195/
Multiple histograms in Pandas
Asked by Rohit
I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot.
I have the following code:
import nsfg
import matplotlib.pyplot as plt
df = nsfg.ReadFemPreg()
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]
first = live[live.birthord == 1]
others = live[live.birthord != 1]
#fig = plt.figure()
#ax1 = fig.add_subplot(111)
first.hist(column = 'prglngth', bins = 40, color = 'teal', \
alpha = 0.5)
others.hist(column = 'prglngth', bins = 40, color = 'blue', \
alpha = 0.5)
plt.show()
The above code does not work when I use ax = ax1 as suggested in "pandas multiple plots not working as hists", nor does the example in "Overlaying multiple histograms using pandas" do what I need. When I use the code as it is, it creates two windows with histograms. Any ideas how to combine them?
Here's an example of how I'd like the final figure to look:


Accepted answer by Paul H
As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
#import seaborn
#seaborn.set(style='ticks')
np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
fig, ax = plt.subplots()
a_heights, a_bins = np.histogram(df['A'])
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)
width = (a_bins[1] - a_bins[0])/3
ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen')
#seaborn.despine(ax=ax, offset=10)
And that gives me:

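The key step in the approach above is reusing one set of bin edges for both columns so that the bars line up. Here is a minimal sketch of that idea, assuming synthetic normal data in place of the NSFG columns. The use of `np.histogram_bin_edges` on the pooled values is a choice made for this illustration, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=37)            # stand-in for the first group (assumed data)
b = rng.normal(loc=0.5, size=50)   # stand-in for the second group (assumed data)

# Compute the bin edges from the pooled data so both groups share them.
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)
a_heights, _ = np.histogram(a, bins=edges)
b_heights, _ = np.histogram(b, bins=edges)

# Draw the two sets of bars side by side inside each bin.
width = (edges[1] - edges[0]) / 3
fig, ax = plt.subplots()
ax.bar(edges[:-1], a_heights, width=width, align="edge", label="a")
ax.bar(edges[:-1] + width, b_heights, width=width, align="edge", label="b")
ax.legend()
```

Taking the edges from the pooled data, rather than from one column only, means no value of either group falls outside the bins.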

Answer by blalterman
From the pandas website (http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist):
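import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Three columns with shifted means, plotted as overlaid histograms on one axis
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                    'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
plt.figure();
df4.plot(kind='hist', alpha=0.5)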
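If the values live in two separate Series of different lengths, as `first.prglngth` and `others.prglngth` do in the question, one way to use the `plot(kind='hist')` call is to place them in a single DataFrame first. Here is a minimal sketch, assuming synthetic stand-in data (the means, spreads, and sizes below are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-ins for first.prglngth and others.prglngth
first = pd.Series(rng.normal(38.6, 2.5, size=400))
others = pd.Series(rng.normal(38.5, 2.5, size=500))

# Building the frame aligns the Series on their indexes; the shorter
# column is padded with NaN, which the histogram plot skips.
both = pd.DataFrame({"first": first, "others": others})
ax = both.plot(kind="hist", bins=20, alpha=0.5)
```

Both columns are then drawn on one axis with a shared set of bins and a legend entry per column.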
Answer by sathyz
Here is the snippet. In my case I specified the bins and range explicitly, because I did not remove outliers as the author of the book did.
fig, ax = plt.subplots()
ax.hist([first.prglngth, others.prglngth], bins=10, range=(27, 50), histtype="bar", label=("First", "Other"))
ax.set_title("Histogram")
ax.legend()
See the Matplotlib "multihist plot with different sizes" example.
Answer by lin_bug
In case anyone wants to plot one histogram over another (rather than alternating bars), you can simply call .hist() consecutively on the series you want to plot:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
df['A'].hist()
df['B'].hist()
This gives you:
Note that the order in which you call .hist() matters (the first one will be at the back).
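Because the first histogram ends up at the back, opaque bars from the second call can hide it. Here is a minimal sketch of one way around that, assuming an explicit shared axis, shared bin edges, and partial transparency (the `alpha` value and the number of bins are arbitrary choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.normal(size=(37, 2)), columns=["A", "B"])

fig, ax = plt.subplots()
# 11 edges give 10 bins spanning the range of both columns.
edges = np.linspace(df.min().min(), df.max().max(), 11)
df["A"].hist(ax=ax, bins=edges, alpha=0.5, label="A")  # drawn first, so at the back
df["B"].hist(ax=ax, bins=edges, alpha=0.5, label="B")  # drawn second, on top
ax.legend()
```

Passing the same `ax` to both calls makes it explicit that the two histograms share one plot.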
Answer by Joshua Zastrow
Make two dataframes and one matplotlib axis, then pass that axis to both hist calls:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'data1': np.random.randn(10),
'data2': np.random.randn(10)
})
df2 = df1.copy()
fig, ax = plt.subplots()
df1.hist(column=['data1'], ax=ax)
df2.hist(column=['data2'], ax=ax)

