Multiple histograms in Python Pandas

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/25539195/

Date: 2020-08-18 20:21:44  Source: igfitidea

Multiple histograms in Pandas

Tags: python, matplotlib, pandas, histogram

Asked by Rohit

I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot.

I have the following code:

import nsfg
import matplotlib.pyplot as plt
df = nsfg.ReadFemPreg()
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]

first = live[live.birthord == 1]
others = live[live.birthord != 1]

#fig = plt.figure()
#ax1 = fig.add_subplot(111)

first.hist(column = 'prglngth', bins = 40, color = 'teal', \
           alpha = 0.5)
others.hist(column = 'prglngth', bins = 40, color = 'blue', \
            alpha = 0.5)
plt.show()

The above code does not work when I use ax = ax1 as suggested in "pandas multiple plots not working as hists", nor does this example do what I need: "Overlaying multiple histograms using pandas". When I use the code as it is, it creates two windows with histograms. Any ideas how to combine them?

Here's an example of how I'd like the final figure to look: [image]

Accepted answer by Paul H

As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
#import seaborn
#seaborn.set(style='ticks')

np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
fig, ax = plt.subplots()

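# Bin both columns on the same edges so the bars line up
a_heights, a_bins = np.histogram(df['A'])
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)

# Each bar takes a third of a bin, leaving a gap between neighboring bins
width = (a_bins[1] - a_bins[0])/3

# align='edge' anchors each bar at its left edge (matplotlib >= 2.0 centers bars by default)
ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue', align='edge')
ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen', align='edge')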
#seaborn.despine(ax=ax, offset=10)

And that gives me: [image]

Answer by blalterman

From the pandas website (http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                    'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

plt.figure();

df4.plot(kind='hist', alpha=0.5)
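
The pandas example above plots the columns of a single DataFrame, whereas the question has two separate DataFrames of different lengths. One possible way to bridge that gap is sketched below: align the two Series side by side with pd.concat and let DataFrame.plot.hist draw them on one Axes. The names first_lengths and other_lengths and their synthetic values are assumptions standing in for first.prglngth and others.prglngth, not the NSFG data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-ins for first.prglngth and others.prglngth (unequal lengths)
first_lengths = pd.Series(rng.normal(39, 2, size=40))
other_lengths = pd.Series(rng.normal(38, 2, size=60))

# Put the two Series side by side; the shorter column is padded with NaN
combined = pd.concat(
    [first_lengths.reset_index(drop=True), other_lengths.reset_index(drop=True)],
    axis=1,
    keys=["first", "others"],
)

fig, ax = plt.subplots()
# Both columns share one set of bin edges and one Axes
combined.plot.hist(bins=20, alpha=0.5, ax=ax)
ax.set_xlabel("weeks")
```

Because both columns go through a single plot call, pandas computes one set of bin edges for them, so the overlaid bars are directly comparable.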

Answer by sathyz

Here is the snippet. In my case I explicitly specified bins and range because, unlike the author of the book, I did not remove outliers.

fig, ax = plt.subplots()
ax.hist([first.prglngth, others.prglngth], bins=10, range=(27, 50), histtype="bar", label=("First", "Other"))
ax.set_title("Histogram")
ax.legend()

Refer to the Matplotlib "multihist" example, which plots data sets of different sizes.
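
Since the snippet depends on the book's nsfg module, here is a self-contained sketch of the same call with synthetic data. The arrays first_lengths and other_lengths are hypothetical stand-ins for first.prglngth and others.prglngth, and the distribution parameters are invented for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the two pregnancy-length columns (unequal sizes)
first_lengths = rng.normal(39, 2, size=40)
other_lengths = rng.normal(38, 2, size=60)

fig, ax = plt.subplots()
# Passing a list of data sets makes matplotlib draw the bars side by side per bin
counts, edges, _ = ax.hist(
    [first_lengths, other_lengths],
    bins=10,
    range=(27, 50),  # values outside this range are ignored
    histtype="bar",
    label=("First", "Other"),
)
ax.set_title("Histogram")
ax.legend()
```

The returned counts hold one array of bin heights per data set, and edges holds the shared bin boundaries.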

Answer by lin_bug

In case anyone wants to plot one histogram over another (rather than alternating bars), you can simply call .hist() consecutively on the series you want to plot:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas


np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])

df['A'].hist()
df['B'].hist()

This gives you: [image]

Note that the order in which you call .hist() matters (the first one will be at the back).
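
Because the first histogram ends up behind the second, it can be partly hidden, and each call picks its own bins by default. A possible variation, offered only as a sketch, is to pass an explicit Axes, a shared set of bin edges, and some transparency. The use of np.histogram_bin_edges and the alpha value here are my own choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.normal(size=(37, 2)), columns=["A", "B"])

# Compute one set of bin edges from both columns so the bars line up
edges = np.histogram_bin_edges(df[["A", "B"]].to_numpy().ravel(), bins=10)

fig, ax = plt.subplots()
# alpha < 1 keeps the histogram at the back visible through the one in front
df["A"].hist(bins=edges, alpha=0.5, ax=ax, label="A")
df["B"].hist(bins=edges, alpha=0.5, ax=ax, label="B")
ax.legend()
```

Passing ax explicitly also removes any dependence on which figure happens to be current when .hist() is called.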

Answer by Joshua Zastrow

Create two DataFrames and draw both histograms on a single matplotlib Axes:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'data1': np.random.randn(10),
    'data2': np.random.randn(10)
})

df2 = df1.copy()

fig, ax = plt.subplots()
df1.hist(column=['data1'], ax=ax)
df2.hist(column=['data2'], ax=ax)