How to create groupby subplots in Pandas?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33150510/

Time: 2020-09-14 00:02:34  Source: igfitidea

python, pandas, matplotlib, seaborn

Asked by elksie5000

I've got a dataframe of crime time series data, broken down by offence (in the format shown below). I'd like to produce a groupby plot from the dataframe, with one subplot per offence, so that it's possible to explore trends in crime over time.

    Offence                    Rolling year total number of offences  Month
0   Criminal damage and arson  1001                                   2003-03-31
1   Drug offences              66                                     2003-03-31
2   All other theft offences   617                                    2003-03-31
3   Bicycle theft              92                                     2003-03-31
4   Domestic burglary          282                                    2003-03-31

I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?

I would turn to Seaborn, but I can't work out how to format the x-axis labels as a time series.

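subs = []
for offence, g in df.groupby("Offence"):
    # Quarterly totals (quarters anchored on April) from 2010 onwards;
    # "Month" must be a datetime column for resample() to work
    data = (g.set_index("Month")["Rolling year total number of offences"]
             .resample("QS-APR").sum().loc["2010":])
    subs.append({"data": data, "title": offence})

fig = plt.figure(figsize=(25, 15))
for i, g in enumerate(subs, start=1):  # subplot positions are 1-based
    plt.subplot(5, 5, i)
    plt.plot(g["data"])
    plt.title(g["title"])
    plt.xlabel("Time")
    plt.ylabel("No. of crimes")
plt.tight_layout()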

Answered by Sergey Bushmanov

This is a reproducible example of 6 scatterplots in Pandas, obtained with DataFrame.groupby() for 6 consecutive years. The x axis shows the oil price (Brent) for the year, and the y axis shows the value of the sp500 for the same year.

import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
# IPython/Jupyter magic; omit it in a plain script
%matplotlib inline

# Daily Brent oil prices and S&P 500 closes, aligned on date
brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna().loc["2009":"2015"]

# One scatterplot per year; zip() stops once the six axes are used up
fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
    group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=str(year))

This produces the plot below:

[Figure: 2x3 grid of scatterplots of sp500 against brent, one panel per year]

(As an aside, these plots suggest a strong correlation between oil and the sp500 in 2010 but not in the other years.)
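
The visual impression of correlation can be checked numerically with the same groupby key. The following is a minimal sketch, not part of the original answer: it uses synthetic random-walk series as stand-ins for the Quandl data (an assumption, so the numbers mean nothing in themselves) and computes one Pearson correlation per calendar year.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two price series (assumption: the real
# data would come from Quandl as in the answer above)
rng = np.random.default_rng(42)
idx = pd.date_range("2009-01-01", "2014-12-31", freq="B")
values = pd.DataFrame(
    {
        "brent": 80 + rng.normal(0, 1, len(idx)).cumsum(),
        "sp500": 1500 + rng.normal(0, 10, len(idx)).cumsum(),
    },
    index=idx,
)

# Group rows by calendar year and compute one Pearson correlation per group
yearly_corr = values.groupby(values.index.year).apply(
    lambda g: g["brent"].corr(g["sp500"])
)
print(yearly_corr.round(2))
```

The result is a Series indexed by year, so it can be read alongside the panel titles of the scatterplots.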

You may change kind in group.plot() so that it suits your specific kind of data. I expect that pandas will preserve the date formatting on the x-axis if your data has dates.
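
Applied to data shaped like the question's, the same pattern might look like the sketch below. It is an illustration under assumptions rather than the answerer's code: the dataframe is synthetic, the column name Total is a hypothetical shorthand for "Rolling year total number of offences", and the Agg backend is selected only so the code runs without a display. Each group is indexed by Month and drawn as a line, which lets pandas apply its own date formatting to the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data in the question's long format (assumption)
rng = np.random.default_rng(0)
months = pd.date_range("2010-01-01", periods=48, freq="MS")
offences = ["Bicycle theft", "Drug offences",
            "Domestic burglary", "Criminal damage and arson"]
df = pd.DataFrame({
    "Offence": np.repeat(offences, len(months)),
    "Month": np.tile(months, len(offences)),
    "Total": rng.integers(50, 1000, size=len(offences) * len(months)),
})

# One subplot per offence, sharing the time axis
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
for (offence, group), ax in zip(df.groupby("Offence"), axes.flatten()):
    # A Series with a DatetimeIndex is plotted with date-aware tick labels
    group.set_index("Month")["Total"].plot(ax=ax, title=offence)
    ax.set_xlabel("Time")
    ax.set_ylabel("No. of crimes")
fig.tight_layout()
```

In a script, plt.show() or fig.savefig(...) would follow; the grid shape (2, 2) has to be chosen so that it holds at least as many axes as there are groups.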

Answered by Nipun Batra

Altair can work well in such cases.

import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql

df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year

from altair import *

Viz #1: no color by year, no columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)

Viz #2: no color by year, columns by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)

Viz #3: color by year

Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)
