Original question: http://stackoverflow.com/questions/33150510/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to create groupby subplots in Pandas?
Asked by elksie5000
I've got a dataframe of crime time-series data, faceted by offence (in the format shown below). I'd like to produce a groupby plot from the dataframe so that it's possible to explore trends in crime over time.
   Offence                    Rolling year total number of offences  Month
0  Criminal damage and arson  1001                                   2003-03-31
1  Drug offences              66                                     2003-03-31
2  All other theft offences   617                                    2003-03-31
3  Bicycle theft              92                                     2003-03-31
4  Domestic burglary          282                                    2003-03-31
I've got some code which does the job, but it's a bit clumsy and it loses the time series formatting that Pandas delivers on a single plot. (I've included an image to illustrate). Can anyone suggest an idiom for such plots that I can use?
I would turn to Seaborn but I can't work out how to format the xlabel as timeseries.
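import matplotlib.pyplot as plt

# Collect one quarterly-resampled series per offence, from 2010 onwards.
# (.resample(...).sum() and .loc replace the removed how="sum" and .ix.)
subs = []
for i, g in df.groupby("Offence"):
    counts = g.set_index("Month")["Rolling year total number of offences"]
    subs.append({"data": counts.resample("QS-APR").sum().loc["2010":],
                 "title": i})

fig = plt.figure(figsize=(25, 15))
for i, g in enumerate(subs):
    plt.subplot(5, 5, i + 1)  # subplot indices start at 1
    plt.plot(g['data'])
    plt.title(g['title'])
    plt.xlabel("Time")
    plt.ylabel("No. of crimes")
plt.tight_layout()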
Answered by Sergey Bushmanov
This is a reproducible example of 6 scatterplots in Pandas, obtained from groupby() for 6 consecutive years. The x axis shows the oil price (brent) for the year, and the y axis shows the value of sp500 for the same year.
import matplotlib.pyplot as plt
import pandas as pd
import Quandl as ql
%matplotlib inline
brent = ql.get('FRED/DCOILBRENTEU')
sp500 = ql.get('YAHOO/INDEX_GSPC')
values = pd.DataFrame({'brent':brent.VALUE, 'sp500':sp500.Close}).dropna()["2009":"2015"]
fig, axes = plt.subplots(2,3, figsize=(15,5))
for (year, group), ax in zip(values.groupby(values.index.year), axes.flatten()):
    group.plot(x='brent', y='sp500', kind='scatter', ax=ax, title=year)
This produces the plot below:
(Just in case, from these plots you may infer there was a strong correlation between oil and sp500 in 2010 but not in other years).
You may change kind in group.plot() so that it suits your specific kind of data. I expect pandas will preserve the date formatting on the x-axis if your data contains dates.
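As a rough sketch of how the same idiom might carry over to the question's data, the snippet below draws one line plot per offence, each on its own axes. The data is made up (random counts in the question's column layout), and the 2x2 grid and figure size are arbitrary choices for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's dataframe (values are invented).
rng = np.random.default_rng(0)
months = pd.date_range("2010-01-01", periods=24, freq="MS")
offences = ["Bicycle theft", "Domestic burglary",
            "Drug offences", "Criminal damage and arson"]
count_col = "Rolling year total number of offences"

frames = []
for offence in offences:
    frames.append(pd.DataFrame({
        "Offence": offence,
        count_col: rng.integers(50, 1000, size=len(months)),
        "Month": months,
    }))
df = pd.concat(frames, ignore_index=True)

# One subplot per group: pair each (name, group) with an axes object.
fig, axes = plt.subplots(2, 2, figsize=(12, 6))
for (offence, group), ax in zip(df.groupby("Offence"), axes.flatten()):
    # The default kind is a line plot; passing the datetime column as x
    # lets pandas handle the date tick formatting on each axes.
    group.plot(x="Month", y=count_col, ax=ax, title=offence, legend=False)
    ax.set_xlabel("Time")
    ax.set_ylabel("No. of crimes")

fig.tight_layout()
```

With more groups than axes, zip() silently stops at the shorter of the two, so the grid shape should be chosen to fit the number of offences.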
Answered by Nipun Batra
Altair can work great in such cases.
import matplotlib.pyplot as plt
import pandas as pd
import quandl as ql
df = ql.get(["NSE/OIL.1", "WIKI/AAPL.1"], start_date="2013-1-1")
df.columns = ['OIL', 'AAPL']
df['year'] = df.index.year
from altair import *
Viz #1- No color by year/No columns by year
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL').configure_cell(width=200, height=150)
Viz #2- No color by year/columns by year
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', column='year').configure_cell(width=140, height=70).configure_facet_cell(strokeWidth=0)
Viz #3- Color by year
Chart(df).mark_point(size=1).encode(x='AAPL',y='OIL', color='year:N').configure_cell(width=140, height=70)