Original source: http://stackoverflow.com/questions/22214985/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): Stack Overflow
MultiIndex Group By in Pandas Data Frame
Asked by metersk
I have a data set that contains countries and statistics on economic indicators by year, organized like so:
Country  Metric  2011  2012  2013  2014
USA      GDP        7     4     0     2
USA      Pop.       2     3     0     3
GB       GDP        8     7     0     7
GB       Pop.       2     6     0     0
FR       GDP        5     0     0     1
FR       Pop.       1     1     0     5
How can I use MultiIndex in pandas to create a data frame that only shows GDP by Year for each country?
I tried:
df = data.groupby(['Country', 'Metric'])
but it didn't work properly.
Answered by Paul H
In this case, you don't actually need a groupby. You also don't have a MultiIndex. You can make one like this:
import pandas
from io import StringIO
datastring = StringIO("""\
Country  Metric           2011   2012   2013  2014
USA     GDP               7      4     0      2
USA     Pop.              2      3     0      3
GB      GDP               8      7     0      7
GB      Pop.              2      6     0      0
FR      GDP               5      0     0      1
FR      Pop.              1      1     0      5
""")
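# Split columns on runs of two or more whitespace characters.
# A raw string avoids the invalid-escape warning; regex separators use the python engine.
data = pandas.read_table(datastring, sep=r'\s\s+', engine='python')
# Move the two label columns into a two-level MultiIndex
data.set_index(['Country', 'Metric'], inplace=True)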
Then data looks like this:
                2011  2012  2013  2014
Country Metric                        
USA     GDP        7     4     0     2
        Pop.       2     3     0     3
GB      GDP        8     7     0     7
        Pop.       2     6     0     0
FR      GDP        5     0     0     1
        Pop.       1     1     0     5
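To confirm that the frame now carries a MultiIndex, it can help to inspect the index directly. The following is a minimal sketch, assuming the same toy data built from a dict; the names flat and indexed are illustrative and not from the original answer.

```python
import pandas as pd

# Same toy data as in the question, built from a dict so the snippet stands alone
flat = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# Without inplace=True, set_index returns a new frame and leaves `flat` untouched
indexed = flat.set_index(["Country", "Metric"])

is_multi = isinstance(indexed.index, pd.MultiIndex)
level_names = list(indexed.index.names)

# A full (Country, Metric) tuple plus a column label addresses a single cell
usa_gdp_2011 = indexed.loc[("USA", "GDP"), "2011"]
print(is_multi, level_names, usa_gdp_2011)
```

Note that the year columns are strings here ("2011", not 2011), as they would be when parsed from a text header.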
Now to get the GDPs, you can take a cross-section of the dataframe via the xs method:
data.xs('GDP', level='Metric')
         2011  2012  2013  2014
Country                        
USA         7     4     0     2
GB          8     7     0     7
FR          5     0     0     1
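By default xs drops the level it selects on. If the Metric level should stay in the result, two options are drop_level=False or a boolean mask on the index level. This is a sketch under the assumption that the frame is indexed by (Country, Metric) as above; the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("USA", "GDP"), ("USA", "Pop."), ("GB", "GDP"),
     ("GB", "Pop."), ("FR", "GDP"), ("FR", "Pop.")],
    names=["Country", "Metric"],
)
data = pd.DataFrame(
    {"2011": [7, 2, 8, 2, 5, 1], "2012": [4, 3, 7, 6, 0, 1],
     "2013": [0, 0, 0, 0, 0, 0], "2014": [2, 3, 7, 0, 1, 5]},
    index=index,
)

# Cross-section: the Metric level is dropped from the result
gdp = data.xs("GDP", level="Metric")

# Same rows, but both index levels are kept
gdp_keep = data.xs("GDP", level="Metric", drop_level=False)

# Boolean mask on one index level; this also keeps both levels
gdp_mask = data[data.index.get_level_values("Metric") == "GDP"]
```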
This is easy because your data are already pivoted (unstacked). If they weren't, and looked like this:
data.columns.names = ['Year']
data = data.stack()
data
Country  Metric  Year
USA      GDP     2011    7
                 2012    4
                 2013    0
                 2014    2
         Pop.    2011    2
                 2012    3
                 2013    0
                 2014    3
GB       GDP     2011    8
                 2012    7
                 2013    0
                 2014    7
         Pop.    2011    2
                 2012    6
                 2013    0
                 2014    0
FR       GDP     2011    5
                 2012    0
                 2013    0
                 2014    1
         Pop.    2011    1
                 2012    1
                 2013    0
                 2014    5
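The stacked result is a Series with a three-level index, and the same xs cross-section works on it. A minimal sketch, assuming the wide frame from earlier; stacked, gdp_long and gdp_wide are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("USA", "GDP"), ("USA", "Pop."), ("GB", "GDP"),
     ("GB", "Pop."), ("FR", "GDP"), ("FR", "Pop.")],
    names=["Country", "Metric"],
)
wide = pd.DataFrame(
    {"2011": [7, 2, 8, 2, 5, 1], "2012": [4, 3, 7, 6, 0, 1],
     "2013": [0, 0, 0, 0, 0, 0], "2014": [2, 3, 7, 0, 1, 5]},
    index=index,
)
wide.columns.name = "Year"

# Move the year columns into a third index level: (Country, Metric, Year)
stacked = wide.stack()

# Cross-section on the Series: GDP values indexed by (Country, Year)
gdp_long = stacked.xs("GDP", level="Metric")

# Pivot the years back into columns, one row per country
gdp_wide = gdp_long.unstack("Year")
```

The row order of gdp_wide may differ from the original frame, since unstack orders rows by the index levels rather than by first appearance.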
You could then use groupby to tell you something about the world as a whole:
data.groupby(level=['Metric', 'Year']).sum()
Metric  Year
GDP     2011    20
        2012    11
        2013     0
        2014    10
Pop.    2011     5
        2012    10
        2013     0
        2014     8
Or get real fancy:
data.groupby(level=['Metric', 'Year']).sum().unstack(level='Metric')
Metric  GDP  Pop.
Year             
2011     20     5
2012     11    10
2013      0     0
2014     10     8
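For comparison, the same totals can probably be reached without stacking first, by grouping the wide frame on the Metric level and transposing. This is only a sketch of an alternative route, not part of the original answer; the names totals and by_year are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("USA", "GDP"), ("USA", "Pop."), ("GB", "GDP"),
     ("GB", "Pop."), ("FR", "GDP"), ("FR", "Pop.")],
    names=["Country", "Metric"],
)
wide = pd.DataFrame(
    {"2011": [7, 2, 8, 2, 5, 1], "2012": [4, 3, 7, 6, 0, 1],
     "2013": [0, 0, 0, 0, 0, 0], "2014": [2, 3, 7, 0, 1, 5]},
    index=index,
)
wide.columns.name = "Year"

# Sum over countries: one row per metric, one column per year
totals = wide.groupby(level="Metric").sum()

# Transpose to get years as rows and metrics as columns
by_year = totals.T
print(by_year)
```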
Answered by Amit Verma
Is this what you are looking for?
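# df is the flat frame, with Country and Metric as ordinary columns.
# Grouping by a scalar key lets get_group take a plain label.
grouped = df.groupby('Metric')
grouped.get_group('GDP')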
   Country Metric  2011    2012    2013    2014
0    USA     GDP     7      4       0       2
2    GB      GDP     8      7       0       7
4    FR      GDP     5      0       0       1
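When only one group is needed, a plain boolean mask on the flat frame gives the same rows as get_group without building a GroupBy object. A minimal sketch, assuming the flat frame from the question; flat, gdp_rows and by_country are illustrative names.

```python
import pandas as pd

flat = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# Keep only the rows whose Metric column equals "GDP"
gdp_rows = flat[flat["Metric"] == "GDP"]

# Optionally drop the now-constant Metric column and index by country
by_country = gdp_rows.drop(columns="Metric").set_index("Country")
print(by_country)
```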

