Original source: http://stackoverflow.com/questions/21247992/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Grouping a pandas dataframe by two columns (or more)?
Asked by waitingkuo
I have the following dataframe:
import pandas
mydf = pandas.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})
I want to create a dataframe that computes summary statistics about the val column of items with the same class id. For this I use groupby as follows:
mydf.groupby("class").val.sum()
That's the correct behavior, but I'd like to retain the cat column information in the resulting df. Can that be done? Do I have to merge/join that info in later? I tried:
mydf.groupby(["cat", "class"]).val.sum()
but this uses hierarchical indexing. I'd like to get back a plain dataframe that just has the cat value for each group, where the grouping is by class. The output should be a dataframe (not a series) with the values of cat and class, where the val entries are summed over all rows that have the same class:
cat     class  val
first   A      7
second  B      3
third   C      10
Is this possible?
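To make the difference between the two groupby calls in the question concrete, here is a minimal sketch that rebuilds the same data and inspects both results. The variable names by_class and by_both are illustrative and not part of the original post.

```python
import pandas

# Same data as in the question.
mydf = pandas.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# Grouping by one column gives a Series indexed by that column.
by_class = mydf.groupby("class").val.sum()

# Grouping by two columns gives a Series with a two-level (hierarchical) index.
by_both = mydf.groupby(["cat", "class"]).val.sum()

print(by_class)
print(by_both)
print(by_both.index.nlevels)  # number of index levels in the grouped result
```

In both cases cat and class live in the index rather than in ordinary columns, which is why the result does not look like a plain dataframe.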
Answered by waitingkuo
Use reset_index:
In [9]: mydf.groupby(['cat', "class"]).val.sum().reset_index()
Out[9]:
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
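As a self-contained sketch of the reset_index approach, the snippet below rebuilds the question's data and checks that the result is a flat dataframe. The name summary is an illustrative choice, not something from the original answer.

```python
import pandas

# Same data as in the question.
mydf = pandas.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# reset_index() moves both index levels back into ordinary columns
# and turns the Series into a DataFrame with a default integer index.
summary = mydf.groupby(["cat", "class"]).val.sum().reset_index()

print(type(summary).__name__)
print(list(summary.columns))
print(summary)
```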
EDIT
Set level=1 if you want to keep cat as the index (only the class level is moved back into a column):
In [10]: mydf.groupby(['cat', "class"]).val.sum().reset_index(level=1)
Out[10]:
       class  val
cat
first      A    7
second     B    3
third      C   10
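The level argument can also be given as the level's name instead of its position, which may read more clearly. A minimal sketch, assuming the same data as in the question (the names grouped, by_position and by_name are illustrative):

```python
import pandas

# Same data as in the question.
mydf = pandas.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

grouped = mydf.groupby(["cat", "class"]).val.sum()

# Level 1 is the second grouping key, "class"; resetting only that level
# leaves "cat" as the index.
by_position = grouped.reset_index(level=1)

# The same level selected by name rather than by position.
by_name = grouped.reset_index(level="class")

print(by_position)
print(by_position.equals(by_name))
```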
You can also set as_index=False to get the same output as the plain reset_index() call:
In [29]: mydf.groupby(['cat', "class"], as_index=False).val.sum()
Out[29]:
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
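For comparison, the sketch below computes the result both ways, with as_index=False and with reset_index(), on the question's data. The names flat and via_reset are illustrative only.

```python
import pandas

# Same data as in the question.
mydf = pandas.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# as_index=False keeps the grouping keys as ordinary columns from the start.
flat = mydf.groupby(["cat", "class"], as_index=False).val.sum()

# The reset_index() route builds the hierarchical index first, then flattens it.
via_reset = mydf.groupby(["cat", "class"]).val.sum().reset_index()

print(flat)
print(via_reset)
```

Which form to prefer is largely a matter of taste: as_index=False avoids creating the hierarchical index at all, while reset_index() also works on a grouped result you already have.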

