How to move pandas data from index to column after multiple groupby
Original question: http://stackoverflow.com/questions/21767900/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by prooffreader
I have the following pandas dataframe:
dfalph.head()
       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1
I aggregate the rows with duplicate 'token' and 'year' values like so:
dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum])
dfalph.columns = dfalph.columns.droplevel(1)
dfalph.head()
              uses  books
token   year
xanthos 1830     3      3
        1840     3      3
        1867     2      2
        1868     2      2
        1875     1      1
Instead of having the 'token' and 'year' fields in the index, I would like to return them to columns and have an integer index.
Accepted answer by DSM
Method #1: reset_index()
>>> g
              uses  books
               sum    sum
token   year
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
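Method #1 can be sketched as a self-contained script. The sample rows below are illustrative values modelled on the question's dfalph, not the asker's full data, and the sketch calls .sum() directly instead of .agg([np.sum]), so the columns stay single-level and the extra "sum" header row seen above does not appear.

```python
import pandas as pd

# Illustrative rows modelled on the question's dfalph (assumed values).
dfalph = pd.DataFrame({
    "token": ["xanthos"] * 5,
    "year": [1830, 1840, 1840, 1868, 1875],
    "uses": [3, 1, 2, 2, 1],
    "books": [3, 1, 2, 2, 1],
})

# Grouping by two keys yields a result indexed by (token, year).
g = dfalph.groupby(["token", "year"]).sum()

# reset_index() moves every index level back into ordinary columns
# and replaces the index with a default integer index.
flat = g.reset_index()
print(flat)
```

After the call, 'token' and 'year' are regular columns again and the two 1840 rows have been combined into one.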
Method #2: don't make the index in the first place, using as_index=False
>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
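Method #2 can be tried the same way. This is a minimal sketch with illustrative rows (assumed values, not the asker's data); it also compares the result with the reset_index() route, which should agree when all grouping keys are plain column names.

```python
import pandas as pd

# Illustrative rows modelled on the question's dfalph (assumed values).
dfalph = pd.DataFrame({
    "token": ["xanthos"] * 5,
    "year": [1830, 1840, 1840, 1868, 1875],
    "uses": [3, 1, 2, 2, 1],
    "books": [3, 1, 2, 2, 1],
})

# as_index=False keeps the grouping keys as columns from the start,
# so no MultiIndex is ever built.
g = dfalph.groupby(["token", "year"], as_index=False).sum()

# The same aggregation done with Method #1, for comparison.
via_reset = dfalph.groupby(["token", "year"]).sum().reset_index()

print(g)
print(list(g.columns) == list(via_reset.columns))
```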
Answer by Adarsh Madrecha
I differ from the accepted answer.
While there are two ways to do this (as_index=False and reset_index()), they will not necessarily produce the same output, especially when you are using Grouper in groupby.
Example df:
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A | M | 26-10-2018 | 2 |
| B | M | 28-10-2018 | 3 |
| A | M | 30-10-2018 | 6 |
| B | M | 01-11-2018 | 3 |
| C | N | 03-11-2018 | 4 |
+---------+---------+-------------+------------+
They do not work the same way.
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ],
    as_index=False
).sum()
The above will give
+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A | M | 8 |
| B | M | 3 |
| B | M | 3 |
| C | N | 4 |
+---------+---------+------------+
While
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index()
will give
+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A | M | 31-10-2018 | 8 |
| B | M | 31-10-2018 | 3 |
| B | M | 30-11-2018 | 3 |
| C | N | 30-11-2018 | 4 |
+---------+---------+-------------+------------+
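The comparison can be reproduced with a sketch like the one below. Two assumptions to note: the dates are parsed as day-month-year, and pd.offsets.MonthEnd() is passed in place of freq='M' because the string alias for month end has changed between pandas releases. The make_keys helper is a hypothetical name used only to build a fresh Grouper for each call. Whether the as_index=False result contains column_date may depend on the installed pandas version, so the sketch prints the columns instead of assuming them.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["A", "B", "A", "B", "C"],
    "column2": ["M", "M", "M", "M", "N"],
    # Assumption: the example dates are day-month-year.
    "column_date": pd.to_datetime(
        ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
        format="%d-%m-%Y",
    ),
    "column_sum": [2, 3, 6, 3, 4],
})


def make_keys():
    # Hypothetical helper: returns the grouping keys with a fresh
    # month-end Grouper each time it is called.
    return [
        "column1",
        "column2",
        pd.Grouper(key="column_date", freq=pd.offsets.MonthEnd()),
    ]


# Route 1: build the index, then move it back into columns.
via_reset = df.groupby(make_keys()).sum().reset_index()

# Route 2: ask groupby not to build the index at all.
via_flag = df.groupby(make_keys(), as_index=False).sum()

print(list(via_reset.columns))
print(list(via_flag.columns))
```

If the two printed lists differ on your installation, reset_index() is the route that keeps the month-end date column.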
Answer by user1809802
You need to add drop=True:
df.reset_index(drop=True)
df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index(drop=True)
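It may help to compare the two forms of reset_index side by side. In the small sketch below (illustrative data, assumed values), drop=True discards the index levels instead of turning them into columns, so it fits only when the grouping keys are no longer needed, which is not what the original question asks for.

```python
import pandas as pd

# Illustrative data (assumed values).
df = pd.DataFrame({
    "token": ["xanthos", "xanthos", "xanthos"],
    "year": [1830, 1840, 1840],
    "uses": [3, 1, 2],
})

g = df.groupby(["token", "year"]).sum()

# Default: index levels become columns.
kept = g.reset_index()

# drop=True: index levels are thrown away, only the data columns remain.
dropped = g.reset_index(drop=True)

print(list(kept.columns))     # ['token', 'year', 'uses']
print(list(dropped.columns))  # ['uses']
```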

