pandas: Group a DataFrame into a new DataFrame with an arange-like index
Source: Stack Overflow, http://stackoverflow.com/questions/47897607/
License: this content is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Bas
I have a question, simplified in this example. Consider this pandas DataFrame, df_a:
import pandas as pd

df_a = pd.DataFrame(
    [['1001', 34.3, 'red'],
     ['1001', 900.04, 'red'],
     ['1001', 776, 'red'],
     ['1003', 18.95, 'green'],
     ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'],
)
     id  amount   name
0  1001   34.30    red
1  1001  900.04    red
2  1001  776.00    red
3  1003   18.95  green
4  1004  321.20   blue
I would like to group this DataFrame by id, summing the amount into a new DataFrame with a new 'arange'-like index (0, 1, 2, ...). This is the result I would like to have:
     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
But my attempts either produce a Series (I would like a DataFrame as the result):
df_a.groupby(['id'])['amount'].sum()
id
1001    1710.34
1003      18.95
1004     321.20
Name: amount, dtype: float64
or produce a DataFrame whose index is built from the id column:
pd.DataFrame(df_a.groupby(['id'])['amount'].sum())
       amount
id
1001  1710.34
1003    18.95
1004   321.20
I've also tried passing the index parameter, but that doesn't work either:
pd.DataFrame(df_a.groupby(['id'])['amount'].sum(),index=df_a.index.values)
   amount
0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
Does anyone have an elegant solution for this?
Accepted answer by Vaishali
groupby has an as_index parameter for exactly this:
df_a.groupby('id', as_index=False)['amount'].sum()
You get:
     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
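As a minimal sketch of this answer, the snippet below rebuilds the question's sample data and checks that as_index=False yields a DataFrame with id as an ordinary column and a default 0..n-1 index. The variable name result is only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'],
)

# as_index=False keeps 'id' as a regular column instead of making it the index.
result = df_a.groupby('id', as_index=False)['amount'].sum()

print(type(result).__name__)   # DataFrame
print(list(result.columns))    # ['id', 'amount']
print(list(result.index))      # [0, 1, 2]
```

Without as_index=False, the same call returns a Series indexed by id, which is the behavior described in the question.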
Answer by student
You can try the following, adding to_frame() and reset_index():
new_df = df_a.groupby(['id'])['amount'].sum().to_frame('amount').reset_index()
print(new_df)
Result:
     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
If you only use to_frame(), i.e.
df_a.groupby(['id'])['amount'].sum().to_frame('amount')
the index stays on id, as follows:
       amount
id
1001  1710.34
1003    18.95
1004   321.20
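The difference between the two steps can be checked directly. This sketch reuses the question's sample data with illustrative variable names (summed, framed, flat, direct). It shows that to_frame() alone keeps id as the index, while reset_index() moves it into a column. It also relies on Series.reset_index() returning a DataFrame by itself, so the to_frame() step is optional here.

```python
import pandas as pd

# Same sample data as in the question.
df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'],
)

summed = df_a.groupby('id')['amount'].sum()   # Series indexed by 'id'

framed = summed.to_frame('amount')            # DataFrame, still indexed by 'id'
flat = framed.reset_index()                   # 'id' becomes a column, index is 0..n-1
direct = summed.reset_index()                 # same result without to_frame()

print(framed.index.name)     # id
print(list(flat.columns))    # ['id', 'amount']
print(list(flat.index))      # [0, 1, 2]
print(direct.equals(flat))   # True
```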
Another way is to call reset_index() on the DataFrame built in your code above:
new_df = pd.DataFrame(df_a.groupby(['id'])['amount'].sum()).reset_index()
The output is the same as above:
     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
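As a side note on the question's index= attempt, the NaN values come from label alignment. When a Series is passed to pd.DataFrame together with index=, pandas looks up those labels in the Series instead of relabeling its rows. The sketch below, with illustrative variable names (summed, misaligned, aligned), shows the failing case and a case where the requested labels do exist.

```python
import pandas as pd

# Same sample data as in the question.
df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'],
)

summed = df_a.groupby('id')['amount'].sum()   # index holds the strings '1001', '1003', '1004'

# The integer labels 0..4 do not exist in the 'id' index, so every row is NaN.
misaligned = pd.DataFrame(summed, index=df_a.index.values)
print(misaligned['amount'].isna().all())   # True

# Labels that do exist are looked up, which shows the constructor aligns on labels.
aligned = pd.DataFrame(summed, index=['1004', '1001'])
print(list(aligned.index))   # ['1004', '1001']
```

This is why reset_index(), which replaces the index rather than selecting by it, gives the desired 0..n-1 index.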