Disclaimer: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/47897607/

Time: 2020-09-14 04:55:31  Source: igfitidea

Groupby a DataFrame into a new DataFrame with arange as index

python pandas pandas-groupby

Asked by Bas

I have a question, simplified in this example. Consider this Pandas DataFrame, df_a:

import pandas as pd
df_a = pd.DataFrame([['1001',34.3,'red'],['1001',900.04,'red'],['1001',776,'red'],['1003',18.95,'green'],['1004',321.2,'blue']], columns=['id','amount','name'])

    id      amount  name
0   1001    34.30   red
1   1001    900.04  red
2   1001    776.00  red
3   1003    18.95   green
4   1004    321.20  blue

I would like to group this DataFrame by id, summing the amount into a new DataFrame that has a new 'arange'-like index. This is the result I would like to have:

    id      amount
0   1001    1710.34
1   1003    18.95
2   1004    321.20

But my attempts either create a Series (I would like a DataFrame as the result):

df_a.groupby(['id'])['amount'].sum()

id
1001    1710.34
1003      18.95
1004     321.20
Name: amount, dtype: float64

or create a new index based on the id column:

pd.DataFrame(df_a.groupby(['id'])['amount'].sum())

        amount
id  
1001    1710.34
1003    18.95
1004    321.20

I've also tried to pass the index parameter, but that doesn't work either:

pd.DataFrame(df_a.groupby(['id'])['amount'].sum(),index=df_a.index.values)

   amount
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
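
A likely explanation for the NaN values, as a minimal sketch: when pd.DataFrame is given a Series together with an index argument, it aligns the Series to those labels instead of relabeling it. The grouped Series is labeled '1001', '1003' and '1004', none of which match the labels 0 to 4, so every row comes back empty. The names totals and aligned are illustrative only.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'])

# Series of sums, indexed by the string ids '1001', '1003', '1004'
totals = df_a.groupby(['id'])['amount'].sum()

# Passing index= asks pandas to look up the labels 0..4 in that Series;
# none of them exist there, so every value is missing
aligned = pd.DataFrame(totals, index=df_a.index.values)

print(aligned['amount'].isna().all())
```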

Does anyone have an elegant solution for this?

Accepted answer by Vaishali

groupby has an as_index parameter for that:

df_a.groupby('id', as_index=False)['amount'].sum()

You get:

    id  amount
0   1001    1710.34
1   1003    18.95
2   1004    321.20
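
For reference, here is a self-contained sketch of the same call, rebuilding df_a from the question. The variable name result is only for illustration. It checks that the output is a DataFrame, that id stays a regular column, and that the index is the default 0, 1, 2 sequence the question asks for.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'])

# as_index=False keeps 'id' as a regular column instead of making it the index
result = df_a.groupby('id', as_index=False)['amount'].sum()

print(type(result).__name__)   # DataFrame
print(list(result.columns))    # ['id', 'amount']
print(list(result.index))      # [0, 1, 2]
```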

Answer by student

You can try the following, adding to_frame() and reset_index():

new_df = df_a.groupby(['id'])['amount'].sum().to_frame('amount').reset_index()
print(new_df)

Result:

     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20

If you only use to_frame(), i.e.:

df_a.groupby(['id'])['amount'].sum().to_frame('amount')

it will keep id as the index, as follows:

      amount
id           
1001  1710.34
1003    18.95
1004   321.20

Another way is to reset the index on the DataFrame from your code above:

new_df = pd.DataFrame(df_a.groupby(['id'])['amount'].sum()).reset_index()

The output is the same as above:

     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
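
As a side note, and only as a sketch: Series.reset_index() already returns a DataFrame, so the to_frame() or pd.DataFrame() wrapper can probably be dropped. The names totals, via_series and via_frame are illustrative, not part of the original answer.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['1001', 34.3, 'red'], ['1001', 900.04, 'red'], ['1001', 776, 'red'],
     ['1003', 18.95, 'green'], ['1004', 321.2, 'blue']],
    columns=['id', 'amount', 'name'])

# Series of sums, indexed by id and named 'amount'
totals = df_a.groupby(['id'])['amount'].sum()

# reset_index() on the Series moves 'id' into a column and returns a DataFrame
via_series = totals.reset_index()

# The longer route from the answer, for comparison
via_frame = totals.to_frame('amount').reset_index()

print(via_series.equals(via_frame))
```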