Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/41040132/

Time: 2020-09-14 02:35:29  Source: igfitidea

Pandas Groupby: Count and mean combined

Tags: python, python-2.7, pandas, dataframe, group-by

Asked by Lewis Anderson

I'm working with pandas and trying to summarise a dataframe as a count of certain categories, together with the mean sentiment score for each of those categories.

There is a table full of strings which have different sentiment scores, and I want to group by text source, showing how many posts each source has as well as the average sentiment of those posts.

My (simplified) dataframe looks like this:


source    text              sent
--------------------------------
bar       some string       0.13
foo       alt string        -0.8
bar       another str       0.7
foo       some text         -0.2
foo       more text         -0.5
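For anyone who wants to reproduce this, the simplified table above could be built roughly as follows. This is only a sketch: the column names are taken from the table, and the variable name `df` is chosen to match the snippets used elsewhere in this thread.

```python
import pandas as pd

# Sample data copied from the simplified table above.
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})
print(df)
```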

The output from this should be something like this:


source    count     mean_sent
-----------------------------
foo       3         -0.5
bar       2         0.415

The answer is somewhere along the lines of:


df['sent'].groupby(df['source']).mean()

Yet this only gives each source and its mean, with no column headers.

Thanks in advance!


Answered by jezrael

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
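One detail worth knowing about the choice of 'size' here: 'size' counts every row in a group, whereas 'count' counts only the non-null values in the column. The sketch below illustrates the difference with a small made-up frame in which one text value is missing (the data and the name `demo` are hypothetical, not from the question).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one 'text' entry for source 'foo' is missing.
demo = pd.DataFrame({
    "source": ["bar", "foo", "foo"],
    "text": ["some string", np.nan, "more text"],
    "sent": [0.13, -0.8, -0.5],
})

grouped = demo.groupby("source")["text"]
sizes = grouped.size()    # rows per group, missing values included
counts = grouped.count()  # non-null values per group

print(sizes["foo"], counts["foo"])  # 2 1
```

With the question's data there are no missing values, so either function gives the same counts.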

Answered by neves

In newer versions of pandas (0.25 and later) you don't need the rename anymore; just use named parameters (named aggregation):

df = df.groupby('source') \
       .agg(count=('text', 'size'), mean_sent=('sent', 'mean')) \
       .reset_index()

print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
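The (column, function) tuples are shorthand for pd.NamedAgg. Written out in full, a self-contained version might look like the sketch below; the sample data is copied from the question, and the name `summary` is chosen here for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Each keyword becomes an output column; NamedAgg states which input
# column to aggregate and which function to apply.
summary = (
    df.groupby("source")
      .agg(
          count=pd.NamedAgg(column="text", aggfunc="size"),
          mean_sent=pd.NamedAgg(column="sent", aggfunc="mean"),
      )
      .reset_index()
)
print(summary)
```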

Answered by Ojha

The one below should work fine:

df[['source','sent']].groupby('source').agg(['count','mean'])
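Note that passing a list of functions to a DataFrame groupby gives a two-level column index, ('sent', 'count') and ('sent', 'mean'). If flat headers like those in the question are wanted, one option is to select the column before aggregating and rename afterwards. This is a sketch rather than part of the original answer; the names `multi` and `flat` are chosen here.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# The one-liner above: columns come back as a two-level index.
multi = df[["source", "sent"]].groupby("source").agg(["count", "mean"])

# Selecting the column first yields flat 'count' and 'mean' columns.
flat = (
    df.groupby("source")["sent"]
      .agg(["count", "mean"])
      .rename(columns={"mean": "mean_sent"})
      .reset_index()
)
print(flat)
```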


Answered by galitbw

I think this should provide the output that you wanted:


result = pd.DataFrame(df.groupby('source').size())
result['mean_score'] = df.groupby('source').sent.mean()
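With this approach the size column ends up labelled 0, because the Series returned by size() has no name. A variant that names it is sketched below using the question's sample data; the column names count and mean_score are chosen here for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Rows per source, as a one-column DataFrame named 'count'.
result = df.groupby("source").size().to_frame("count")

# The mean is indexed by 'source' too, so the assignment aligns on it.
result["mean_score"] = df.groupby("source")["sent"].mean()

# Turn the 'source' index back into an ordinary column.
result = result.reset_index()
print(result)
```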