Pandas Groupby: Count and mean combined
Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41040132/
Question by Lewis Anderson
I'm working with pandas, trying to summarise a DataFrame as a count of certain categories, along with the mean sentiment score for each of these categories.
There is a table full of strings with different sentiment scores, and I want to group them by text source, showing how many posts each source has as well as the average sentiment of those posts.
My (simplified) dataframe looks like this:
source  text          sent
--------------------------------
bar     some string    0.13
foo     alt string    -0.8
bar     another str    0.7
foo     some text     -0.2
foo     more text     -0.5
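To make the snippets on this page reproducible, the sample table can be rebuilt as a DataFrame. This is a minimal sketch; the column names `source`, `text` and `sent` and the five rows are taken directly from the table above.

```python
import pandas as pd

# The five sample rows from the question's table
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})
print(df)
```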
The output from this should be something like this:
source  count  mean_sent
-----------------------------
foo     3      -0.5
bar     2       0.415
The answer is somewhere along the lines of:
df['sent'].groupby(df['source']).mean()
Yet this only gives each source and its mean, with no column headers.
Thanks in advance!
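The attempt above returns a Series (named `sent`, indexed by `source`) rather than a DataFrame, which is why no headers like `mean_sent` appear. A minimal sketch, rebuilding the question's sample data, of turning that Series into a headed frame with `Series.reset_index(name=...)`; the column name `mean_sent` is just borrowed from the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# The question's expression: a Series indexed by 'source'
means = df['sent'].groupby(df['source']).mean()
print(type(means).__name__)  # Series

# reset_index(name=...) moves 'source' back into a column and names the values
mean_frame = means.reset_index(name='mean_sent')
print(mean_frame)
```

This only adds the header for the mean; the count still has to be computed separately, which is what the answers below combine into one result.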
Answer by jezrael
Answer by neves
In newer versions of pandas (0.25 and later) you no longer need the rename; just pass named aggregations as keyword arguments:
df = df.groupby('source') \
.agg(count=('text', 'size'), mean_sent=('sent', 'mean')) \
.reset_index()
print(df)
source count mean_sent
0 bar 2 0.415
1 foo 3 -0.500
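A self-contained version of the named-aggregation approach, rebuilding the question's sample data. `pd.NamedAgg` is the explicit spelling of the same `(column, aggfunc)` tuples. Note that `'size'` counts every row in a group, whereas `'count'` would skip NaN values in the chosen column.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Each keyword becomes an output column: (input column, aggregation)
summary = (
    df.groupby('source')
      .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
      .reset_index()
)

# Equivalent, using the explicit pd.NamedAgg helper
summary_explicit = (
    df.groupby('source')
      .agg(count=pd.NamedAgg(column='text', aggfunc='size'),
           mean_sent=pd.NamedAgg(column='sent', aggfunc='mean'))
      .reset_index()
)
print(summary)
```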
Answer by Ojha
The following should work:
df[['source','sent']].groupby('source').agg(['count','mean'])
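The call above returns two-level column headers, `('sent', 'count')` and `('sent', 'mean')`. A sketch of two ways to get flat headers like the desired output: select the `sent` column before aggregating, or drop the outer level afterwards with `DataFrame.droplevel`. The rename to `mean_sent` is only there to match the question's headers.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# The call from this answer: MultiIndex columns ('sent', 'count'), ('sent', 'mean')
nested = df[['source', 'sent']].groupby('source').agg(['count', 'mean'])

# Option 1: aggregate the 'sent' Series directly, giving flat columns
flat = df.groupby('source')['sent'].agg(['count', 'mean'])

# Option 2: drop the outer 'sent' level from the nested result
flat2 = nested.droplevel(0, axis=1)

# Rename and move 'source' back into a column to match the desired output
result = flat.rename(columns={'mean': 'mean_sent'}).reset_index()
print(result)
```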
Answer by galitbw
I think this should provide the output that you wanted:
import pandas as pd

# One row per source with its post count (the column name defaults to 0)
result = pd.DataFrame(df.groupby('source').size())
# Add the mean sentiment; rows align on the shared 'source' index
result['mean_score'] = df.groupby('source').sent.mean()
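This approach works because both `groupby` results are indexed by `source`, so assigning the mean Series lines up row by row. A self-contained sketch, rebuilding the sample data; here `Series.to_frame('count')` is swapped in for `pd.DataFrame(...)` so the count column gets a proper header instead of the default `0`, and `mean_sent` is used to match the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

grouped = df.groupby('source')

# Post count per source, as a one-column frame named 'count'
result = grouped.size().to_frame('count')

# Assignment aligns on the shared 'source' index
result['mean_sent'] = grouped['sent'].mean()

print(result.reset_index())
```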