Pandas Groupby: combining count and mean

Question

Asked by Lewis Anderson

I'm working with pandas to summarise a dataframe as a count of certain categories, together with the mean sentiment score for each of those categories.

There is a table full of strings which have different sentiment scores, and I want to group the rows by text source, reporting how many posts each source has as well as the average sentiment of those posts.

My (simplified) dataframe looks like this:

source    text              sent
--------------------------------
bar       some string       0.13
foo       alt string        -0.8
bar       another str       0.7
foo       some text         -0.2
foo       more text         -0.5

The output from this should be something like this:

source    count     mean_sent
-----------------------------
foo       3         -0.5
bar       2         0.415

The answer is somewhere along the lines of:

df['sent'].groupby(df['source']).mean()

Yet this only gives each source and its mean, with no column headers.

Thanks in advance!

Answer 1

Answered by jezrael

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
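
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt from the table in the question; the variable name `summary` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# 'size' counts the rows in each group, 'mean' averages the sentiment scores
summary = (
    df.groupby('source')
      .agg({'text': 'size', 'sent': 'mean'})
      .rename(columns={'text': 'count', 'sent': 'mean_sent'})
      .reset_index()  # turn the 'source' index back into a regular column
)
print(summary)
```

Writing to a new name rather than overwriting `df` keeps the original rows available for further analysis.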

Answer 2

Answered by neves

In newer versions of pandas (0.25 and later) you don't need the rename anymore; just use named aggregation:

df = df.groupby('source') \
       .agg(count=('text', 'size'), mean_sent=('sent', 'mean')) \
       .reset_index()

print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
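
The output above is ordered alphabetically by source, whereas the question lists the source with the most posts first. A possible sketch, assuming that ordering matters, adds a sort on the count column. The sample frame is rebuilt from the question and the name `summary` is illustrative.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

summary = (
    df.groupby('source')
      # each keyword becomes an output column: name=(input column, function)
      .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
      .reset_index()
      # put the source with the most posts first, then renumber the rows
      .sort_values('count', ascending=False)
      .reset_index(drop=True)
)
print(summary)
```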

Answer 3

Answered by Ojha

The one below should work fine:

df[['source','sent']].groupby('source').agg(['count','mean'])
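
Note that passing a list of functions to agg on a DataFrame groupby yields two-level column labels such as ('sent', 'count'). Also, 'count' skips missing values while 'size' does not, so the two can differ when sent contains NaN. The sketch below shows one way to get flat column names by selecting the column before aggregating; the names `nested` and `flat` are illustrative and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
    'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
    'sent': [0.13, -0.8, 0.7, -0.2, -0.5],
})

# As written in the answer: columns come back as a two-level MultiIndex
nested = df[['source', 'sent']].groupby('source').agg(['count', 'mean'])
print(nested.columns.tolist())

# Selecting 'sent' first gives plain 'count' and 'mean' columns instead
flat = (
    df.groupby('source')['sent']
      .agg(['count', 'mean'])
      .rename(columns={'mean': 'mean_sent'})
      .reset_index()
)
print(flat)
```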

Answer 4

Answered by galitbw

I think this should provide the output that you wanted:

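# Count the rows per source and keep the result as a one-column DataFrame
result = df.groupby('source').size().to_frame('count')

# Attach the per-source mean sentiment; values align on the 'source' index
result['mean_score'] = df.groupby('source')['sent'].mean()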
