Pandas: Group by date and the median of another variable

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/34680713/

Time: 2020-09-14 00:28:43  Source: igfitidea

python pandas

Question by RDJ

This is a demo example of my DataFrame. The full DataFrame has multiple additional variables and covers 6 months of data.

sentiment     date
1             2015-05-26 18:58:44
0.9           2015-05-26 19:57:31
0.7           2015-05-26 18:58:24
0.4           2015-05-27 19:17:34
0.6           2015-05-27 18:46:12
0.5           2015-05-27 13:32:24
1             2015-05-28 19:27:31
0.7           2015-05-28 18:58:44
0.2           2015-05-28 19:47:34

I want to group the DataFrame by just the day of the date column, while at the same time aggregating the median of the sentiment column.

Everything I have tried with groupby, the dt accessor and timegrouper has failed.

I want to return a pandas DataFrame not a GroupBy object.

The date column is M8[ns] (i.e. datetime64[ns]).

The sentiment column is float64.

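To try the answers below, the demo data can be rebuilt as a DataFrame. A minimal sketch, assuming the column names sentiment and date from the demo, with the timestamps parsed by pd.to_datetime so that date gets a datetime64 dtype as described:

```python
import pandas as pd

# Demo data from the question: sentiment scores with full timestamps
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

# sentiment is float64 (ints and floats mixed); date is a datetime64 column
print(df.dtypes)
```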

Answer by chrisaycock

You fortunately have the tools you need listed in your question.

In [61]: df.groupby(df.date.dt.date)[['sentiment']].median()
Out[61]:
            sentiment
2015-05-26        0.9
2015-05-27        0.5
2015-05-28        0.7
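The result above is already a DataFrame, but the day lives in the index. If you want date as an ordinary column, reset_index() moves it out. Since the question mentions timegrouper: pd.TimeGrouper has been deprecated in favour of pd.Grouper, so a daily grouper would now be spelled pd.Grouper(key='date', freq='D'). A sketch of both, rebuilding the demo data; the variable names by_date and by_day are illustrative:

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

# Option 1: group on the calendar date (Python datetime.date keys), then
# reset_index() so the result is a flat DataFrame with a 'date' column
by_date = df.groupby(df['date'].dt.date)['sentiment'].median().reset_index()

# Option 2: pd.Grouper bins the datetime64 'date' column into daily periods;
# the keys stay as midnight Timestamps instead of datetime.date objects
by_day = (
    df.groupby(pd.Grouper(key='date', freq='D'))['sentiment']
      .median()
      .reset_index()
)

print(by_date)
print(by_day)
```

Both give medians of 0.9, 0.5 and 0.7 for the three days; which one to prefer depends mainly on whether you want date objects or Timestamps downstream.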

Answer by Joseph Yourine

I would do this:

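# Replace each timestamp with its calendar date (drops the time of day)
df['date'] = df['date'].dt.date
# Median sentiment per day; reset_index() turns 'date' back into a regular column
df = df.groupby('date').agg({'sentiment': 'median'}).reset_index()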

You first replace the datetime column with the date. Then you perform the groupby+agg operation.

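One design detail of replacing the column with the date: dt.date (like x.date()) produces Python datetime.date objects in an object-dtype column. If you would rather keep a datetime64 column, dt.normalize() floors each timestamp to midnight instead. A sketch of that variant, which is a swapped-in alternative rather than the answer's exact code; assign() is used so the original df is left unchanged:

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

# normalize() sets every timestamp to midnight but keeps the datetime64 dtype,
# so grouping on it still groups by day
daily = (
    df.assign(date=df['date'].dt.normalize())
      .groupby('date')
      .agg({'sentiment': 'median'})
      .reset_index()
)

print(daily)
print(daily.dtypes)
```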

Answer by Rahul Mehta

You can compute any number of metrics with a single groupby and the .agg() function:
1) Create a new column holding the extracted date.
2) Use groupby and apply numpy.median, numpy.mean, etc.

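import numpy as np
import pandas as pd

x = [[1, '2015-05-26 18:58:44'],
     [0.9, '2015-05-26 19:57:31']]
t = pd.DataFrame(x, columns=['a', 'b'])
t['b'] = pd.to_datetime(t['b'])
# 1) New column holding only the calendar date of each timestamp
t['datex'] = t['b'].dt.date

# 2) Group by that date; pass e.g. {'a': [np.median, np.mean]} for several metrics
t.groupby(['datex']).agg({
    'a': np.median
})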

Output -

               a
datex
2015-05-26  0.95
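Step 2 generalizes to several metrics at once. A sketch using named aggregation (available since pandas 0.25), which gives each metric its own flat column name; the output names a_median, a_mean and a_count are illustrative choices, not part of the answer:

```python
import pandas as pd

x = [[1, '2015-05-26 18:58:44'],
     [0.9, '2015-05-26 19:57:31']]
t = pd.DataFrame(x, columns=['a', 'b'])
# Extract the calendar date in one step
t['datex'] = pd.to_datetime(t['b']).dt.date

# Named aggregation: each keyword becomes one output column,
# given as (source column, aggregation function)
stats = t.groupby('datex').agg(
    a_median=('a', 'median'),
    a_mean=('a', 'mean'),
    a_count=('a', 'count'),
)
print(stats)
```

With only two rows, the median and the mean coincide (both about 0.95); a_count shows how many rows fell on that day.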

Answer by Roman Orac

I would do this, because you can do multiple aggregations (like median, mean, min, max, etc.) on multiple columns at the same time:

# The list of functions gives two-level columns, here ('sentiment', 'median')
df.groupby(df.date.dt.date).agg({'sentiment': ['median']})
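Extending that to several functions on several columns: a list of functions per column yields two-level (column, function) headers, which are often flattened afterwards. A sketch using the first six demo rows; the retweets column is a hypothetical stand-in for the question's "multiple additional variables", and the underscore naming scheme is just one possible choice:

```python
import pandas as pd

df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5],
    'retweets': [3, 0, 5, 2, 8, 1],  # hypothetical extra column, illustration only
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
    ]),
})

# A list of functions per column produces (column, function) MultiIndex headers
summary = df.groupby(df['date'].dt.date).agg({
    'sentiment': ['median', 'mean', 'min', 'max'],
    'retweets': ['median', 'sum'],
})

# Flatten ('sentiment', 'median') -> 'sentiment_median', then move the day out
# of the index so the result is a plain DataFrame
summary.columns = ['_'.join(col) for col in summary.columns]
summary = summary.reset_index()
print(summary)
```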