Pandas: Group by date and the median of another variable

Question

Asked by RDJ

This is a demo example of my DataFrame. The full DataFrame has multiple additional variables and covers 6 months of data.

sentiment     date
1             2015-05-26 18:58:44
0.9           2015-05-26 19:57:31
0.7           2015-05-26 18:58:24
0.4           2015-05-27 19:17:34
0.6           2015-05-27 18:46:12
0.5           2015-05-27 13:32:24
1             2015-05-28 19:27:31
0.7           2015-05-28 18:58:44
0.2           2015-05-28 19:47:34

I want to group the DataFrame by just the day of the date column, but at the same time aggregate the median of the sentiment column.

Everything I have tried with groupby, the dt accessor and TimeGrouper has failed.
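For reference, pd.TimeGrouper is no longer available in recent pandas releases; its role is taken by pd.Grouper with a freq argument. A minimal sketch, assuming the demo data above and a recent pandas version (the name daily is illustrative):

```python
import pandas as pd

# Demo data from the question; 'date' is parsed to datetime64[ns]
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

# pd.Grouper bins the 'date' column into calendar days (freq='D');
# selecting [['sentiment']] keeps the result a DataFrame rather than a Series
daily = df.groupby(pd.Grouper(key='date', freq='D'))[['sentiment']].median()
print(daily)
```

Here the index holds midnight Timestamps rather than datetime.date objects, and, as with resample, a day inside the range that has no rows would appear as NaN instead of being dropped.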

I want to return a pandas DataFrame not a GroupBy object.

The date column has dtype M8[ns] (i.e. datetime64[ns]).

The sentiment column has dtype float64.

Answer 1

Answered by chrisaycock

You fortunately have the tools you need listed in your question.

In [61]: df.groupby(df.date.dt.date)[['sentiment']].median()
Out[61]:
            sentiment
2015-05-26        0.9
2015-05-27        0.5
2015-05-28        0.7
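To try this answer end to end, here is a self-contained sketch that rebuilds the demo data from the question (values copied from the table above; the name result is illustrative):

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

# .dt.date maps each timestamp to a datetime.date, which becomes the group key;
# the double brackets [['sentiment']] make median() return a DataFrame
result = df.groupby(df['date'].dt.date)[['sentiment']].median()
print(result)
```

Selecting with single brackets, ['sentiment'], would return a Series instead; appending .reset_index() turns the date index into an ordinary column.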

Answer 2

Answered by Joseph Yourine

I would do this:

# Replace each timestamp with its calendar date, then take the median per date
df['date'] = df['date'].apply(lambda x: x.date())
df = df.groupby('date').agg({'sentiment': 'median'}).reset_index()

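First, replace each datetime in the date column with its date part. Then perform the groupby + agg operation; reset_index() turns the date index back into a regular column, so the result is a plain DataFrame.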
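A minimal runnable sketch of this approach on the question's demo data. It works on a copy named out (an illustrative name) instead of overwriting df, only so the original frame stays intact:

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

out = df.copy()
# Step 1: datetime -> datetime.date
out['date'] = out['date'].apply(lambda x: x.date())
# Step 2: median per date; reset_index() moves 'date' from the index to a column
out = out.groupby('date').agg({'sentiment': 'median'}).reset_index()
print(out)
```

The result has two ordinary columns, date and sentiment, with a default integer index.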

Answer 3

Answered by Rahul Mehta

You can compute any number of metrics with a single groupby and the .agg() function:
1) Create a new column that extracts the date.
2) Use groupby and apply numpy.median, numpy.mean, etc.

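import numpy as np
import pandas as pd

x = [[1, '2015-05-26 18:58:44'],
     [0.9, '2015-05-26 19:57:31']]
t = pd.DataFrame(x, columns=['a', 'b'])
t['b'] = pd.to_datetime(t['b'])
# New column holding only the calendar date of each timestamp
t['datex'] = t['b'].dt.date

# Median of column 'a' per date; more columns/functions can be added to the dict
t.groupby(['datex']).agg({
    'a': np.median
})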

Output:

               a
datex
2015-05-26  0.95
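As an alternative to a dict of NumPy functions, named aggregation (pandas 0.25 and later) gives each metric its own output column in one .agg() call. A sketch on the question's demo data; the helper column day and the output names median_sentiment, mean_sentiment and n are arbitrary choices for illustration:

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

summary = (
    df.assign(day=df['date'].dt.date)   # step 1: extract the date
      .groupby('day')                   # step 2: one group per date
      .agg(median_sentiment=('sentiment', 'median'),
           mean_sentiment=('sentiment', 'mean'),
           n=('sentiment', 'count'))    # each keyword = (column, function)
)
print(summary)
```

The result has flat, self-describing column names, so no MultiIndex handling is needed afterwards.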

Answer 4

Answered by Roman Orac

I would do this, because you can do multiple aggregations (like median, mean, min, max, etc.) on multiple columns at the same time:

df.groupby(df.date.dt.date).agg({'sentiment': ['median']})
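Because a list of functions is passed, the result has two-level column labels such as ('sentiment', 'median'). A sketch on the demo data that computes several statistics and then flattens the labels; the joined names like sentiment_median are an illustrative convention, not something pandas produces on its own:

```python
import pandas as pd

# Demo data from the question
df = pd.DataFrame({
    'sentiment': [1, 0.9, 0.7, 0.4, 0.6, 0.5, 1, 0.7, 0.2],
    'date': pd.to_datetime([
        '2015-05-26 18:58:44', '2015-05-26 19:57:31', '2015-05-26 18:58:24',
        '2015-05-27 19:17:34', '2015-05-27 18:46:12', '2015-05-27 13:32:24',
        '2015-05-28 19:27:31', '2015-05-28 18:58:44', '2015-05-28 19:47:34',
    ]),
})

stats = df.groupby(df['date'].dt.date).agg(
    {'sentiment': ['median', 'mean', 'min', 'max']}
)
# Columns are a MultiIndex: ('sentiment', 'median'), ('sentiment', 'mean'), ...
# Join each (column, function) pair into a single flat name
stats.columns = ['_'.join(col) for col in stats.columns]
print(stats)
```

More columns can be added to the dict in the same way, each with its own list of functions.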
