pandas: Finding the mean and standard deviation of timedelta objects in a pandas df

Question

Asked by Graham Streich

I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with the two columns shown below. When I run the code (also shown below) I get the following error:

pandas.core.base.DataError: No numeric types to aggregate

My dataframe:

   bank                          diff
   Bank of Japan                 0 days 00:00:57.416000
   Reserve Bank of Australia     0 days 00:00:21.452000
   Reserve Bank of New Zealand  55 days 12:39:32.269000
   U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()

Answer 1

Accepted answer by jezrael

You need to convert the timedelta to some numeric value, e.g. int64 via values. This is the most accurate option, because nanoseconds (ns) are the underlying numeric representation of a timedelta:

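import numpy as np
import pandas as pd

# timedelta64[ns] -> int64 nanoseconds
dropped['new'] = dropped['diff'].values.astype(np.int64)

# aggregate the numeric column, then convert nanoseconds back to timedelta
means = dropped.groupby('bank')[['new']].mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank')[['new']].std()
std['new'] = pd.to_timedelta(std['new'])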
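For anyone who wants to try the nanosecond round-trip end to end, here is a minimal self-contained sketch. The sample frame is made up for illustration (the bank names "A" and "B" and the durations are assumptions, not the asker's data), and it aggregates only the `new` column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the asker's real `dropped` frame is not shown in full.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["10s", "20s", "1 days", "3 days"]),
})

# timedelta64[ns] -> int64 nanoseconds
dropped["new"] = dropped["diff"].to_numpy().astype(np.int64)

# Aggregate the integer column, then interpret the results as nanoseconds again.
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean())
stds = pd.to_timedelta(grouped.std())

print(means)
print(stds)
```

Note that `pd.to_timedelta` treats plain numbers as nanoseconds by default, which is why no `unit` argument is needed here.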

Another solution is to convert the values to seconds with total_seconds, but that is less accurate:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()
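If you go the seconds route and want timedeltas back at the end, something like the following should work. It is only a sketch on made-up sample data (the frame and the `seconds` column name are assumptions); the important detail is passing `unit="s"` when converting back:

```python
import pandas as pd

# Hypothetical sample data standing in for `dropped`.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["10s", "20s", "1 days", "3 days"]),
})

# Float seconds; sub-second parts end up in the fractional part.
dropped["seconds"] = dropped["diff"].dt.total_seconds()

# Mean and standard deviation per bank, still in float seconds.
stats_seconds = dropped.groupby("bank")["seconds"].agg(["mean", "std"])

# unit="s" tells pandas the floats are seconds, not the default nanoseconds.
stats_td = pd.DataFrame({
    "mean": pd.to_timedelta(stats_seconds["mean"], unit="s"),
    "std": pd.to_timedelta(stats_seconds["std"], unit="s"),
})

print(stats_td)
```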

Answer 2

Answered by Wesam

There is no need to convert the timedelta back and forth. NumPy and pandas can do it for you seamlessly, and with a faster run time. Using your dropped DataFrame:

import numpy as np

grouped = dropped.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
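One thing to watch with this approach: NumPy's `std` defaults to `ddof=0` (population standard deviation), while the pandas `.std()` method defaults to `ddof=1` (sample standard deviation), so the two can give different answers. A small sketch on made-up data (the frame below is an assumption) that spells the choice out using the Series methods directly:

```python
import pandas as pd

# Hypothetical sample data standing in for `dropped`.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["10s", "20s", "1 days", "3 days"]),
})

grouped = dropped.groupby("bank")["diff"]

# Series.mean and Series.std operate on timedelta values directly.
mean = grouped.apply(lambda x: x.mean())
sample_std = grouped.apply(lambda x: x.std())            # ddof=1, the pandas default
population_std = grouped.apply(lambda x: x.std(ddof=0))  # ddof=0, the NumPy default

print(mean)
print(sample_std)
print(population_std)
```

With only two values per group, as here, the sample and population figures differ noticeably, so it is worth deciding which one you actually want.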

Answer 3

Answered by Alexander Usikov

The pandas mean() method and other aggregation methods support a numeric_only=False parameter.

dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in the Python DataFrame
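A quick self-contained check of this, using a made-up frame (the bank names and durations are assumptions). Only the mean is shown, since whether `std` accepts timedelta columns may depend on your pandas version:

```python
import pandas as pd

# Hypothetical sample data standing in for `dropped`.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["10s", "20s", "1 days", "3 days"]),
})

# numeric_only=False keeps the timedelta column in the aggregation.
means = dropped.groupby("bank").mean(numeric_only=False)

print(means)
```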

Answer 4

Answered by Cor

I would suggest passing the numeric_only=False argument to mean, as mentioned by Alexander Usikov; this works for pandas version 0.20+.

If you have an older version, the following works:

import pandas as pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})

(
    df
    .astype({'td': int})         # convert timedelta to integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})
)
