This page reproduces a popular Stack Overflow question and its answers. The content is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/44616546/
Finding the mean and standard deviation of a timedelta object in pandas df
Asked by Graham Streich
I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with the two columns shown below. When I run the code (also shown below) I get the following error:
pandas.core.base.DataError: No numeric types to aggregate
My dataframe:
bank                          diff
Bank of Japan                 0 days 00:00:57.416000
Reserve Bank of Australia     0 days 00:00:21.452000
Reserve Bank of New Zealand   55 days 12:39:32.269000
U.S. Federal Reserve          8 days 13:27:11.387000
My code:
means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
Accepted answer by jezrael
You need to convert the timedelta to a numeric value, e.g. int64 via values. This is the most accurate option, because nanoseconds (ns) are the underlying numeric representation of a timedelta:
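import numpy as np
import pandas as pd

# view each timedelta as its int64 nanosecond count
dropped['new'] = dropped['diff'].values.astype(np.int64)
means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])  # nanoseconds back to timedelta
std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])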
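To make the round trip above concrete, here is a minimal, self-contained sketch. The sample rows and the `ns` column name are made up for illustration (two rows per bank, so that the standard deviation is defined), and it aggregates only the integer column instead of the whole frame, which is a small deviation from the answer's code:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["00:00:10", "00:00:30", "1 days", "3 days"]),
})

# Force nanosecond resolution, then view each value as an int64 count of ns.
dropped["ns"] = (
    dropped["diff"].to_numpy().astype("timedelta64[ns]").astype(np.int64)
)

# Aggregate the numeric column only, then convert the results back.
grouped = dropped.groupby("bank")["ns"]
mean_td = pd.to_timedelta(grouped.mean())  # floats are read as nanoseconds
std_td = pd.to_timedelta(grouped.std())    # sample std (ddof=1)

print(mean_td)
print(std_td)
```

With only one row per bank, as in the question's sample, the sample standard deviation is undefined, so the converted result would be NaT for every group.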
Another solution is to convert the values to seconds with total_seconds, but that is less accurate:
dropped['new'] = dropped['diff'].dt.total_seconds()
means = dropped.groupby('bank').mean()
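The seconds-based variant can be turned back into timedeltas as well. A rough sketch, again with made-up sample rows and a hypothetical `seconds` column name; it assumes `pd.to_timedelta` is called with `unit="s"` so the floats are read as seconds, not nanoseconds:

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["00:00:10", "00:00:30", "1 days", "3 days"]),
})

# total_seconds() returns float64 seconds.
dropped["seconds"] = dropped["diff"].dt.total_seconds()

mean_seconds = dropped.groupby("bank")["seconds"].mean()

# Tell pandas the numbers are seconds when converting back.
mean_td = pd.to_timedelta(mean_seconds, unit="s")
print(mean_td)
```

The loss of accuracy comes from the float representation: very long durations cannot keep every nanosecond in a float64, whereas the int64 route does.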
Answer by Wesam
No need to convert the timedelta back and forth. NumPy and pandas can seamlessly do it for you with a faster run time. Using your dropped DataFrame:
import numpy as np
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
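One thing to watch with this approach: NumPy's `np.std` defaults to `ddof=0` (population standard deviation), while pandas' own `.std()` defaults to `ddof=1` (sample standard deviation), so the two may not agree. A small sketch with made-up sample rows that spells out `ddof` explicitly using the Series methods:

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["00:00:10", "00:00:30", "1 days", "3 days"]),
})

grouped = dropped.groupby("bank")["diff"]

# Series.mean() and Series.std() accept timedelta data directly.
mean_td = grouped.apply(lambda s: s.mean())
std_sample = grouped.apply(lambda s: s.std())            # ddof=1 (pandas default)
std_population = grouped.apply(lambda s: s.std(ddof=0))  # ddof=0 (NumPy default)

print(mean_td)
print(std_sample)
print(std_population)
```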
Answer by Alexander Usikov
Pandas mean() and other aggregation methods support a numeric_only=False parameter.
dropped.groupby('bank').mean(numeric_only=False)
Found here: Aggregations for Timedelta values in the Python DataFrame
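For completeness, a self-contained sketch of the `numeric_only=False` call, using made-up sample rows. How `numeric_only` defaults and which aggregations accept timedelta columns has changed between pandas releases, so treat this as something to check against your installed version:

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["00:00:10", "00:00:30", "1 days", "3 days"]),
})

# Ask pandas to keep non-numeric (here: timedelta) columns in the result.
means = dropped.groupby("bank").mean(numeric_only=False)
print(means)
```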
Answer by Cor
I would suggest passing the numeric_only=False argument to mean, as mentioned by Alexander Usikov; this works for pandas version 0.20+.
If you have an older version, the following works:
import pandas as pd
df = pd.DataFrame({
'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
'group': ['a', 'a', 'a', 'b', 'b']
})
(
df
.astype({'td': int}) # convert timedelta to integer (nanoseconds)
.groupby('group')
.mean()
.astype({'td': 'timedelta64[ns]'})
)

