Original URL: http://stackoverflow.com/questions/44616546/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverFlow
Finding the mean and standard deviation of a timedelta object in pandas df
Question by Graham Streich
I would like to calculate the mean and standard deviation of a timedelta, by bank, from a dataframe with the two columns shown below. When I run the code (also shown below), I get the following error:
pandas.core.base.DataError: No numeric types to aggregate
My dataframe:
bank diff
Bank of Japan 0 days 00:00:57.416000
Reserve Bank of Australia 0 days 00:00:21.452000
Reserve Bank of New Zealand 55 days 12:39:32.269000
U.S. Federal Reserve 8 days 13:27:11.387000
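To try the answers below, the frame can be rebuilt from the printed values. This is a minimal sketch that assumes the diff column is a timedelta64 column and simply parses the strings exactly as they are printed above; the name dropped matches the question's code.

```python
import pandas as pd

# Rebuild the example frame from the printed output above.
dropped = pd.DataFrame({
    'bank': [
        'Bank of Japan',
        'Reserve Bank of Australia',
        'Reserve Bank of New Zealand',
        'U.S. Federal Reserve',
    ],
    # pd.to_timedelta parses the "N days HH:MM:SS.ffffff" strings
    'diff': pd.to_timedelta([
        '0 days 00:00:57.416000',
        '0 days 00:00:21.452000',
        '55 days 12:39:32.269000',
        '8 days 13:27:11.387000',
    ]),
})
print(dropped.dtypes)
```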
My code:
means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
Accepted answer by jezrael
You need to convert the timedelta to some numeric value, e.g. int64 via values. This is the most accurate option, because converting to ns (nanoseconds) gives the exact numeric representation of a timedelta:
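import numpy as np
import pandas as pd
# timedelta64[ns] -> int64 nanoseconds (exact, no rounding)
dropped['new'] = dropped['diff'].values.astype(np.int64)
# aggregate only the numeric column, then convert the result back to timedelta
means = dropped.groupby('bank')[['new']].mean()
means['new'] = pd.to_timedelta(means['new'])
std = dropped.groupby('bank')[['new']].std()
std['new'] = pd.to_timedelta(std['new'])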
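The round trip through int64 can be checked on a small frame. This sketch uses made-up data (the bank labels A and B are hypothetical), converts to int64 nanoseconds as the answer does, aggregates the numeric column only, and turns the float results back into timedeltas with pd.to_timedelta, which treats plain numbers as nanoseconds by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two rows per bank
df = pd.DataFrame({
    'bank': ['A', 'A', 'B', 'B'],
    'diff': pd.to_timedelta(['1 days', '3 days', '00:10:00', '00:20:00']),
})

# timedelta64[ns] -> int64 nanoseconds (exact)
df['new'] = df['diff'].values.astype(np.int64)

g = df.groupby('bank')['new']
means = pd.to_timedelta(g.mean())  # float nanoseconds -> Timedelta
stds = pd.to_timedelta(g.std())    # sample std (ddof=1), float ns -> Timedelta
print(means)
print(stds)
```

Bank A's mean comes out as 2 days and bank B's as 15 minutes; the standard deviation for A is roughly 1.414 days (sqrt(2) days, since pandas' std uses ddof=1).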
Another solution is to convert the values to seconds with total_seconds, but that is less accurate, because the seconds are stored as float64 and can lose nanosecond precision:
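import pandas as pd
dropped['new'] = dropped['diff'].dt.total_seconds()  # float64 seconds
means = dropped.groupby('bank')['new'].mean()
means = pd.to_timedelta(means, unit='s')  # back to timedelta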
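The precision difference shows up on a long duration. This sketch uses a hypothetical value of 200 days plus 1 nanosecond: converting through int64 nanoseconds keeps the extra nanosecond, while total_seconds() returns a float64 number of seconds that is too coarse at this magnitude to hold it, so converting back with pd.to_timedelta(..., unit='s') drops it.

```python
import pandas as pd

# Hypothetical duration: 200 days plus one nanosecond
s = pd.Series([pd.Timedelta(days=200, nanoseconds=1)])

# Exact: integer nanoseconds round-trip without loss
via_ns = pd.to_timedelta(s.astype('int64'))

# Lossy: float64 seconds cannot resolve 1 ns at this magnitude
via_seconds = pd.to_timedelta(s.dt.total_seconds(), unit='s')

print(via_ns[0] == s[0])       # True
print(via_seconds[0] == s[0])  # False
```

For durations of a few seconds or minutes the difference is usually invisible, which is why total_seconds can still be a convenient choice.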
Answer by Wesam
There is no need to convert the timedelta back and forth. NumPy and pandas can do it for you seamlessly, with a faster run time. Using your dropped DataFrame:
import numpy as np
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
# note: np.std uses ddof=0 (population std), unlike pandas' own std (ddof=1)
std = grouped.apply(lambda x: np.std(x))
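One detail worth checking with this approach: np.std called on a pandas Series delegates to Series.std but passes ddof=0 (population standard deviation), whereas pandas' std defaults to ddof=1 (sample standard deviation). This sketch compares the two on hypothetical two-row groups; the bank labels and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows per bank
df = pd.DataFrame({
    'bank': ['A', 'A', 'B', 'B'],
    'diff': pd.to_timedelta(['1 days', '3 days', '1 days', '1 days']),
})
grouped = df.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))      # A: 2 days, B: 1 day
pop_std = grouped.apply(lambda x: np.std(x))    # ddof=0 -> A: 1 day
sample_std = grouped.apply(lambda x: x.std())   # ddof=1 -> A: about 1.414 days
print(mean, pop_std, sample_std, sep='\n')
```

If the result should match dropped.groupby('bank').std() from the accepted answer, pass ddof=1 explicitly, e.g. np.std(x, ddof=1).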
Answer by Alexander Usikov
Pandas mean() and other aggregation methods support the numeric_only=False parameter.
dropped.groupby('bank').mean(numeric_only=False)
Found here: Aggregations for Timedelta values in the Python DataFrame
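A minimal sketch of this approach on hypothetical data (group labels a and b are made up), assuming a pandas version whose grouped mean handles timedelta64 columns; in pandas 2.x numeric_only already defaults to False, so passing it mainly documents the intent. The result stays timedelta64, so no conversion back is needed.

```python
import pandas as pd

# Hypothetical data: three rows in group 'a', two in group 'b'
df = pd.DataFrame({
    'bank': ['a', 'a', 'a', 'b', 'b'],
    'diff': [pd.Timedelta(days=i) for i in range(5)],
})

# Keep the timedelta column in the aggregation instead of dropping it
means = df.groupby('bank').mean(numeric_only=False)
print(means)
```

Group a averages 0, 1 and 2 days to 1 day, and group b averages 3 and 4 days to 3 days 12 hours.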
Answer by Cor
I would suggest passing the numeric_only=False argument to mean, as mentioned by Alexander Usikov; this works for pandas version 0.20+.
If you have an older version, the following works:
import pandas as pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})
(
    df
    .astype({'td': 'int64'})  # convert timedelta to integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})  # convert the mean back to timedelta
)