pandas: Finding the mean and standard deviation of a timedelta object in a pandas df

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/44616546/

Time: 2020-09-14 03:49:30  Source: igfitidea

Finding the mean and standard deviation of a timedelta object in pandas df

Tags: python, pandas, datetime, mean, timedelta

Asked by Graham Streich

I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with the two columns shown below. When I run the code (also shown below) I get the following error:

pandas.core.base.DataError: No numeric types to aggregate

My dataframe:

   bank                          diff
   Bank of Japan                 0 days 00:00:57.416000
   Reserve Bank of Australia     0 days 00:00:21.452000
   Reserve Bank of New Zealand  55 days 12:39:32.269000
   U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()

Accepted answer by jezrael

You need to convert the timedelta to a numeric value, e.g. int64 via values. This is the most accurate option, because nanoseconds (ns) are the underlying numeric representation of a timedelta:

import numpy as np
import pandas as pd

# timedelta64[ns] values are stored as int64 nanoseconds
dropped['new'] = dropped['diff'].values.astype(np.int64)

means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])
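
The round trip through nanoseconds can be tried on a small frame. The following is a minimal sketch, not the answerer's exact code: the bank labels "A" and "B", the sample durations, and the column name `ns` are made up for illustration, and the `diff` column is forced to nanosecond resolution so that the integers really are nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    # Force nanosecond resolution so the integers below are nanoseconds.
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]).astype("timedelta64[ns]"),
})

# View each timedelta as its int64 nanosecond count.
dropped["ns"] = dropped["diff"].values.astype(np.int64)

# Aggregate the plain integers, then convert the results back to timedeltas.
grouped = dropped.groupby("bank")["ns"]
means = pd.to_timedelta(grouped.mean(), unit="ns")
stds = pd.to_timedelta(grouped.std(), unit="ns")

print(means)
print(stds)
```

Selecting only the `ns` column before aggregating keeps the original `diff` column out of the computation, so the result does not depend on how a given pandas version treats non-numeric columns.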

Another solution is to convert the values to seconds with total_seconds, but that is less accurate:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()
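
For completeness, here is a sketch of the seconds-based variant that also computes the standard deviation and converts both results back to timedeltas. The sample data and the names `seconds`, `stats`, `mean_td` and `std_td` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

# Represent each duration as a float number of seconds.
dropped["seconds"] = dropped["diff"].dt.total_seconds()

# Mean and sample standard deviation per bank, in seconds.
stats = dropped.groupby("bank")["seconds"].agg(["mean", "std"])

# Optionally turn the float seconds back into timedeltas.
mean_td = pd.to_timedelta(stats["mean"], unit="s")
std_td = pd.to_timedelta(stats["std"], unit="s")

print(stats)
```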

Answer by Wesam

There is no need to convert the timedelta back and forth. NumPy and pandas can do it for you seamlessly, with a faster run time. Using your dropped DataFrame:

import numpy as np

grouped = dropped.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
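
One detail worth checking when comparing this with the other answers: np.std defaults to the population formula (ddof=0), while pandas' own std defaults to the sample formula (ddof=1), so the two may give different numbers. The sketch below uses the pandas Series methods directly with the choice made explicit; the sample data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

per_bank = dropped.groupby("bank")["diff"]

# Series.mean and Series.std operate on timedelta values directly.
mean = per_bank.apply(lambda s: s.mean())
std_sample = per_bank.apply(lambda s: s.std())            # ddof=1 (pandas default)
std_population = per_bank.apply(lambda s: s.std(ddof=0))  # ddof=0 (NumPy default)

print(mean)
print(std_sample)
print(std_population)
```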

Answer by Alexander Usikov

Pandas mean() and other aggregation methods support the numeric_only=False parameter.

dropped.groupby('bank').mean(numeric_only=False)
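
A self-contained way to try this, assuming a pandas version where the groupby mean() accepts numeric_only and a frame that holds only the grouping column and the timedelta column (the sample data is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: two observations per bank.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

# Keep non-numeric columns such as timedeltas in the aggregation.
result = dropped.groupby("bank").mean(numeric_only=False)

print(result)
```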

Found here: Aggregations for Timedelta values in the Python DataFrame

Answer by Cor

I would suggest passing the numeric_only=False argument to mean, as mentioned by Alexander Usikov; this works for pandas version 0.20+.

If you have an older version, the following works:

import pandas as pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})

(
    df
    .astype({'td': int})         # convert timedelta to integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})
)