
Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27907902/

Time: 2020-09-13 22:50:11  Source: igfitidea

Datetime objects with pandas mean function

Tags: python, datetime, pandas, mean

Asked by Avelina

I am new to programming, so I apologize in advance if this question does not make any sense. I noticed that when I try to calculate the mean of a pandas data frame containing datetime objects such as datetime.datetime(2014, 7, 10), pandas cannot calculate the mean of that column. However, it seems to be able to calculate the minimum and maximum of that same data frame without a problem.

import datetime
import pandas as pd
d={'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two' :pd.Series([datetime.datetime(2014, 7, 9) , datetime.datetime(2014, 7, 10) , datetime.datetime(2014, 7, 11) ], index=['a', 'b', 'c'])}
df=pd.DataFrame(d)

df
Out[18]: 
      one        two    
   a    1 2014-07-09
   b    2 2014-07-10
   c    3 2014-07-11

df.min()
Out[19]: 
   one             1
   two    2014-07-09
dtype: object

df.mean()
Out[20]: 
   one    2
dtype: float64

I did notice that the min and max functions converted all the columns to objects, whereas the mean function only outputs floats. Could anyone explain to me why the mean function can only handle floats? Is there another way to get the mean values of a data frame that contains datetime objects? I can work around it by using epoch time (as an integer), but it would be very convenient if there were a direct way. I use Python 2.7.

I am grateful for any hints.
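
The epoch-time workaround mentioned above can be sketched roughly as follows. This is only an illustration, not code from the original post: the names `dates`, `epoch`, `seconds`, and `mean_date` are made up here, and the sketch assumes timezone-naive datetimes.

```python
import datetime

import pandas as pd

# Same three dates as column 'two' in the question.
dates = pd.Series([datetime.datetime(2014, 7, 9),
                   datetime.datetime(2014, 7, 10),
                   datetime.datetime(2014, 7, 11)])

# Express each date as float seconds since the Unix epoch.
epoch = pd.Timestamp("1970-01-01")
seconds = (dates - epoch) / pd.Timedelta(seconds=1)

# Average the plain numbers, then convert the result back to a timestamp.
mean_date = pd.to_datetime(seconds.mean(), unit="s")
print(mean_date)
```

Dividing by a one-second Timedelta keeps the unit explicit, so the result does not depend on the internal resolution pandas picks for the column.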

Accepted answer by Alex

You can use datetime.timedelta:

import datetime
import functools
import operator

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([datetime.datetime(2014, 7, 9),
                       datetime.datetime(2014, 7, 10),
                       datetime.datetime(2014, 7, 11)],
                      index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

def avg_datetime(series):
    # Average the offsets from the earliest value, then shift back by that value.
    dt_min = series.min()
    deltas = [x - dt_min for x in series]
    return dt_min + functools.reduce(operator.add, deltas) / len(deltas)

print(avg_datetime(df['two']))

Answer by spring

To simplify Alex's answer (I would have added this as a comment but I don't have sufficient reputation):

import datetime
import pandas as pd

d={'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two': pd.Series([datetime.datetime(2014, 7, 9), 
           datetime.datetime(2014, 7, 10), 
           datetime.datetime(2014, 7, 11) ], 
           index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

Which looks like:

   one        two
a    1 2014-07-09
b    2 2014-07-10
c    3 2014-07-11

Then calculate the mean of column "two" by:

(df.two - df.two.min()).mean() + df.two.min()

So, subtract the min of the timeseries, calculate the mean (or median) of the resulting timedeltas, and add back the min.
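
The subtract, reduce, add-back pattern described here could be wrapped in a small helper. The sketch below is only illustrative: `center_of_datetimes` and its `how` parameter are hypothetical names, not part of pandas, and the third date is changed to 2014-07-14 (made-up data) so that the mean and the median differ.

```python
import datetime

import pandas as pd

def center_of_datetimes(series, how="mean"):
    """Hypothetical helper: subtract the min, reduce the timedeltas, add the min back."""
    start = series.min()
    offsets = series - start  # Series of timedeltas relative to the earliest value
    if how == "median":
        return start + offsets.median()
    return start + offsets.mean()

# Illustrative data; the last date is deliberately later than in the question.
two = pd.Series([datetime.datetime(2014, 7, 9),
                 datetime.datetime(2014, 7, 10),
                 datetime.datetime(2014, 7, 14)],
                index=['a', 'b', 'c'])

mean_value = center_of_datetimes(two)
median_value = center_of_datetimes(two, how="median")
print(mean_value, median_value)
```

Working on timedeltas rather than on the datetimes themselves is what makes this usable on pandas versions where the datetime mean is not available directly.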

Answer by Blane

This issue is sort of resolved as of pandas 0.25. However, mean can currently only be applied to a datetime Series on its own, not to a datetime Series inside a DataFrame.

In [1]: import pandas as pd

In [2]: s = pd.Series([pd.Timestamp(2014, 7, 9), 
   ...:            pd.Timestamp(2014, 7, 10), 
   ...:            pd.Timestamp(2014, 7, 11)])

In [3]: s.mean()
Out[3]: Timestamp('2014-07-10 00:00:00')

Applying .mean() to a DataFrame containing a datetime series returns the same result as shown in the original question.

In [4]: df = pd.DataFrame({'numeric':[1,2,3],
   ...:               'datetime':s})

In [5]: df.mean()
Out[5]: 
numeric    2.0
dtype: float64
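
Since Series.mean() handles datetimes while DataFrame.mean() drops the column in the session above, one possible workaround is to take the mean column by column. This is a minimal sketch, assuming pandas 0.25 or later; `column_means` is a name chosen here for illustration.

```python
import pandas as pd

s = pd.Series([pd.Timestamp(2014, 7, 9),
               pd.Timestamp(2014, 7, 10),
               pd.Timestamp(2014, 7, 11)])
df = pd.DataFrame({'numeric': [1, 2, 3],
                   'datetime': s})

# Call Series.mean() on each column separately and collect the results.
# The resulting Series has object dtype because it mixes a float and a Timestamp.
column_means = pd.Series({name: df[name].mean() for name in df.columns})
print(column_means)
```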