Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information. StackOverflow original: http://stackoverflow.com/questions/37078370/

Date: 2020-09-14 01:11:40  Source: igfitidea

pandas: function equivalent to SQL's datediff()?

python pandas datediff

Asked by Pythonista anonymous

Is there an equivalent to SQL's datediff function in Python's pandas? The answer to this question: "Add column with number of days between dates in DataFrame pandas" explains how to calculate the difference in days. For example:

>>> (pd.to_datetime('15-10-2010') - pd.to_datetime('15-07-2010')) / pd.offsets.Day(1)
92.0

However, I have two questions:

  1. Is there a way to calculate the difference in months? I can approximate it by dividing the result above by 30 or by 31, but I was wondering if there is some built-in function that does this automatically.
  2. What is the syntax of pd.offsets? I tried dividing by pd.offsets.Month(1) and it doesn't work. I looked up the documentation here (which is atrocious, like all of Python's documentation!): http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects but Day() is not there, so I'm confused.

Accepted answer by MaxU

UPDATE:

def months_between(d1, d2):
    dd1 = min(d1, d2)
    dd2 = max(d1, d2)
    return (dd2.year - dd1.year)*12 + dd2.month - dd1.month

In [125]: months_between(pd.to_datetime('2015-01-02 12:13:14'), pd.to_datetime('2012-03-02 12:13:14'))
Out[125]: 34
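
To get closer to SQL's datediff() on whole columns, the same year/month arithmetic can be applied element-wise through the .dt accessor. This is a minimal sketch, not part of the original answer: the name months_between_columns and the sample DataFrame are made up for illustration, and unlike months_between above it keeps the sign (negative when end is before start). Like the function above, it counts month boundaries and ignores the day of the month.

```python
import pandas as pd

def months_between_columns(start, end):
    """Signed calendar-month difference between two datetime Series."""
    # Same arithmetic as months_between, applied element-wise via .dt
    return (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)

# Hypothetical sample data, reusing the dates from this page
df = pd.DataFrame({
    "start": pd.to_datetime(["2012-03-02", "2010-07-15"]),
    "end": pd.to_datetime(["2015-01-02", "2010-10-15"]),
})
df["months"] = months_between_columns(df["start"], df["end"])
print(df["months"].tolist())  # [34, 3]
```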

OLD answer:

In [40]: (pd.to_datetime('15-10-2010') - pd.to_datetime('15-07-2010')).days
Out[40]: 92

You can also do this for months, as long as both dates fall in the same year:

In [48]: pd.to_datetime('15-10-2010').month - pd.to_datetime('15-07-2010').month
Out[48]: 3
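
The subtraction above gives 3 only because both dates are in 2010. A short sketch with made-up dates, three months apart but on either side of New Year, shows what happens otherwise and how adding the year term fixes it:

```python
import pandas as pd

# Hypothetical dates three months apart, crossing a year boundary
d1 = pd.to_datetime("2010-10-15")
d2 = pd.to_datetime("2011-01-15")

month_only = d2.month - d1.month                              # 1 - 10
with_year = (d2.year - d1.year) * 12 + (d2.month - d1.month)  # 12 + (1 - 10)

print(month_only)  # -9
print(with_year)   # 3
```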

Answered by Thanos

If you look around a little, it seems that it is not possible to get months out of a Timedelta:

In [193]: date_1 = pd.to_datetime('2015-01-02 12:13:14')

In [194]: date_2 = pd.to_datetime('2012-03-02 12:13:14')

In [195]: td_1 = date_1 - date_2

In [196]: td_1
Out[196]: Timedelta('1036 days 00:00:00')

In [199]: td_1.
td_1.asm8            td_1.days            td_1.freq            td_1.microseconds    td_1.resolution      td_1.to_pytimedelta  td_1.value           
td_1.ceil            td_1.delta           td_1.is_populated    td_1.min             td_1.round           td_1.to_timedelta64  td_1.view            
td_1.components      td_1.floor           td_1.max             td_1.nanoseconds     td_1.seconds         td_1.total_seconds

In [199]: td_1.components
Out[199]: Components(days=1036, hours=0, minutes=0, seconds=0, milliseconds=0, microseconds=0, nanoseconds=0)

Additionally, Components does not seem to express the same value in different denominations; each field only holds its own part of the breakdown:

In [213]: td_1.components.days
Out[213]: 1036

In [214]: td_1.components.hours
Out[214]: 0
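
If the goal is the whole duration expressed in another fixed-length unit, rather than one field of the breakdown, dividing by a Timedelta of that unit or calling total_seconds() should do it. A minimal sketch reusing the dates above (this works for hours, minutes and so on, but not for months, which have no fixed length):

```python
import pandas as pd

td_1 = pd.to_datetime("2015-01-02 12:13:14") - pd.to_datetime("2012-03-02 12:13:14")

hours = td_1 / pd.Timedelta(hours=1)   # whole duration in hours, as a float
seconds = td_1.total_seconds()         # whole duration in seconds, as a float

print(td_1.components.hours)  # 0, only the hours field of the breakdown
print(hours)                  # 24864.0
print(seconds)                # 89510400.0
```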

Ultimately, what you have been doing until now seems to be the "best" solution:

In [214]: td_1.components.days/30
Out[214]: 34.53333333333333

In [215]: np.round(td_1.components.days/30)
Out[215]: 35.0

In [216]: np.floor(td_1.components.days/30)
Out[216]: 34.0
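
If an approximation is good enough, dividing by the average length of a Gregorian month (365.2425 / 12, about 30.44 days) is a little less arbitrary than 30 or 31. This is only a sketch; the constant name is made up here, and the result is still an estimate rather than a calendar-exact count:

```python
import pandas as pd

# Average Gregorian month: 365.2425 days per year / 12 months
AVG_DAYS_PER_MONTH = 365.2425 / 12

td_1 = pd.to_datetime("2015-01-02 12:13:14") - pd.to_datetime("2012-03-02 12:13:14")

# Dividing by a Timedelta gives the duration as a float number of days
approx_months = (td_1 / pd.Timedelta(days=1)) / AVG_DAYS_PER_MONTH
print(round(approx_months, 2))  # 34.04
```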

Not great news really, but a solution in any case.

As for comparing the documentation that Matlab comes with to that of pandas, you are right. However, if you were to compare the price tags of the two as well, maybe some questions would be answered... (?)
