Original URL: http://stackoverflow.com/questions/37078370/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
pandas: function equivalent to SQL's datediff()?
Asked by Pythonista anonymous
Is there an equivalent to SQL's datediff function in Python's pandas? The answer to the question "Add column with number of days between dates in DataFrame pandas" explains how to calculate the difference in days. For example:
>>> (pd.to_datetime('15-10-2010') - pd.to_datetime('15-07-2010')) / pd.offsets.Day(1)
92.0
However, I have two questions:
- Is there a way to calculate the difference in months? I can approximate it by dividing the result above by 30 or 31, but I was wondering if there is a built-in function that does this automatically.
- What is the syntax of pd.offsets? I tried dividing by pd.offsets.Month(1) and it doesn't work. I looked up the documentation here (which is atrocious, like all of Python's documentation!): http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects but Day() is not there, so I'm confused.
Accepted answer by MaxU
UPDATE:
def months_between(d1, d2):
    dd1 = min(d1, d2)
    dd2 = max(d1, d2)
    return (dd2.year - dd1.year)*12 + dd2.month - dd1.month
In [125]: months_between(pd.to_datetime('2015-01-02 12:13:14'), pd.to_datetime('2012-03-02 12:13:14'))
Out[125]: 34
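The same arithmetic can be applied to whole DataFrame columns through the .dt accessor. The following is only a sketch: the DataFrame and the column names "start", "end" and "months" are made up for illustration, and unlike months_between the result is signed rather than ordered by min/max.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "start": pd.to_datetime(["2012-03-02", "2010-07-15", "2014-12-31"]),
    "end": pd.to_datetime(["2015-01-02", "2010-10-15", "2015-01-01"]),
})

# Count the calendar-month boundaries crossed between the two columns.
# The result is negative when "end" is earlier than "start".
df["months"] = (
    (df["end"].dt.year - df["start"].dt.year) * 12
    + (df["end"].dt.month - df["start"].dt.month)
)

print(df["months"].tolist())  # [34, 3, 1]
```

Note that the last row counts one month for a single elapsed day (2014-12-31 to 2015-01-01), because only the year and month fields are compared.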
OLD answer:
In [40]: (pd.to_datetime('15-10-2010') - pd.to_datetime('15-07-2010')).days
Out[40]: 92
You can also do this for months, as long as both dates fall in the same year:
In [48]: pd.to_datetime('15-10-2010').month - pd.to_datetime('15-07-2010').month
Out[48]: 3
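Comparing only the month fields also ignores the day of the month. If what you want is the number of completed months, one possible approach is to subtract one when the end date has not yet reached the start date's day and time. This is a sketch using only the standard library; full_months_between is a hypothetical helper name, and it should also accept pd.Timestamp values, since Timestamp subclasses datetime.

```python
from datetime import datetime

def full_months_between(start, end):
    """Count whole months elapsed between two datetimes (hypothetical helper)."""
    if end < start:
        start, end = end, start
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If `end` has not yet reached the day and time of month of `start`,
    # the last month is still incomplete.
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months

print(full_months_between(datetime(2010, 7, 15), datetime(2010, 10, 15)))  # 3
print(full_months_between(datetime(2010, 7, 15), datetime(2010, 10, 14)))  # 2
```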
Answer by Thanos
If you look around a little, it seems that it is not possible to get months out of a Timedelta:
In [193]: date_1 = pd.to_datetime('2015-01-02 12:13:14')
In [194]: date_2 = pd.to_datetime('2012-03-02 12:13:14')
In [195]: date_1 - date_2
Out[195]: Timedelta('1036 days 00:00:00')
In [196]: td_1 = date_1 - date_2
In [199]: td_1.
td_1.asm8 td_1.days td_1.freq td_1.microseconds td_1.resolution td_1.to_pytimedelta td_1.value
td_1.ceil td_1.delta td_1.is_populated td_1.min td_1.round td_1.to_timedelta64 td_1.view
td_1.components td_1.floor td_1.max td_1.nanoseconds td_1.seconds td_1.total_seconds
In [199]: td_1.components
Out[199]: Components(days=1036, hours=0, minutes=0, seconds=0, milliseconds=0, microseconds=0, nanoseconds=0)
Additionally, Components does not seem to offer different denominations of the same value, but rather the separate parts of a single breakdown:
In [213]: td_1.components.days
Out[213]: 1036
In [214]: td_1.components.hours
Out[214]: 0
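To make the distinction concrete, here is a small sketch with a Timedelta whose hour part is not zero (the value is made up for illustration). The components fields split one duration into parts, while dividing by another Timedelta converts the whole duration into a single unit.

```python
import pandas as pd

# Made-up duration: 1 day and 2 hours.
td = pd.Timedelta(days=1, hours=2)

# components splits the duration into parts; hours is only the leftover part.
print(td.components.days, td.components.hours)  # 1 2

# Dividing by a one-hour Timedelta gives the whole duration in hours.
total_hours = td / pd.Timedelta(hours=1)
print(total_hours)  # 26.0
```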
Ultimately, what you have been doing until now seems to be the "best" solution:
In [214]: td_1.components.days/30
Out[214]: 34.53333333333333
In [215]: np.round(td_1.components.days/30)
Out[215]: 35.0
In [216]: np.floor(td_1.components.days/30)
Out[216]: 34.0
Not great news, really, but a solution in any case.
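If an approximation is acceptable, dividing by the mean Gregorian month length (365.2425 / 12, about 30.44 days) instead of a flat 30 reduces the drift over long spans. This is only a sketch using the standard library; the constant name AVG_DAYS_PER_MONTH is an assumption, and the dates are the ones used above.

```python
from datetime import datetime

# Mean length of a Gregorian month in days (an approximation, not a pandas feature).
AVG_DAYS_PER_MONTH = 365.2425 / 12

delta = datetime(2015, 1, 2, 12, 13, 14) - datetime(2012, 3, 2, 12, 13, 14)

# Convert the whole duration to days, then to approximate months.
approx_months = delta.total_seconds() / 86400 / AVG_DAYS_PER_MONTH
print(round(approx_months, 2))  # 34.04
```

Compared with 34.53 from dividing by 30, this lands much closer to the 34 calendar months between the two dates.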
As for comparing the documentation that Matlab comes with to that of pandas, you are right. However, if you were to compare the price tags of the two as well, maybe some questions would be answered... (?)