Arithmetic operations on datetime index in pandas
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/25929818/
Asked by Fred S
In pandas, you can access specific positions of a time series either by classical integer position / row based indexing, or by datetime based indexing. The integer based index can be manipulated using basic arithmetic operations, e.g. if I have an integer_index for a time series with a frequency of 12 hours and I want to access the entry exactly one day prior to it, I can simply do integer_index - 2. However, real world data are not always perfect, and sometimes rows are missing. In that case this method fails, and it would be helpful to be able to use datetime based indexing and subtract, for example, one day from this index. How can I do this?
Sample script:
# generate a sample time series
import pandas as pd
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
# 2000-01-01 00:00:00 A
# 2000-01-01 12:00:00 B
# 2000-01-02 00:00:00 C
# 2000-01-02 12:00:00 D
# 2000-01-03 00:00:00 E
# Freq: 12H, dtype: object
# these two indices should access the same value ("C")
integer_index = 2
date_index = "2000-01-02 00:00"
# .iloc makes the positional lookup explicit (recent pandas versions deprecate s[2] here)
print(s.iloc[integer_index]) # prints "C"
print(s[date_index]) # prints "C"
# I can access the value one day earlier by subtracting 2 from the integer index
print(s.iloc[integer_index - 2]) # prints "A"
# how can I subtract one day from the date index?
print(s[date_index - 1]) # raises a TypeError
The background to this question can be found in an earlier submission of mine here:
Fill data gaps with average of data from adjacent days
where user JohnE found a workaround to my problem that uses integer position based indexing. He makes sure that I have equally spaced data by resampling the time series.
Accepted answer by Ffisegydd
Your datetime index isn't based on strings. It's a DatetimeIndex, meaning you can index it with datetime objects rather than with a string that merely looks like a date.
The code below converts date_index into a datetime object and then uses timedelta(days=1) to subtract "one day" from it.
# generate a sample time series
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
# these two indices should access the same value ("C")
integer_index = 2
# Converts the string into a datetime object
date_index = datetime.strptime("2000-01-02 00:00", "%Y-%m-%d %H:%M")
print(date_index) # 2000-01-02 00:00:00
print(s.iloc[integer_index]) # prints "C" (.iloc makes the positional lookup explicit)
print(s[date_index]) # prints "C"
print(s.iloc[integer_index - 2]) # prints "A"
one_day = timedelta(days=1)
print(s[date_index - one_day]) # prints "A"
print(date_index - one_day) # 2000-01-01 00:00:00
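To tie this back to the missing-rows case from the question, here is a minimal sketch (not part of the original answer) that removes one row and compares the two approaches. The name gappy and the choice of which row to drop are assumptions made purely for illustration.

```python
import pandas as pd
from datetime import datetime, timedelta

index = pd.date_range("2000-01-01", periods=5, freq="12h")
s = pd.Series(["A", "B", "C", "D", "E"], index=index)

# Simulate a gap: remove the 2000-01-01 12:00 row ("B").
# "gappy" is a hypothetical name used only in this sketch.
gappy = s[s.index != pd.Timestamp("2000-01-01 12:00")]

date_index = datetime(2000, 1, 2, 0, 0)
one_day = timedelta(days=1)

# Label arithmetic still finds the row exactly one day earlier.
by_label = gappy.loc[date_index - one_day]
print(by_label)  # A

# Positional arithmetic no longer does: "C" has moved to position 1,
# so position - 2 is -1, which silently selects the last row.
position = gappy.index.get_loc(date_index)
by_position = gappy.iloc[position - 2]
print(position, by_position)  # 1 E
```

If the timestamp one day earlier were itself missing, gappy.loc[...] would raise a KeyError; gappy.get(...) returns None instead, which may be easier to handle when gaps are expected.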
Answer by VMQ
The previous answer by Ffisegydd is excellent, except that pandas provides an equivalent type, Timedelta, that is compatible with np.timedelta64 and has a few more bells and whistles. Just replace timedelta(days=1) with pd.Timedelta(days=1) in his example to enjoy more compatibility.
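As a rough sketch of that substitution (my own illustration, not code from either answer), the example below uses pd.Timestamp and pd.Timedelta in place of the standard-library types. The variable names value and shifted are assumptions introduced here.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "B", "C", "D", "E"],
              index=pd.date_range("2000-01-01", periods=5, freq="12h"))

date_index = pd.Timestamp("2000-01-02 00:00")
one_day = pd.Timedelta(days=1)

# Same lookup as in the accepted answer, with pandas types throughout.
value = s.loc[date_index - one_day]
print(value)  # A

# pd.Timedelta compares equal to the matching NumPy duration,
# and it can also be built from a string.
assert one_day == np.timedelta64(1, "D")
assert one_day == pd.Timedelta("1 day")

# The offset can be applied to the whole index at once.
shifted = s.index - one_day
print(shifted[0])  # 1999-12-31 00:00:00
```

Subtracting the Timedelta from the whole index returns a new DatetimeIndex and leaves s unchanged.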

