Python pandas: slicing a DataFrame by date conditions

Question

Asked by Rishabh Sagar

I am able to read and slice a pandas DataFrame using Python datetime objects; however, I am forced to use only dates that exist in the index. For example, this works:

>>> data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2010-12-31 00:00:00 to 2010-04-01 00:00:00
Data columns:
Adj Close    252  non-null values
dtypes: float64(1)

>>> st = datetime.datetime(2010, 12, 31, 0, 0)
>>> en = datetime.datetime(2010, 12, 28, 0, 0)

>>> data[st:en]
            Adj Close
Date                 
2010-12-31     593.97
2010-12-30     598.86
2010-12-29     601.00
2010-12-28     598.92

However, if I use a start or end date that is not present in the DataFrame, I get a Python KeyError.

My question: how do I query the DataFrame for a date range even when the start and end dates are not present in the index? Does pandas allow range-based slicing?

I am using pandas version 0.10.1.

Answer 1

Accepted answer by waitingkuo

Use searchsorted to find the positions of the nearest times first, and then use those positions to slice.

In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])

In [16]: df
Out[16]: 
            0
2013-01-01  1
2013-01-03  2
2013-01-05  3

In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))

In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))

In [24]: df.iloc[start:end]
Out[24]: 
            0
2013-01-03  2

Answer 2

Answered by Dan Allan

Short answer: Sort your data by its index (data.sort_index(); old pandas releases also offered data.sort(), which has since been removed) and then I think everything will work the way you are expecting.

Yes, you can slice using datetimes not present in the DataFrame. For example:

In [12]: df
Out[12]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392
2013-04-23  0.924535
2013-04-24  0.531902
2013-04-25 -0.957936

In [13]: df['20130419':'20130422']
Out[13]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392

As you can see, you don't even have to build datetime objects; strings work.

Because the datetimes in your index are not sequential, the behavior is weird. If we shuffle the index of my example here...

In [17]: df
Out[17]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392
2013-04-23  0.924535
2013-04-21  0.531902
2013-04-25 -0.957936

...and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

In [18]: df['20130419':'20130422']
Out[18]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392

This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

df = df.sort_index()  # sort_index returns a new DataFrame, so keep the result
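To make the effect of sorting concrete, here is a small sketch that rebuilds a shuffled index like the one above, sorts it, and then slices with a start date that is not in the index. The values and the column name "value" are placeholders, not the random numbers from the session.

```python
import pandas as pd

# Shuffled dates, in the same order as the example above
dates = pd.to_datetime(
    ["2013-04-22", "2013-04-20", "2013-04-24",
     "2013-04-23", "2013-04-21", "2013-04-25"]
)
shuffled = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=dates)

# Sorting returns a new frame whose index is in ascending date order
ordered = shuffled.sort_index()

# 2013-04-19 is absent from the index; the slice still works and
# includes the end date 2013-04-22
subset = ordered.loc["2013-04-19":"2013-04-22"]
print(subset)
```

The slice uses .loc here to make the label-based intent explicit; the date strings are parsed by pandas, so no datetime objects need to be built.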

Answer 3

Answered by watsonic

You can use a simple mask to accomplish this:

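# Boolean mask over the index; start and end need not be present in it
date_mask = (data.index > start) & (data.index < end)
dates = data.index[date_mask]
# .ix has been removed from pandas; .loc is the label-based replacement
data.loc[dates]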

By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.
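For the hierarchical case, one possible sketch builds the mask from get_level_values, which returns one date per row and so lines up with the frame directly. This swaps in get_level_values for the levels[0] mentioned above; the level names "date" and "key" and the sample data are assumptions for illustration.

```python
import pandas as pd

# Two-level index: a date level and an arbitrary key level
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-05"]), ["a", "b"]],
    names=["date", "key"],
)
data = pd.DataFrame({"value": range(6)}, index=index)

start = pd.Timestamp("2013-01-02")
end = pd.Timestamp("2013-01-06")

# One timestamp per row, taken from the "date" level
level_dates = data.index.get_level_values("date")

# Exclusive bounds, as in the mask above; use >= and <= for inclusive ones
date_mask = (level_dates > start) & (level_dates < end)

# A boolean array of row length can filter the frame directly
selected = data[date_mask]
print(selected)
```

A mask does not depend on the index being sorted, which makes it a reasonable fallback when sorting is not an option.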

Answer 4

Answered by R. Cox

I had difficulty with other approaches but I found that the following approach worked for me:

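# Parse the day-first date strings and set the result as the index
df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
df.set_index('Date', inplace=True)

# Sort by the datetime index; sorting the 'Date_1' strings would order
# them by day of month first rather than chronologically
df = df.sort_index()

# Slice the data; both endpoints are inclusive and need not be in the index
From = '2017-05-07'
To   = '2017-06-07'
df_Z = df.loc[From:To, :]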
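A self-contained version of the same steps, as a sketch: the sample rows are invented, and only the Date_1 column name and the day/month/year format come from the snippet above.

```python
import pandas as pd

# Invented sample data with day-first date strings, deliberately out of order
df = pd.DataFrame(
    {
        "Date_1": ["07/06/2017", "01/05/2017", "20/05/2017", "15/06/2017"],
        "value": [4, 1, 2, 5],
    }
)

# Parse the strings, index by the parsed dates, and sort chronologically
df["Date"] = pd.to_datetime(df["Date_1"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# 2017-05-07 is not in the index; 2017-06-07 is, and is included
df_z = df.loc["2017-05-07":"2017-06-07"]
print(df_z)
```

Sorting on the parsed index rather than on the original strings matters here: as text, "01/05/2017" and "07/06/2017" both sort before "15/06/2017" and "20/05/2017", which is not date order.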
