Python Pandas: Drop rows of a time series based on a time range

Question

Asked by jim basquiat

I have the following time series:

import numpy as np
import pandas as pd

start = pd.to_datetime('2016-1-1')
end = pd.to_datetime('2016-1-15')
rng = pd.date_range(start, end, freq='2h')
df = pd.DataFrame({'timestamp': rng, 'values': np.random.randint(0, 100, len(rng))})
df = df.set_index(['timestamp'])

I would like to drop the rows that fall between these two timestamps:

start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')

How can I do that?

Answer 1

Answered by piRSquared

using query

df.query('index < @start_remove or index > @end_remove')

using loc

df.loc[(df.index < start_remove) | (df.index > end_remove)]

using date slicing

This keeps the rows at both end points in the result:

pd.concat([df[:start_remove], df[end_remove:]])

And this version drops the end points as well:

pd.concat([df[:start_remove], df[end_remove:]]).drop([start_remove, end_remove])
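
To check which variants keep the end points, here is a minimal sketch that rebuilds the frame from the question and compares row counts. The values are arbitrary (np.arange instead of random integers), and .loc is used for the slices only to make the label-based slicing explicit; the variable names masked, sliced and sliced_open are mine.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values are arbitrary here).
rng = pd.date_range('2016-01-01', '2016-01-15', freq='2h')
df = pd.DataFrame({'values': np.arange(len(rng))}, index=rng)
df.index.name = 'timestamp'

start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')

# Boolean mask: keeps only rows strictly outside [start_remove, end_remove].
masked = df.loc[(df.index < start_remove) | (df.index > end_remove)]

# Label slicing is inclusive on both sides, so both end points survive.
sliced = pd.concat([df.loc[:start_remove], df.loc[end_remove:]])

# Dropping the two end-point labels makes the sliced result match the mask.
sliced_open = sliced.drop([start_remove, end_remove])

print(len(df), len(masked), len(sliced), len(sliced_open))
# 169 120 122 120
```

So the query and loc versions remove the end points too, while plain date slicing leaves both of them in.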

Answer 2

Answered by ade1e

Another one to try. Exclude the dates in the date_range:

Edit: Added a frequency to date_range, so it now matches the frequency of the original data.

dropThis = pd.date_range(start_remove,end_remove,freq='2h')
df[~df.index.isin(dropThis)]

We can see the rows are now dropped.

len(df)
169

len(df[~pd.to_datetime(df.index).isin(dropThis)])
120
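
The frequency matters here because isin only matches timestamps that are exactly present in the generated range. A small sketch of the difference, assuming the same 2-hour data as in the question (the names daily, two_hourly, kept_daily and kept_two_hourly are just for illustration):

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-01-15', freq='2h')
df = pd.DataFrame({'values': np.arange(len(rng))}, index=rng)

start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')

# Default frequency is daily: only the five midnight stamps are matched.
daily = pd.date_range(start_remove, end_remove)
kept_daily = df[~df.index.isin(daily)]

# Matching the 2-hour frequency of the data matches every stamp in the range.
two_hourly = pd.date_range(start_remove, end_remove, freq='2h')
kept_two_hourly = df[~df.index.isin(two_hourly)]

print(len(kept_daily), len(kept_two_hourly))
# 164 120
```

If the index is irregular, a generated range will probably miss rows, and a comparison-based mask is likely the safer choice.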

Answer 3

Answered by EdChum

An obscure method is to use slice_indexer on your index, passing your start and end range. This returns a slice object, which you can use to index into your original index; you can then negate the result of isin:

In [20]:
df.loc[~df.index.isin(df.index[df.index.slice_indexer(start_remove, end_remove)])]

Out[20]:
                     values
timestamp                  
2016-01-01 00:00:00       0
2016-01-01 02:00:00      57
2016-01-01 04:00:00      98
2016-01-01 06:00:00      82
2016-01-01 08:00:00      24
2016-01-01 10:00:00       1
2016-01-01 12:00:00      41
2016-01-01 14:00:00      14
2016-01-01 16:00:00      40
2016-01-01 18:00:00      48
2016-01-01 20:00:00      77
2016-01-01 22:00:00      34
2016-01-02 00:00:00      88
2016-01-02 02:00:00      58
2016-01-02 04:00:00      72
2016-01-02 06:00:00      24
2016-01-02 08:00:00      32
2016-01-02 10:00:00      44
2016-01-02 12:00:00      57
2016-01-02 14:00:00      88
2016-01-02 16:00:00      97
2016-01-02 18:00:00      75
2016-01-02 20:00:00      46
2016-01-02 22:00:00      31
2016-01-03 00:00:00      60
2016-01-03 02:00:00      73
2016-01-03 04:00:00      79
2016-01-03 06:00:00      71
2016-01-03 08:00:00      53
2016-01-03 10:00:00      70
...                     ...
2016-01-12 14:00:00       5
2016-01-12 16:00:00      42
2016-01-12 18:00:00      17
2016-01-12 20:00:00      94
2016-01-12 22:00:00      63
2016-01-13 00:00:00      63
2016-01-13 02:00:00      50
2016-01-13 04:00:00      44
2016-01-13 06:00:00      35
2016-01-13 08:00:00      59
2016-01-13 10:00:00      53
2016-01-13 12:00:00      16
2016-01-13 14:00:00      68
2016-01-13 16:00:00      66
2016-01-13 18:00:00      56
2016-01-13 20:00:00      18
2016-01-13 22:00:00      59
2016-01-14 00:00:00       8
2016-01-14 02:00:00      60
2016-01-14 04:00:00      52
2016-01-14 06:00:00      87
2016-01-14 08:00:00      31
2016-01-14 10:00:00      91
2016-01-14 12:00:00      64
2016-01-14 14:00:00      53
2016-01-14 16:00:00      47
2016-01-14 18:00:00      87
2016-01-14 20:00:00      47
2016-01-14 22:00:00      27
2016-01-15 00:00:00      28

[120 rows x 1 columns]

Here you can see that 49 rows were removed from the original df:

In [23]:
df.index.slice_indexer(start_remove, end_remove)

Out[23]:
slice(36, 85, None)

In [24]:
df.index.isin(df.index[df.index.slice_indexer(start_remove, end_remove)])

Out[24]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       ........
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False], dtype=bool)

and then invert the above using ~

Edit: Actually, you can achieve this without isin:

df.loc[df.index.difference(df.index[df.index.slice_indexer(start_remove, end_remove)])]

will also work.
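
For reference, a self-contained sketch of both variants side by side, assuming the sorted, duplicate-free index from the question (the names slc, to_remove, via_isin and via_difference are mine). Note that Index.difference returns a sorted index of unique labels, so on an unsorted or duplicated index the two results may not agree.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-01-15', freq='2h')
df = pd.DataFrame({'values': np.arange(len(rng))}, index=rng)

start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')

# Positional slice covering the labels from start_remove to end_remove.
slc = df.index.slice_indexer(start_remove, end_remove)
to_remove = df.index[slc]

# Variant 1: negate a membership mask.
via_isin = df.loc[~df.index.isin(to_remove)]

# Variant 2: select the labels that remain after a set difference.
via_difference = df.loc[df.index.difference(to_remove)]

print(slc.start, slc.stop, len(to_remove), len(via_isin))
# 36 85 49 120
```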

Timings

Interestingly, the difference approach was also the fastest in my timings:

In [30]:
%timeit df.loc[df.index.difference(df.index[df.index.slice_indexer(start_remove, end_remove)])]

100 loops, best of 3: 4.05 ms per loop

In [31]:    
%timeit df.query('index < @start_remove or index > @end_remove')

10 loops, best of 3: 15.2 ms per loop

In [32]:    
%timeit df.loc[(df.index < start_remove) | (df.index > end_remove)]

100 loops, best of 3: 4.94 ms per loop
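
If you want to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names (with_difference, with_query, with_mask) are made up for this example, and the absolute numbers will depend on your machine and pandas version, so treat the ranking above as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-01-15', freq='2h')
df = pd.DataFrame({'values': np.arange(len(rng))}, index=rng)
start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')


def with_difference(frame, start, end):
    # Remove the positional slice from the index, then select what is left.
    idx = frame.index
    return frame.loc[idx.difference(idx[idx.slice_indexer(start, end)])]


def with_query(frame, start, end):
    # '@start' and '@end' refer to this function's local variables.
    return frame.query('index < @start or index > @end')


def with_mask(frame, start, end):
    # Plain boolean mask on the index.
    return frame.loc[(frame.index < start) | (frame.index > end)]


timings = {}
for func in (with_difference, with_query, with_mask):
    # Total seconds for 100 calls of each variant.
    timings[func.__name__] = timeit.timeit(
        lambda: func(df, start_remove, end_remove), number=100
    )

for name, seconds in timings.items():
    print(f'{name}: {seconds / 100 * 1e3:.3f} ms per call')
```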

Answer 4

Answered by Joe Heffer

df = df.drop(pd.date_range('2018-01-01', '2018-02-01'), errors='ignore')
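
Applied to the data from the question, this might look like the sketch below. It assumes the labels are generated at the same 2-hour frequency as the index; the names labels, trimmed, outside and unchanged are only for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-01-15', freq='2h')
df = pd.DataFrame({'values': np.arange(len(rng))}, index=rng)

start_remove = pd.to_datetime('2016-1-4')
end_remove = pd.to_datetime('2016-1-8')

# Labels to drop, generated at the same 2-hour frequency as the index.
labels = pd.date_range(start_remove, end_remove, freq='2h')
trimmed = df.drop(labels, errors='ignore')

# With errors='ignore', labels missing from the index are silently skipped,
# so a range that lies entirely outside the data leaves the frame unchanged.
outside = pd.date_range('2018-01-01', '2018-02-01')
unchanged = df.drop(outside, errors='ignore')

print(len(trimmed), len(unchanged))
# 120 169
```

The errors='ignore' part is what keeps drop from raising a KeyError when some of the generated timestamps are not in the index.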
