Python: Filter Pandas DataFrames by date

Question

Asked by AMM

I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.

What is the best way to achieve this?

Answer 1

Answered by Retozi

If the date column is the index, use .loc for label-based indexing or .iloc for positional indexing.

For example:

df.loc['2014-01-01':'2014-02-01']

See details here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection

If the column is not the index, you have two choices:

1. Make it the index (either temporarily, or permanently if it's time-series data).
2. Filter on the column with a boolean mask: df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]
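
Both choices can be sketched on the question's "next two months" window. This is a minimal sketch under assumptions: the column is named 'date' and already holds datetime64 values, the helper name next_two_months is hypothetical, and a fixed `today` value is passed in so the result is reproducible. Unlike the strict comparisons above, the bounds here are inclusive.

```python
import pandas as pd

def next_two_months(df, column="date", today=None):
    """Keep rows whose `column` lies in [today, today + 2 months] (hypothetical helper)."""
    start = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    end = start + pd.DateOffset(months=2)
    # Choice 2: boolean mask on the column, inclusive on both ends.
    return df[(df[column] >= start) & (df[column] <= end)]

# Made-up sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2013-12-31", "2014-01-01", "2014-02-15", "2014-03-01", "2014-03-02"]
    ),
    "value": [1, 2, 3, 4, 5],
})

masked = next_two_months(df, today="2014-01-01")

# Choice 1: make the column a sorted index, then slice by label with .loc.
indexed = df.set_index("date").sort_index()
sliced = indexed.loc["2014-01-01":"2014-03-01"]
```

In real use, calling next_two_months(df) without `today` would start the window at the current date.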

See here for the general explanation.

Note: .ix is deprecated.

Answer 2

Answered by orange1

In my experience the previous answer does not work: you can't pass a simple string, it needs to be a datetime object. So:

import datetime 
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]

Answer 3

Answered by fantabolous

If the dates are in the index then simply:

df['20160101':'20160301']
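
A small runnable illustration of slicing a date index with strings, assuming the index is a sorted DatetimeIndex. The sample data is made up, and .loc is used here as the explicit label-based form of the same slice. Both end points of the slice are included.

```python
import pandas as pd

# Made-up sample data with the dates in the index.
idx = pd.to_datetime(
    ["2015-12-31", "2016-01-01", "2016-02-10", "2016-03-01", "2016-03-02"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=idx)

# Compact date strings are parsed like ISO-formatted ones.
subset = df.loc["20160101":"20160301"]
```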

Answer 4

Answered by shm2008

If your dates have been standardized with the datetime package, you can simply use:

df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]

To standardize your date strings using the datetime package, you can use this function:

import datetime
datetime.datetime.strptime
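
A minimal sketch of that standardization step, assuming the raw dates arrive as strings in month/day/year form (the input values and the format string are made up for illustration). The bounds here are datetime.datetime objects rather than datetime.date, a choice made for this sketch because pandas stores the parsed column as datetime64.

```python
import datetime

import pandas as pd

raw = ["01/15/2016", "02/20/2016", "03/05/2016"]  # made-up input strings

# Parse each string with an explicit format.
parsed = [datetime.datetime.strptime(s, "%m/%d/%Y") for s in raw]
df = pd.DataFrame({"date": parsed, "value": [1, 2, 3]})

start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2016, 3, 1)

# Strict comparisons, as in the answer above: both bounds are excluded.
filtered = df[(df["date"] > start) & (df["date"] < end)]
```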

Answer 5

Answered by VMAtm

If your datetime column has the Pandas datetime type (e.g. datetime64[ns]), you need a pd.Timestamp object for proper filtering, for example:

from datetime import date

import pandas as pd

value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
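
The same idea on a small made-up frame, with a fixed year instead of date.today() so the result is reproducible. The second filter, using Series.between with two pd.Timestamp bounds, is an addition for illustration and is inclusive on both ends by default.

```python
import pandas as pd

# Made-up sample data; 'date_column' follows the naming used above.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2019-12-31", "2020-01-01", "2020-06-15"]),
    "value": [1, 2, 3],
})

# Rows strictly before the first day of 2020.
value_to_check = pd.Timestamp(2020, 1, 1)
filtered_df = df[df["date_column"] < value_to_check]

# Rows inside a closed date range.
start = pd.Timestamp("2020-01-01")
end = pd.Timestamp("2020-12-31")
in_range = df[df["date_column"].between(start, end)]
```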

Answer 6

Answered by danielhrisca

You can use pd.Timestamp inside a query through a local reference:

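from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame()
ts = pd.Timestamp  # local alias, referenced as @ts inside the query string

# ten consecutive seconds starting from the current time
df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')

print(df)
print(df.query('date > @ts("20190515T071320")'))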

with the output:

                 date
0 2019-05-15 07:13:16
1 2019-05-15 07:13:17
2 2019-05-15 07:13:18
3 2019-05-15 07:13:19
4 2019-05-15 07:13:20
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25


                 date
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25

Have a look at the pandas documentation for DataFrame.query, specifically the mention of local variables referenced using the @ prefix. In this case we reference pd.Timestamp through the local alias ts so that we can supply a timestamp string.

Answer 7

Answered by pakira79

How about using pyjanitor?

It has cool features.

After pip install pyjanitor:

import janitor

df_filtered = df.filter_date(your_date_column_name, start_date, end_date)

Answer 8

Answered by Harry

When loading the CSV data file, you now need to set the date column as the index, as shown below, in order to filter the data by a range of dates. This was not needed with the now-deprecated pd.DataFrame.from_csv() method.

If you just want to show the data for the two months from January to February, e.g. 2020-01-01 to 2020-02-29, you can do it like this:

import pandas as pd
# parse_dates=True turns the index into real dates instead of plain strings
mydata = pd.read_csv('mydata.csv', index_col='date', parse_dates=True)  # or its position, e.g. index_col=0
mydata.loc['2020-01-01':'2020-02-29']  # pulls all the columns
# if you just need one column, e.g. Cost:
mydata.loc['2020-01-01':'2020-02-29', 'Cost']

This has been tested and works on Python 3.7. Hope you will find this useful.
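
A self-contained sketch of the same steps, using an in-memory string as a stand-in for 'mydata.csv'. The column names and values are made up, and parse_dates=True is assumed so that the index becomes a DatetimeIndex.

```python
import io

import pandas as pd

# Stand-in for 'mydata.csv'; the columns and values are made up.
csv_text = """date,Cost,Units
2019-12-31,10.0,1
2020-01-01,11.0,2
2020-02-29,12.0,3
2020-03-01,13.0,4
"""

mydata = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# All columns for January and February 2020 (both end dates included).
two_months = mydata.loc["2020-01-01":"2020-02-29"]

# Only the Cost column for the same range.
cost_only = mydata.loc["2020-01-01":"2020-02-29", "Cost"]
```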

Answer 9

Answered by Ekrem Gurdal

The shortest way to filter your dataframe by date, assuming your date column is of type datetime64[ns]:

# filter by single day
df = df[df['date'].dt.strftime('%Y-%m-%d') == '2014-01-01']

# filter by single month
df = df[df['date'].dt.strftime('%Y-%m') == '2014-01']

# filter by single year
df = df[df['date'].dt.strftime('%Y') == '2014']
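
An alternative sketch that selects the same rows without formatting every date as a string, by comparing the .dt.year and .dt.month components as integers. The sample data is made up; this is a variation on the approach above, not the answer's own method.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2013-12-31", "2014-01-01", "2014-01-20", "2014-02-01"]),
    "value": [1, 2, 3, 4],
})

# Single day: normalize() drops the time-of-day part before comparing.
one_day = df[df["date"].dt.normalize() == pd.Timestamp("2014-01-01")]

# Single month and single year, compared as integers.
one_month = df[(df["date"].dt.year == 2014) & (df["date"].dt.month == 1)]
one_year = df[df["date"].dt.year == 2014]

# The strftime version from the answer picks the same rows for the month filter.
same_month = df[df["date"].dt.strftime("%Y-%m") == "2014-01"]
```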

Answer 10

Answered by uhetz

I'm not allowed to write any comments yet, so I'll write an answer, in case somebody reads all of them and reaches this one.

If the index of the dataset is a datetime and you want to filter it just by (for example) month, you can do the following:

df.loc[df.index.month == 3]

That will filter the dataset for you by March.
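
A short runnable sketch on made-up data. Note that a month-only mask matches March of every year in the index; combining it with df.index.year, as in the second filter (an addition for illustration), narrows it to one specific March.

```python
import pandas as pd

# Made-up sample data with a DatetimeIndex spanning two years.
idx = pd.to_datetime(["2019-03-05", "2019-04-01", "2020-03-17", "2020-12-25"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# March of every year.
march = df.loc[df.index.month == 3]

# March 2020 only.
march_2020 = df.loc[(df.index.month == 3) & (df.index.year == 2020)]
```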
