Python: Filtering Pandas DataFrames by date
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/22898824/
Filtering Pandas DataFrames on dates
Asked by AMM
I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.
What is the best way to achieve this?
Answered by Retozi
If the date column is the index, use .loc for label-based indexing or .iloc for positional indexing.
For example:
df.loc['2014-01-01':'2014-02-01']
See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection
If the column is not the index, you have two choices:
- Make it the index (either temporarily, or permanently if it's time-series data)
- Filter with a boolean mask on the column:
df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]
See here for the general explanation
Note: .ix is deprecated.
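A minimal sketch of the two choices above, assuming a small made-up DataFrame whose 'date' column is already datetime64 (the sample data and the names by_index and by_mask are illustrative, not from the original answer):

```python
import pandas as pd

# Hypothetical sample data; the 'date' column is parsed to datetime64[ns].
df = pd.DataFrame({
    'date': pd.to_datetime(['2012-12-31', '2013-01-10', '2013-01-20', '2013-02-05']),
    'value': [1, 2, 3, 4],
})

# Choice 1: temporarily make the date column the (sorted) index, then slice by label.
# Label slices with .loc include both endpoints.
by_index = df.set_index('date').sort_index().loc['2013-01-01':'2013-02-01']

# Choice 2: keep the column as it is and build a boolean mask.
by_mask = df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]

print(by_index)
print(by_mask)
```

Both approaches select the same two January rows here. The mask version uses strict inequalities, so it would drop a row dated exactly on either boundary, while the .loc slice would keep it.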
Answered by orange1
In my experience the previous answer is not correct: you can't pass it a simple string, it needs to be a datetime object. So:
import datetime
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
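Whether plain strings are accepted appears to depend on the type of the index. As a rough check, assuming the index is a pandas DatetimeIndex (the sample data below is made up), string bounds and timestamp objects can be compared side by side:

```python
import pandas as pd

# Hypothetical daily data indexed by a DatetimeIndex.
idx = pd.date_range('2013-12-30', periods=40, freq='D')
df = pd.DataFrame({'value': range(40)}, index=idx)

# Slicing with date strings: pandas parses them against the DatetimeIndex.
with_strings = df.loc['2014-01-01':'2014-02-01']

# Slicing with explicit timestamp objects.
with_timestamps = df.loc[pd.Timestamp(2014, 1, 1):pd.Timestamp(2014, 2, 1)]

# Reports whether both slices selected identical rows.
print(with_strings.equals(with_timestamps))
```

If the index holds plain strings or Python objects instead of datetimes, the behaviour can differ, which may explain the differing experiences in this thread.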
Answered by fantabolous
If the dates are in the index then simply:
df['20160101':'20160301']
Answered by shm2008
And if your dates are standardized using the datetime package, you can simply use:
df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]
To standardize your date strings using the datetime package, you can use this function:
import datetime
datetime.datetime.strptime
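As an illustration of how strptime might be combined with the filter above, here is a small sketch. The 'DD/MM/YYYY' input format and the sample rows are assumptions made for the example, and the bounds are datetime.datetime objects (rather than datetime.date) so that they match the type of the parsed values:

```python
import datetime
import pandas as pd

# Hypothetical raw data with dates stored as 'DD/MM/YYYY' strings.
raw = pd.DataFrame({'date': ['15/12/2015', '10/01/2016', '20/02/2016', '05/03/2016']})

# Parse each string with datetime.datetime.strptime and an explicit format.
parsed = [datetime.datetime.strptime(s, '%d/%m/%Y') for s in raw['date']]
df = raw.assign(date=parsed)

# Filter strictly between the two bounds.
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2016, 3, 1)
filtered = df[(df['date'] > start) & (df['date'] < end)]
print(filtered)
```

For larger frames, pd.to_datetime(raw['date'], format='%d/%m/%Y') does the same parsing in one vectorized call.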
Answered by VMAtm
If your datetime column has the Pandas datetime type (e.g. datetime64[ns]), then for proper filtering you need a pd.Timestamp object, for example:
from datetime import date
import pandas as pd
value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
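The same pd.Timestamp approach can be applied to the original question (keep only rows within the next two months). The following is a sketch under stated assumptions: the helper name rows_in_next_two_months and its now parameter are hypothetical, and "two months" is interpreted as pd.DateOffset(months=2) from the start date, with the end bound excluded.

```python
import pandas as pd

def rows_in_next_two_months(df, column, now=None):
    """Return rows whose `column` falls in [now, now + 2 months)."""
    # `now` can be passed in to make the example reproducible;
    # by default it is today's date at midnight.
    start = pd.Timestamp.today().normalize() if now is None else pd.Timestamp(now)
    end = start + pd.DateOffset(months=2)
    mask = (df[column] >= start) & (df[column] < end)
    return df[mask]

# Hypothetical sample data.
df = pd.DataFrame({
    'date_column': pd.to_datetime(['2024-01-10', '2024-01-15', '2024-03-14', '2024-03-15']),
})
result = rows_in_next_two_months(df, 'date_column', now='2024-01-15')
print(result)
```

With now fixed to 2024-01-15, the window runs up to but not including 2024-03-15, so only the two middle rows are kept.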
Answered by danielhrisca
You can use pd.Timestamp inside a query by referencing it through a local variable:
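import pandas as pd
import numpy as np
from datetime import datetime
df = pd.DataFrame()
ts = pd.Timestamp  # local alias, referenced as @ts inside the query string
# ten consecutive one-second timestamps starting at the current time
df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')
print(df)
print(df.query('date > @ts("20190515T071320")'))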
with the output
date
0 2019-05-15 07:13:16
1 2019-05-15 07:13:17
2 2019-05-15 07:13:18
3 2019-05-15 07:13:19
4 2019-05-15 07:13:20
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25
date
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25
Have a look at the pandas documentation for DataFrame.query, specifically the part about local variables referenced using the @ prefix. In this case we reference pd.Timestamp through the local alias ts so that we can supply a timestamp string.
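The @ prefix also works with ordinary local variables, which can make a date-range query easier to read. A minimal sketch, assuming made-up one-second data and the hypothetical local names start and end:

```python
import pandas as pd

# Hypothetical data: one row per second, starting at a fixed time.
df = pd.DataFrame({'date': pd.date_range('2019-05-15 07:13:16', periods=10, freq='s')})

# Local variables that the query string refers to with the @ prefix.
start = pd.Timestamp('2019-05-15 07:13:18')
end = pd.Timestamp('2019-05-15 07:13:22')

# Keep rows with start <= date < end.
between = df.query('date >= @start and date < @end')
print(between)
```

Here the timestamps are built once outside the query string instead of being constructed inside it through an alias.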
Answered by pakira79
How about using pyjanitor?
It has cool features.
After pip install pyjanitor:
import janitor
df_filtered = df.filter_date(your_date_column_name, start_date, end_date)
Answered by Harry
When loading the CSV data file, we now need to set the date column as the index, as shown below, in order to filter data by a range of dates. This was not needed with the now-deprecated method pd.DataFrame.from_csv().
If you just want to show the data for the two months from January to February, e.g. 2020-01-01 to 2020-02-29, you can do so:
import pandas as pd
# parse_dates=True converts the index to datetimes so date-range slicing works
mydata = pd.read_csv('mydata.csv', index_col='date', parse_dates=True)  # or its index number, e.g. index_col=[0]
mydata.loc['2020-01-01':'2020-02-29']  # will pull all the columns
# if you need just one column, e.g. Cost:
mydata.loc['2020-01-01':'2020-02-29', 'Cost']
This has been tested working for Python 3.7. Hope you will find this useful.
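To try this without a file on disk, the CSV can be simulated with an in-memory string. This is only a sketch: the column names Cost and Units and all the values are made up, and io.StringIO stands in for 'mydata.csv'.

```python
import io
import pandas as pd

# Stand-in for 'mydata.csv'; the contents are invented for illustration.
csv_text = """date,Cost,Units
2019-12-31,10,1
2020-01-15,20,2
2020-02-29,30,3
2020-03-01,40,4
"""

# parse_dates=True asks pandas to convert the index column to datetimes.
mydata = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

# Rows from January through February, all columns.
jan_feb = mydata.loc['2020-01-01':'2020-02-29']
# Same rows, only the Cost column.
jan_feb_cost = mydata.loc['2020-01-01':'2020-02-29', 'Cost']

print(jan_feb)
print(jan_feb_cost)
```

Using .loc with a row slice and a column label is the form that accepts both at once; plain square brackets with a slice and a column name together would raise an error.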
Answered by Ekrem Gurdal
The shortest way to filter your DataFrame by date. Let's suppose your date column is of type datetime64[ns]:
# filter by single day
df = df[df['date'].dt.strftime('%Y-%m-%d') == '2014-01-01']
# filter by single month
df = df[df['date'].dt.strftime('%Y-%m') == '2014-01']
# filter by single year
df = df[df['date'].dt.strftime('%Y') == '2014']
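A possible alternative to formatting every value as a string is to compare the numeric parts exposed by the .dt accessor. This is a sketch on made-up data, not part of the original answer; the names same_day, same_month and same_year are illustrative.

```python
import pandas as pd

# Hypothetical sample data with a datetime64[ns] column.
df = pd.DataFrame({
    'date': pd.to_datetime(['2013-12-31', '2014-01-01', '2014-01-20', '2014-02-01']),
})

# Filter by a single day (normalize() drops any time-of-day component).
same_day = df[df['date'].dt.normalize() == pd.Timestamp('2014-01-01')]

# Filter by a single month of a single year.
same_month = df[(df['date'].dt.year == 2014) & (df['date'].dt.month == 1)]

# Filter by a single year.
same_year = df[df['date'].dt.year == 2014]

print(same_day, same_month, same_year, sep='\n')
```

The results match the strftime-based filters for this data; the accessor version simply avoids building intermediate strings.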
Answered by uhetz
I'm not allowed to write comments yet, so I'll write an answer in case somebody reads all of them and reaches this one.
If the index of the dataset is a datetime and you want to filter it just by (for example) month, you can do the following:
df.loc[df.index.month == 3]
That will filter the dataset down to the rows from March.
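A small sketch of the month filter on made-up data, assuming the index is a DatetimeIndex. It also shows one way to restrict the match to a single year, since a month comparison alone matches March of every year in the data:

```python
import pandas as pd

# Hypothetical data spanning two years.
idx = pd.to_datetime(['2020-02-28', '2020-03-01', '2020-03-31', '2021-03-15', '2021-04-01'])
df = pd.DataFrame({'value': [1, 2, 3, 4, 5]}, index=idx)

# All rows from any March.
march = df.loc[df.index.month == 3]

# Only rows from March 2020.
march_2020 = df.loc[(df.index.month == 3) & (df.index.year == 2020)]

print(march)
print(march_2020)
```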