pandas: How to filter a dataframe of dates by a particular month/day?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must keep it under a CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/25873772/

Time: 2020-09-13 22:28:33  Source: igfitidea

How to filter a dataframe of dates by a particular month/day?

python pandas dataframe

Asked by jenny

So my code is as follows:

df['Dates'][df['Dates'].index.month == 11]

I was doing a test to see if I could filter the months so it only shows November dates, but this did not work. It gives me the following error: AttributeError: 'Int64Index' object has no attribute 'month'.

If I do

print(type(df['Dates'][0]))

then I get class 'pandas.tslib.Timestamp', which leads me to believe that the types of objects stored in the dataframe are Timestamp objects. (I'm not sure where the 'Int64Index' is coming from... for the error before)

What I want to do is this: the dataframe column contains dates from the early 2000s to the present in the following format: dd/mm/yyyy. I want to keep only the dates between November 15 and March 15, regardless of the year. What is the easiest way to do this?

Thanks.

Here is df['Dates'] (with indices):

0    2006-01-01
1    2006-01-02
2    2006-01-03
3    2006-01-04
4    2006-01-05
5    2006-01-06
6    2006-01-07
7    2006-01-08
8    2006-01-09
9    2006-01-10
10   2006-01-11
11   2006-01-12
12   2006-01-13
13   2006-01-14
14   2006-01-15
...

Answered by b10n

Map an anonymous function over the series to extract each date's month, and compare the result to 11 (November). That gives you a boolean mask, which you can then use to filter your dataframe.

nov_mask = df['Dates'].map(lambda x: x.month) == 11
df[nov_mask]

I don't think there is a straightforward way to filter the way you want while ignoring the year, so try this.

# build one Nov 15 - Mar 15 window; 2015-2016 is used so that Feb 29 is included
nov_mar_series = pd.Series(pd.date_range("2015-11-15", "2016-03-15"))
# create timestamps without the year
nov_mar_no_year = nov_mar_series.map(lambda x: x.strftime("%m-%d"))
# add a yearless timestamp to the dataframe
df["no_year"] = df['Dates'].map(lambda x: x.strftime("%m-%d"))
# keep the rows whose month-day falls inside the window
no_year_mask = df['no_year'].isin(nov_mar_no_year)
df[no_year_mask]
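
As an alternative sketch (not part of the original answer), the same year-independent window can be expressed numerically: encode each date as month * 100 + day, then keep values on or after 1115 or on or before 315. The sample frame and the names `md` and `in_window` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data spanning several years.
df = pd.DataFrame({
    "Dates": pd.to_datetime([
        "2006-01-01", "2006-03-15", "2006-03-16",
        "2007-11-14", "2007-11-15", "2008-02-29", "2008-07-04",
    ])
})

# Encode month and day as one integer, e.g. 15 November -> 1115.
md = df["Dates"].dt.month * 100 + df["Dates"].dt.day

# The window wraps around New Year, so the two halves are joined with OR.
in_window = (md >= 1115) | (md <= 315)

result = df[in_window]
print(result)
```

Because the range crosses the year boundary, the two conditions are combined with `|`; a window that stays inside one calendar year would use `&` instead.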

Answered by Erfan

Using pd.to_datetime & the dt accessor

2020 answer

The accepted answer is not the "pandas" way to approach this problem.

To select only rows with month 11, use the dt accessor:

# df['Date'] = pd.to_datetime(df['Date']) -- if column is not datetime yet
df = df[df['Date'].dt.month == 11]
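
A minimal end-to-end sketch of the same idea, assuming the column starts out as dd/mm/yyyy strings as described in the question. The sample values and the name `november` are hypothetical.

```python
import pandas as pd

# Hypothetical data: dates stored as dd/mm/yyyy strings.
df = pd.DataFrame({"Date": ["01/11/2006", "15/11/2007", "11/01/2008", "30/12/2009"]})

# Parse with an explicit format so 01/11/2006 is read as 1 November, not 11 January.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Keep only the rows whose month is November.
november = df[df["Date"].dt.month == 11]
print(november)
```

Passing an explicit `format` avoids day/month ambiguity when the strings are parsed.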

The same works for days or years: substitute dt.day or dt.year for dt.month.

Besides those, the dt accessor offers many more; here are a few:

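  • dt.quarter
  • dt.week (deprecated in pandas 1.1 and removed in 2.0; use dt.isocalendar().week instead)
  • dt.weekday
  • dt.day_name() (a method, so it needs parentheses)
  • dt.is_month_end
  • dt.is_month_start
  • dt.is_year_end
  • dt.is_year_start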
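
A short sketch showing a few of these attributes side by side. The sample dates and the names `summary` and `month_ends` are made up for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-02-29", "2020-12-31"]))

# Collect several dt attributes into one frame for comparison.
summary = pd.DataFrame({
    "date": dates,
    "quarter": dates.dt.quarter,
    "weekday": dates.dt.weekday,            # Monday = 0
    "day_name": dates.dt.day_name(),        # method call, not an attribute
    "is_month_end": dates.dt.is_month_end,
    "is_year_start": dates.dt.is_year_start,
})
print(summary)

# Boolean attributes can be used directly as filter masks.
month_ends = dates[dates.dt.is_month_end]
```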

Answered by Yuriy

Your code has two issues. First, the column reference needs to come after the filtering condition. Second, ".month" can be used with either the index or a column, but not both at once. One of the following should work. The first requires the index to be a DatetimeIndex; the second reads the month from the column, which for a Series goes through the dt accessor:

df[df.index.month == 11]['Dates']

df[df['Dates'].dt.month == 11]['Dates']
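
A small sketch of both approaches, which also shows where the 'Int64Index' error in the question comes from: the default integer index has no month attribute. Note that a Series exposes the month through the dt accessor. The sample data and the names `by_column`, `indexed`, and `by_index` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Dates": pd.to_datetime(["2006-01-01", "2006-11-02", "2007-11-03"])})

# The default index is a plain integer range, so it has no .month attribute.
print(hasattr(df.index, "month"))

# Option 1: filter on the column through the dt accessor.
by_column = df[df["Dates"].dt.month == 11]["Dates"]

# Option 2: make the dates the index, then use the DatetimeIndex directly.
indexed = df.set_index("Dates", drop=False)
by_index = indexed[indexed.index.month == 11]["Dates"]

print(by_column.tolist() == by_index.tolist())
```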