Original question: http://stackoverflow.com/questions/30743832/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Get last date in each month of a time series pandas
Asked by ikemblem
Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.
My goal is to get the last date in the DatetimeIndex for each month.
.to_period('M') and .to_timestamp('M') don't work, since they give the last day of the month rather than the last value of the variable in each month.
As an example, if this is my time series, I would want to select '2015-05-29', while the last day of the month is '2015-05-31'.
['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']
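To illustrate the problem described in the question, here is a minimal sketch. The shortened index `idx` is an assumption based on the sample dates above, and `to_timestamp(how='end')` is used here as one way to land on the end of each period. It shows that a round trip through monthly periods yields calendar month ends, not dates that are present in the data:

```python
import pandas as pd

# Shortened version of the sample dates from the question (assumed for illustration)
idx = pd.DatetimeIndex(['2015-05-28', '2015-05-29', '2015-06-01'])

# Convert to monthly periods, then back to timestamps at the end of each period.
# normalize() drops the time-of-day part so only the calendar date remains.
calendar_ends = idx.to_period('M').to_timestamp(how='end').normalize().unique()

print(calendar_ends)            # calendar month ends: 2015-05-31 and 2015-06-30
print(calendar_ends.isin(idx))  # neither date occurs in the original index
```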
Accepted answer by ikemblem
Condla's answer came closest to what I needed. However, since my time index stretched over more than a year, I needed to group by both year and month and then select the maximum date in each group. Below is the code I ended up with.
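import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
tempYear = None
# Index.groupby returns a dict mapping each year to the dates in that year
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    # Within one year, map each month to the dates in that month
    tempYear = pd.DatetimeIndex(dictYears[yr]).groupby(pd.DatetimeIndex(dictYears[yr]).month)
    for m in tempYear.keys():
        dateRange.append(max(tempYear[m]))
# Index.order() was removed from pandas; sort_values() is its replacement
dateRange = pd.DatetimeIndex(dateRange).sort_values()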
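The same year-and-month grouping can probably be written more compactly with a Series groupby. The following is only a sketch of that idea, not the author's code; `trade_days` is made-up sample data standing in for `tempTradeDays`:

```python
import pandas as pd

# Hypothetical trading days spanning two years (stand-in for tempTradeDays)
trade_days = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-28', '2015-05-29',
    '2015-06-01', '2016-05-30', '2016-05-31',
])

# Turn the index into a Series so that groupby(...).max() is available,
# then group by (year, month) and keep the latest date in each group.
s = trade_days.to_series()
last_days = pd.DatetimeIndex(s.groupby([s.index.year, s.index.month]).max())

print(last_days)
```

Because the groups are keyed by (year, month), May 2015 and May 2016 stay separate and the result comes back in chronological order.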
Answer by Condla
My strategy would be to group by month and then select the "maximum" of each group:
If "dt" is your DatetimeIndex object:
last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)
The list "last_dates_of_the_month" contains the last occurring date of each month in your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it).
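One caveat with grouping by month alone is that the same month from different years ends up in one group. The sketch below uses made-up dates to show the effect, and groups by monthly period instead as one possible way around it (the variable names are illustrative, not part of the answer):

```python
import pandas as pd

# Hypothetical index covering May of two different years
dt = pd.DatetimeIndex(['2015-05-28', '2015-05-29', '2016-05-30', '2016-05-31'])

# Grouping by month number only: both Mays fall into the single group 5
by_month = dt.groupby(dt.month)
last_by_month = [max(dates) for dates in by_month.values()]

# Grouping by monthly period keeps 2015-05 and 2016-05 apart
last_by_year_month = dt.to_series().groupby(dt.to_period('M')).max()

print(last_by_month)               # a single date for both years
print(list(last_by_year_month))    # one date per (year, month)
```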
Answer by Maxim
This is an old question, but none of the existing answers here is perfect. This is the solution I came up with (assuming that the date index is sorted). It could even be written in one line, but I split it for readability:
month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
mask = (month1 != month2)
apple[mask.values].head(10)
A few notes:
- Shifting a datetime series requires another pd.Series instance
- Boolean mask indexing requires .values
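For a self-contained run of the shift-and-compare idea above, here is a sketch using the dates from the question. The Series `apple` and its values are placeholders invented for illustration:

```python
import pandas as pd

# Dates from the question; the values are arbitrary placeholders
idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])
apple = pd.Series(range(len(idx)), index=idx)

# Month number of each row, and the month number of the following row
month = pd.Series(apple.index.month)
next_month = month.shift(-1)

# A row is the last of its month when the next row belongs to another month.
# The final row compares against NaN, so it is always selected.
mask = month != next_month
last_rows = apple[mask.values]

print(last_rows)
```

With this data the selected dates are 2015-05-29 and 2015-06-01. The approach relies on the index being sorted, as the answer notes.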
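By the way, when the dates are business days, it is easier to use resampling: apple.resample('BM'), followed by an aggregation such as .last(). Note that pandas 2.2 and later spell this alias 'BME', and that the resulting index is labeled with the business month end rather than with the last date actually present in the data.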
Answer by MMCM_
Maybe the answer is not needed anymore, but while searching for an answer to the same question I found what may be a simpler solution:
import pandas as pd
sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]
Answer by user3570984
Suppose your data frame looks like this
Then the following code will give you the last day of each month.
df_monthly = df.reset_index().groupby([df.index.year, df.index.month], as_index=False).last().set_index('index')
This one-line code does its job :)
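Since the sample data frame is not shown, here is a sketch with a made-up frame (the `close` column and its values are assumptions). It uses a slightly different technique from the one-liner above, `groupby(...).tail(1)`, which keeps the original index so no reset_index/set_index round trip is needed:

```python
import pandas as pd

# Hypothetical data frame indexed by trading day
df = pd.DataFrame(
    {'close': [1.0, 2.0, 3.0, 4.0]},
    index=pd.DatetimeIndex(['2015-05-28', '2015-05-29', '2015-06-01', '2016-05-31']),
)

# tail(1) returns the last row of every (year, month) group
# and leaves the original DatetimeIndex in place.
df_monthly = df.groupby([df.index.year, df.index.month]).tail(1)

print(df_monthly)
```

This assumes the frame is already sorted by date, since tail(1) picks the last row of each group by position rather than by value.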

