Python: Extracting month and year separately from a Pandas datetime column

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/25146121/

Time: 2020-08-18 19:47:37  Source: igfitidea

Extracting just Month and Year separately from Pandas Datetime column

python pandas

Asked by monkeybiz7

I have a Dataframe, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-12-31
968   2012-12-31
969   2012-12-31
970   2012-12-29
971   2012-12-31
972   2012-12-29
973   2012-12-29
...

The elements of the column are pandas.tslib.Timestamp.

I want to just include the year and month. I thought there would be simple way to do it, but I can't figure it out.

Here's what I've tried:

df['ArrivalDate'].resample('M', how = 'mean')

I got the following error:

Only valid with DatetimeIndex or PeriodIndex 

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

I got the following error:

'Timestamp' object has no attribute '__getitem__' 

Any suggestions?

Edit: I sort of figured it out.

df.index = df['ArrivalDate']

Then, I can resample another column using the index.

But I'd still like a method for reconfiguring the entire column. Any ideas?

Accepted answer by ely

You can directly access the year and month attributes, or request a datetime.datetime:

In [15]: t = pandas.Timestamp.now()

In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)

In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)

In [18]: t.day
Out[18]: 5

In [19]: t.month
Out[19]: 8

In [20]: t.year
Out[20]: 2014

One way to combine year and month is to make an integer encoding them, such as 201408 for August 2014. Along a whole column, you could do this as:

df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)

or many variants thereof.
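
One such variant, as a minimal sketch: assuming the column already has a datetime dtype, the same integer encoding can be computed with the .dt accessor instead of a per-row lambda. The sample dates below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'ArrivalDate' mirrors the column name in the question.
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(['2012-12-31', '2014-08-05'])})

dates = df['ArrivalDate']
# Vectorized version of 100*x.year + x.month, computed on the whole column at once.
df['YearMonth'] = 100 * dates.dt.year + dates.dt.month

print(df['YearMonth'].tolist())
```

This keeps the same convention as the lambda version, so the caveats about integer-encoded dates still apply.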

I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.

The calendar module is useful for obtaining the day number of certain days, such as the final weekday of a month. Then you could do something like:

import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)
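
If a plain calendar convention is enough (first day or last calendar day of the month, rather than the final weekday computed above), pandas can probably do it without the calendar module. This is a sketch under that assumption; the column names MonthStart and MonthEnd are only illustrative.

```python
import pandas as pd

# Hypothetical sample data using the question's column name.
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(['2012-12-29', '2012-12-31'])})

# First-day convention: convert to a monthly period, then back to a timestamp.
df['MonthStart'] = df['ArrivalDate'].dt.to_period('M').dt.to_timestamp()

# Last-calendar-day convention: MonthEnd(0) leaves month-end dates alone
# and rolls every other date forward to the end of its month.
df['MonthEnd'] = df['ArrivalDate'] + pd.offsets.MonthEnd(0)

print(df)
```

Both new columns stay in datetime format, so date alignment and arithmetic keep working later on.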

If you are only looking for a way to format the datetime column as a string, you can use the strftime function from the datetime.datetime class, like this:

In [5]: df
Out[5]: 
            date_time
0 2014-10-17 22:00:03

In [6]: df.date_time
Out[6]: 
0   2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]

In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]: 
0    2014-10-17
Name: date_time, dtype: object
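
For a year-month string specifically, the same idea can be written without a lambda. A minimal sketch, assuming the column has a datetime dtype so that the .dt accessor is available:

```python
import pandas as pd

# Same single-row frame as in the session above, rebuilt here so the snippet runs on its own.
df = pd.DataFrame({'date_time': pd.to_datetime(['2014-10-17 22:00:03'])})

# .dt.strftime applies the format to every element of the column.
year_month = df['date_time'].dt.strftime('%Y-%m')

print(year_month.tolist())
```

The result is a column of plain strings, not dates.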

Answer by KieranPC

If you want new columns showing year and month separately you can do this:

df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month

or...

df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

Then you can combine them or work with them just as they are.

Answer by PankajKabra

If you want the month and year combined into a single value, using apply is pretty sleek.

df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y')) 

Outputs month-year in one column.

Don't forget to convert the column to datetime first; I often forget this step.

df['date_column'] = pd.to_datetime(df['date_column'])
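
To see why the conversion matters, here is a small sketch with made-up string dates: the .dt accessor is expected to fail on a plain string column and to work once the column has been converted.

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings, as often happens after reading a CSV.
raw = pd.DataFrame({'date_column': ['2012-12-31', '2012-12-29']})

try:
    raw['date_column'].dt.year  # not a datetime column yet
    failed = False
except AttributeError:
    failed = True

# After conversion, datetime formatting works on the whole column.
raw['date_column'] = pd.to_datetime(raw['date_column'])
month_year = raw['date_column'].dt.strftime('%B-%Y')

print(failed, month_year.tolist())
```

Note that %B produces the month name in the current locale.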

Answer by TICH

df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])

This worked fine for me. I didn't think pandas would interpret the resulting string as a date, but when I did the plot, the year_month strings were ordered properly... gotta love pandas!
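
A likely explanation for the correct ordering, offered as an assumption rather than something stated in the answer: strings in 'YYYY-MM' form sort alphabetically in the same order as the dates they came from. A small sketch with made-up dates:

```python
import pandas as pd

# Hypothetical data, deliberately out of order.
df = pd.DataFrame({'datetime_column': pd.to_datetime(
    ['2013-01-15', '2012-02-03', '2012-12-31'])})

# Equivalent to str(x)[:7] per element: keep only the 'YYYY-MM' prefix.
df['year_month'] = df['datetime_column'].astype(str).str[:7]

# Sorting by the string gives the same row order as sorting by the real dates.
by_string = df.sort_values('year_month')['datetime_column'].tolist()
by_date = df.sort_values('datetime_column')['datetime_column'].tolist()

print(by_string == by_date)
```

This only holds for zero-padded, year-first formats; something like 'December-2012' would not sort chronologically.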

Answer by Juan A. Navarro

You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:

df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
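
What the cast does can be seen in plain numpy. This is only a sketch of the truncation step, with made-up dates; how the result is stored once assigned back to a DataFrame column may depend on the pandas version.

```python
import numpy as np

# Hypothetical dates at nanosecond resolution, as pandas usually stores them.
dates = np.array(['2012-12-31', '2012-12-29'], dtype='datetime64[ns]')

# Casting to month resolution drops the day and time parts.
months = dates.astype('datetime64[M]')

# Casting back to day resolution lands on the first day of each month.
month_starts = months.astype('datetime64[D]')

print(months, month_starts)
```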

Answer by Subspacian

Thanks to jaknap32, I wanted to aggregate the results according to Year and Month, so this worked:

df_join['YearMonth'] = df_join['timestamp'].apply(lambda x:x.strftime('%Y%m'))

Output was neat:

0    201108
1    201108
2    201108
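
The aggregation step itself is not shown in the answer; a minimal sketch of how it might look, assuming a numeric column to sum (the value column here is hypothetical):

```python
import pandas as pd

# Hypothetical data; 'timestamp' follows the answer, 'value' is made up.
df_join = pd.DataFrame({
    'timestamp': pd.to_datetime(['2011-08-01', '2011-08-15', '2011-09-02']),
    'value': [1, 2, 4],
})

# Build the year-month key, then aggregate per key.
df_join['YearMonth'] = df_join['timestamp'].dt.strftime('%Y%m')
totals = df_join.groupby('YearMonth')['value'].sum()

print(totals)
```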

Answer by PankajKabra

Best way found!!

The df['date_column'] column has to be in datetime format.

df['month_year'] = df['date_column'].dt.to_period('M')

You could also use D for day, 2M for 2 months, etc. for different sampling intervals. If you have time series data with timestamps, you can use finer sampling intervals such as 45Min for 45 minutes or 15Min for 15 minutes.
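
A short sketch of what the monthly period column looks like and why it can be convenient, using made-up dates: the values stay date-like, so arithmetic such as "next month" still works.

```python
import pandas as pd

# Hypothetical data; 'date_column' follows the answer's naming.
df = pd.DataFrame({'date_column': pd.to_datetime(
    ['2012-12-31', '2012-12-29', '2013-01-02'])})

# Each value becomes a monthly Period such as 2012-12.
df['month_year'] = df['date_column'].dt.to_period('M')

labels = df['month_year'].astype(str).tolist()

# Adding an integer moves each period forward by that many months.
next_month = (df['month_year'] + 1).astype(str).tolist()

print(labels, next_month)
```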

Answer by Douglas

To extract the year from, say, ['2018-03-04']:

df['Year'] = pd.DatetimeIndex(df['date']).year  

Assigning to df['Year'] creates a new column. If you want to extract the month instead, just use .month.

Answer by jpp

@KieranPC's solution is the correct approach for Pandas, but it is not easily extensible to arbitrary attributes. For this, you can use getattr within a generator expression and combine the results using pd.concat:

import pandas as pd

# input data
list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})

# define list of attributes required
# (note: 'weekofyear' may be unavailable on the .dt accessor in recent pandas versions)
L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'weekofyear', 'quarter']

# define generator expression of series, one for each attribute
date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)

# concatenate results and join to original dataframe
df = df.join(pd.concat(date_gen, axis=1))

print(df)

  ArrivalDate  year  month  day  dayofweek  dayofyear  weekofyear  quarter
0  2012-12-31  2012     12   31          0        366           1        4
1  2012-12-29  2012     12   29          5        364          52        4
2  2012-12-30  2012     12   30          6        365          52        4
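
As far as I know, newer pandas versions no longer expose weekofyear on the .dt accessor, so the attribute list above may need adjusting there. A sketch of a possible replacement, assuming a pandas version that provides dt.isocalendar():

```python
import pandas as pd

# Same input dates as in the answer above.
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(
    ['2012-12-31', '2012-12-29', '2012-12-30'])})

# isocalendar() returns a DataFrame with ISO 'year', 'week' and 'day' columns.
iso = df['ArrivalDate'].dt.isocalendar()
df['weekofyear'] = iso['week']

print(df)
```

The ISO year can differ from the calendar year near year boundaries, which is why 2012-12-31 falls in week 1.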

Answer by abdellah el atouani

There are two steps to extract the year for the whole dataframe without using the apply method.

Step 1

Convert the column to datetime:

df['ArrivalDate']=pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')

Step 2

Extract the year or the month using pd.DatetimeIndex:

pd.DatetimeIndex(df['ArrivalDate']).year
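
Put together, the two steps might look like this. It is only a sketch with made-up string dates, and the year and month column names are illustrative.

```python
import pandas as pd

# Hypothetical data: dates stored as strings in the format used in step 1.
df = pd.DataFrame({'ArrivalDate': ['2012-12-31', '2012-12-29']})

# Step 1: convert the column to datetime.
df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')

# Step 2: wrap the column in a DatetimeIndex and read its attributes.
idx = pd.DatetimeIndex(df['ArrivalDate'])
df['year'] = idx.year
df['month'] = idx.month

print(df)
```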