Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/25146121/
Extracting just Month and Year separately from Pandas Datetime column
Asked by monkeybiz7
I have a DataFrame, df, with the following column:
df['ArrivalDate'] =
...
936 2012-12-31
938 2012-12-29
965 2012-12-31
966 2012-12-31
967 2012-12-31
968 2012-12-31
969 2012-12-31
970 2012-12-29
971 2012-12-31
972 2012-12-29
973 2012-12-29
...
The elements of the column are pandas.tslib.Timestamp.
I want to include just the year and month. I thought there would be a simple way to do it, but I can't figure it out.
Here's what I've tried:
df['ArrivalDate'].resample('M', how = 'mean')
I got the following error:
Only valid with DatetimeIndex or PeriodIndex
Then I tried:
df['ArrivalDate'].apply(lambda(x):x[:-2])
I got the following error:
'Timestamp' object has no attribute '__getitem__'
Any suggestions?
Edit: I sort of figured it out.
df.index = df['ArrivalDate']
Then, I can resample another column using the index.
But I'd still like a method for reconfiguring the entire column. Any ideas?
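For reference, the resample error above arises because resample needs a DatetimeIndex (or PeriodIndex) to group on. The following is a minimal sketch of the index-based approach described in the edit, assuming a hypothetical numeric column named Value that does not appear in the original question:

```python
import pandas as pd

# Hypothetical sample data; 'Value' is an assumed column for illustration only.
df = pd.DataFrame({
    'ArrivalDate': pd.to_datetime(['2012-11-30', '2012-12-29', '2012-12-31']),
    'Value': [1.0, 2.0, 4.0],
})

# Move the dates into the index so that resample has a DatetimeIndex to work with.
indexed = df.set_index('ArrivalDate')

# 'MS' labels each monthly bin by the first day of its month.
monthly_mean = indexed['Value'].resample('MS').mean()
print(monthly_mean)
```

The original ArrivalDate column is left untouched here, since set_index returns a new DataFrame.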
Accepted answer by ely
You can directly access the year and month attributes, or request a datetime.datetime:
In [15]: t = pandas.tslib.Timestamp.now()
In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)
In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)
In [18]: t.day
Out[18]: 5
In [19]: t.month
Out[19]: 8
In [20]: t.year
Out[20]: 2014
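In newer pandas releases the same class is exposed as pandas.Timestamp rather than through pandas.tslib, so the session above may need adjusting. A small sketch, using a fixed timestamp (an assumption made so the result is reproducible):

```python
import pandas as pd

# A fixed timestamp instead of Timestamp.now(), so the values are predictable.
t = pd.Timestamp('2014-08-05 14:49:39')

# The calendar fields are plain integer attributes.
year, month, day = t.year, t.month, t.day

# Convert to a standard-library datetime.datetime when needed.
as_datetime = t.to_pydatetime()

print(year, month, day)
print(as_datetime)
```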
One way to combine year and month is to encode them as a single integer, such as 201408 for August 2014. Across a whole column, you could do this as:
df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)
or many variants thereof.
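One such variant, sketched here with made-up sample dates, avoids the per-row lambda by using the .dt accessor on the whole column:

```python
import pandas as pd

# Sample data for illustration; the dates are assumptions, not the asker's data.
df = pd.DataFrame({
    'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02']),
})

dates = df['ArrivalDate']

# Vectorized version of 100*year + month, computed for the whole column at once.
df['YearMonth'] = 100 * dates.dt.year + dates.dt.month
print(df)
```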
I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.
The calendar module is useful for obtaining the day number of certain days, such as the final weekday of a month. Then you could do something like:
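import calendar
import datetime

# Map each date to the last weekday (Monday to Friday) of its month.
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        # Last week row of the month, restricted to its Monday..Friday entries.
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)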
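If a simpler convention is acceptable, pandas can apply it without the calendar module. The sketch below shows two conventions, first day of month and last calendar day of month; note that the second is the calendar month end, not the final weekday computed above. The sample dates and the column names MonthStart and MonthEnd are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ArrivalDate': pd.to_datetime(['2012-12-29', '2012-12-31', '2013-02-10']),
})

# First-day convention: truncate to a monthly period, then convert back
# to timestamps (the start of each period).
df['MonthStart'] = df['ArrivalDate'].dt.to_period('M').dt.to_timestamp()

# Last-calendar-day convention: MonthEnd(0) rolls forward to the month end
# and leaves dates that already fall on a month end unchanged.
df['MonthEnd'] = df['ArrivalDate'] + pd.offsets.MonthEnd(0)

print(df)
```

Both new columns stay in a datetime dtype, so date arithmetic and alignment continue to work.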
If you are instead looking to solve the simpler problem of formatting the datetime column as a string, you can use the strftime function from the datetime.datetime class, like this:
In [5]: df
Out[5]:
date_time
0 2014-10-17 22:00:03
In [6]: df.date_time
Out[6]:
0 2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]
In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]:
0 2014-10-17
Name: date_time, dtype: object
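The same formatting can also be written without map, through the .dt.strftime accessor. A minimal sketch, where the year_month column name and the '%Y-%m' format are choices made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date_time': pd.to_datetime(['2014-10-17 22:00:03'])})

# Format every element of the column as a 'YYYY-MM' string.
df['year_month'] = df['date_time'].dt.strftime('%Y-%m')
print(df)
```

The result is a column of strings, so it is suitable for display or labels rather than further date arithmetic.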
Answer by KieranPC
If you want new columns showing year and month separately you can do this:
df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
or...
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month
Then you can combine them or work with them just as they are.
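As one illustration of working with the separate columns, the sketch below counts rows per (year, month) pair. The sample dates are made up, and grouping is just one possible use:

```python
import pandas as pd

df = pd.DataFrame({
    'ArrivalDate': pd.to_datetime(['2012-11-30', '2012-12-29', '2012-12-31']),
})

# Separate year and month columns, as in the answer above.
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

# Number of arrivals in each (year, month) pair.
counts = df.groupby(['year', 'month']).size()
print(counts)
```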
Answer by PankajKabra
If you want a unique month-year pair in a single column, using apply is pretty sleek.
df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y'))
Outputs month-year in one column.
Don't forget to convert the column to datetime first; I generally forget.
df['date_column'] = pd.to_datetime(df['date_column'])
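A quick way to check that the conversion happened is to inspect the dtype afterwards. The sketch below also passes errors='coerce', which is an optional choice made here (not part of the answer) so that unparseable strings become NaT instead of raising:

```python
import pandas as pd

# Dates stored as strings, including one deliberately invalid entry.
raw = pd.DataFrame({'date_column': ['2012-12-31', '2012-12-29', 'not a date']})

# Convert to datetime; invalid entries become NaT because of errors='coerce'.
raw['date_column'] = pd.to_datetime(raw['date_column'], errors='coerce')

# Confirm the column now has a datetime dtype before calling strftime on it.
is_datetime = pd.api.types.is_datetime64_any_dtype(raw['date_column'])
print(is_datetime)
print(raw)
```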
Answer by TICH
df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])
This worked fine for me. I didn't think pandas would interpret the resulting string as a date, but when I plotted it, the year_month strings were ordered properly... gotta love pandas!
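A likely reason the ordering works out is that 'YYYY-MM' strings sort lexicographically in the same order as the dates they represent, rather than pandas parsing them back into dates. A small sketch with made-up dates:

```python
import pandas as pd

df = pd.DataFrame({
    'datetime_column': pd.to_datetime(['2013-01-15', '2012-02-03', '2012-11-30']),
})

# Keep the first seven characters of the string form, i.e. 'YYYY-MM'.
df['year_month'] = df['datetime_column'].apply(lambda x: str(x)[:7])

# Plain string sorting already yields chronological order.
ordered = sorted(df['year_month'])
print(ordered)
```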
Answer by Juan A. Navarro
You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:
df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
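To see what the datetime64[M] cast does on its own, the sketch below inspects the NumPy array before it is assigned back to the DataFrame. The sample dates are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02']),
})

# Casting to month resolution truncates each value to its year and month.
months = df['ArrivalDate'].values.astype('datetime64[M]')
print(months)
```

When such an array is stored in a DataFrame column, pandas converts it back to one of its own datetime resolutions, so the values appear as the first day of each month.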
Answer by Subspacian
Answer by PankajKabra
Best way found!!
The df['date_column'] column has to be in datetime format.
df['month_year'] = df['date_column'].dt.to_period('M')
You could also use D for day, 2M for 2 months, etc. for different sampling intervals. If you have time series data with timestamps, you can use finer sampling intervals such as 45Min for 45 minutes or 15Min for 15 minutes.
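A short sketch of what the monthly period column looks like and how it can be used afterwards; the sample dates are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'date_column': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02']),
})

# Each value becomes a monthly Period such as 2012-12.
df['month_year'] = df['date_column'].dt.to_period('M')

# Periods keep their date semantics: they can be rendered as strings
# or queried for fields such as the year.
labels = df['month_year'].astype(str).tolist()
years = df['month_year'].dt.year.tolist()
print(labels, years)
```

Unlike a formatted string, a period column still sorts and groups as dates.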
Answer by Douglas
To extract the year from, say, ['2018-03-04']:
df['Year'] = pd.DatetimeIndex(df['date']).year
Assigning to df['Year'] creates a new column. If you want to extract the month instead, just use .month.
Answer by jpp
@KieranPC's solution is the correct approach for Pandas, but is not easily extensible to arbitrary attributes. For this, you can use getattr within a generator expression and combine the results using pd.concat:
# input data
list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})
# define list of attributes required
L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'weekofyear', 'quarter']
# define generator expression of series, one for each attribute
date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)
# concatenate results and join to original dataframe
df = df.join(pd.concat(date_gen, axis=1))
print(df)
ArrivalDate year month day dayofweek dayofyear weekofyear quarter
0 2012-12-31 2012 12 31 0 366 1 4
1 2012-12-29 2012 12 29 5 364 52 4
2 2012-12-30 2012 12 30 6 365 52 4
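In recent pandas releases the weekofyear accessor has been deprecated and later removed in favor of isocalendar().week, so the attribute list above may fail as written. The following variant is a sketch under that assumption; it uses a list instead of a generator and adds the week number separately:

```python
import pandas as pd

list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})

# Attributes that are still available directly on the .dt accessor.
attrs = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'quarter']
parts = [getattr(df['ArrivalDate'].dt, name).rename(name) for name in attrs]
df = df.join(pd.concat(parts, axis=1))

# ISO week number via isocalendar(), available in pandas 1.1 and later.
df['weekofyear'] = df['ArrivalDate'].dt.isocalendar().week

print(df)
```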
Answer by abdellah el atouani
There are two steps to extract the year for the whole DataFrame without using the apply method.
Step 1
Convert the column to datetime:
df['ArrivalDate']=pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')
Step 2
Extract the year or the month using the DatetimeIndex() method:
pd.DatetimeIndex(df['ArrivalDate']).year

