Python: What is the fastest way to extract day, month and year from a given date?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21954197/

Time: 2020-08-18 23:57:12  Source: igfitidea

Which is the fastest way to extract day, month and year from a given date?

Tags: python, pandas, date, datetime, accessor

Asked by ram

I read a csv file containing 150,000 lines into a pandas dataframe. This dataframe has a field, Date, with the dates in yyyy-mm-dd format. I want to extract the month, day and year from it and copy them into the dataframe's columns Month, Day and Year respectively. For a few hundred records the two methods below work fine, but for 150,000 records both take a ridiculously long time to execute. Is there a faster way to do this for 100,000+ records?


First method:


import pandas
df = pandas.read_csv(filename)
# Row-by-row loop: one .loc lookup and one assignment per record
for i in range(len(df)):
   df.loc[i,'Day'] = int(df.loc[i,'Date'].split('-')[2])

Second method:


import pandas
from datetime import datetime
df = pandas.read_csv(filename)
# Row-by-row loop: parses each date string individually
for i in range(len(df)):
   df.loc[i,'Day'] = datetime.strptime(df.loc[i,'Date'], '%Y-%m-%d').day

Thank you.


Accepted answer by Jeff

In pandas 0.15.0 you will be able to use the new .dt accessor, which gives a cleaner syntax for this. Until then, wrapping the column in a DatetimeIndex is fast:


In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])

In [37]: df.head(5)
Out[37]: 
                 Date
0 2000-01-01 00:00:00
1 2000-01-01 01:00:00
2 2000-01-01 02:00:00
3 2000-01-01 03:00:00
4 2000-01-01 04:00:00

[5 rows x 1 columns]

In [38]: def f(df):
    df = df.copy()
    df['Year'] = DatetimeIndex(df['Date']).year
    df['Month'] = DatetimeIndex(df['Date']).month
    df['Day'] = DatetimeIndex(df['Date']).day
    return df
   ....: 

In [39]: %timeit f(df)
10 loops, best of 3: 22 ms per loop

In [40]: f(df).head()
Out[40]: 
                 Date  Year  Month  Day
0 2000-01-01 00:00:00  2000      1    1
1 2000-01-01 01:00:00  2000      1    1
2 2000-01-01 02:00:00  2000      1    1
3 2000-01-01 03:00:00  2000      1    1
4 2000-01-01 04:00:00  2000      1    1

[5 rows x 4 columns]
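
The function above builds a DatetimeIndex three times. A minimal sketch of the same idea that builds it once and reuses it is shown here. The helper name add_date_parts and the sample dates are made up for illustration, and no timing is claimed for it.

```python
import pandas as pd

def add_date_parts(df, column="Date"):
    """Return a copy of df with Year, Month and Day columns added.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out = df.copy()
    idx = pd.DatetimeIndex(out[column])  # convert the whole column once
    out["Year"] = idx.year
    out["Month"] = idx.month
    out["Day"] = idx.day
    return out

# Made-up sample data in the yyyy-mm-dd format from the question
df = pd.DataFrame({"Date": ["2014-02-22", "2013-12-31", "2000-01-01"]})
result = add_date_parts(df)
```

Because the conversion happens on the whole column at once, no Python-level loop over rows is needed.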

From pandas 0.15.0 on (release at the end of Sept 2014), the following is possible with the new .dt accessor:


df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
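
The .dt accessor only works on a column with a datetime dtype, while a Date column read from a csv file normally arrives as strings. A minimal sketch of the conversion step, assuming the yyyy-mm-dd format from the question; the in-memory csv text and the Value column are made up for illustration:

```python
import io
import pandas as pd

# Made-up csv content standing in for the asker's file
csv_text = "Date,Value\n2014-02-22,1\n2013-12-31,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Convert the string column to datetime64 in one vectorized call
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# The .dt accessor is now available on the column
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
```

Passing an explicit format is optional, but it avoids format inference when the layout of the dates is already known.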

Answer by Nim J

I use the code below, which works very well for me:


df['Year']=[d.split('-')[0] for d in df.Date]
df['Month']=[d.split('-')[1] for d in df.Date]
df['Day']=[d.split('-')[2] for d in df.Date]

df.head(5)
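
Note that the list comprehensions above store the parts as strings (for example '02' rather than 2). A possible variant, sketched here with made-up sample data, splits the column once with the pandas string methods and converts the result to integers:

```python
import pandas as pd

# Made-up sample data in the yyyy-mm-dd format from the question
df = pd.DataFrame({"Date": ["2014-02-22", "2013-12-31"]})

# Split every date string once; expand=True returns one column per part
parts = df["Date"].str.split("-", expand=True).astype(int)
parts.columns = ["Year", "Month", "Day"]

# Attach the three integer columns to the original frame
df = df.join(parts)
```

This keeps the string-splitting approach of the answer but does the split a single time instead of once per output column.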