This content comes from a Stack Overflow question and its answers and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/45304531/

Time: 2020-08-19 16:53:51  Source: igfitidea

Extracting the first day of month of a datetime type column in pandas

Tags: python, pandas, dataframe, datetime64

Asked by chessosapiens

I have the following dataframe:

user_id    purchase_date 
  1        2015-01-23 14:05:21
  2        2015-02-05 05:07:30
  3        2015-02-18 17:08:51
  4        2015-03-21 17:07:30
  5        2015-03-11 18:32:56
  6        2015-03-03 11:02:30

and purchase_date is a datetime64[ns] column. I need to add a new column df['month'] that contains the first day of the month of the purchase date:

df['month']
2015-01-01
2015-02-01
2015-02-01
2015-03-01
2015-03-01
2015-03-01

I'm looking for something like DATE_FORMAT(purchase_date, "%Y-%m-01") m in SQL. I have tried the following code:

     df['month']=df['purchase_date'].apply(lambda x : x.replace(day=1))

It partly works, but it keeps the time component and returns: 2015-01-01 14:05:21.

Answered by jezrael

The simplest and fastest approach is to convert to a NumPy array with values and then cast:

df['month'] = df['purchase_date'].values.astype('datetime64[M]')
print (df)
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01
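
To make the cast above more concrete, here is a minimal sketch on a single NumPy value. The variable names are illustrative and not part of the answer, and it assumes only that NumPy is installed. Casting to the month unit datetime64[M] discards everything finer than the month, and casting back to days lands on the first day of that month.

```python
import numpy as np

# A nanosecond-resolution value, like those stored in a datetime64[ns] column.
ts = np.datetime64('2015-01-23T14:05:21', 'ns')

# Casting to month resolution drops the day and the time of day.
month = ts.astype('datetime64[M]')
print(month)

# Casting back to a finer unit places the value at the start of that month.
first_day = month.astype('datetime64[D]')
print(first_day)
```

When such an array is assigned back to a DataFrame column, pandas converts it to one of its own datetime resolutions, which is why the month column prints as full dates.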

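Another solution uses floor and pd.offsets.MonthBegin(1). Note that, as later answers point out, subtracting MonthBegin(1) returns the previous month for dates that already fall on the first day of the month: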

df['month'] = df['purchase_date'].dt.floor('d') - pd.offsets.MonthBegin(1)
print (df)
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01


df['month'] = (df['purchase_date'] - pd.offsets.MonthBegin(1)).dt.floor('d')
print (df)
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01

The last solution is to create a month period with to_period:

df['month'] = df['purchase_date'].dt.to_period('M')
print (df)
   user_id       purchase_date   month
0        1 2015-01-23 14:05:21 2015-01
1        2 2015-02-05 05:07:30 2015-02
2        3 2015-02-18 17:08:51 2015-02
3        4 2015-03-21 17:07:30 2015-03
4        5 2015-03-11 18:32:56 2015-03
5        6 2015-03-03 11:02:30 2015-03

... and then convert to datetimes with to_timestamp, but it is a bit slower:

df['month'] = df['purchase_date'].dt.to_period('M').dt.to_timestamp()
print (df)
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01
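
As a rough illustration of what the to_period / to_timestamp round trip does, the following sketch applies the same two steps to a single Timestamp. The variable names are only for illustration; the scalar methods used here are the counterparts of the .dt accessors above.

```python
import pandas as pd

ts = pd.Timestamp('2015-02-18 17:08:51')

# A monthly Period represents the whole month rather than a point in time.
period = ts.to_period('M')
print(period)

# to_timestamp() defaults to the start of the period: the first day at midnight.
first_day = period.to_timestamp()
print(first_day)
```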

There are many solutions, so here are some timings:

rng = pd.date_range('1980-04-03 15:41:12', periods=100000, freq='20H')
df = pd.DataFrame({'purchase_date': rng})  
print (df.head())

In [300]: %timeit df['month1'] = df['purchase_date'].values.astype('datetime64[M]')
100 loops, best of 3: 9.2 ms per loop

In [301]: %timeit df['month2'] = df['purchase_date'].dt.floor('d') - pd.offsets.MonthBegin(1)
100 loops, best of 3: 15.9 ms per loop

In [302]: %timeit df['month3'] = (df['purchase_date'] - pd.offsets.MonthBegin(1)).dt.floor('d')
100 loops, best of 3: 12.8 ms per loop

In [303]: %timeit df['month4'] = df['purchase_date'].dt.to_period('M').dt.to_timestamp()
1 loop, best of 3: 399 ms per loop

#MaxU solution
In [304]: %timeit df['month5'] = df['purchase_date'].dt.normalize() - pd.offsets.MonthBegin(1)
10 loops, best of 3: 24.9 ms per loop

#MaxU solution 2
In [305]: %timeit df['month'] = df['purchase_date'] - pd.offsets.MonthBegin(1, normalize=True)
10 loops, best of 3: 28.9 ms per loop

#Wen solution
In [306]: %timeit df['month6']= pd.to_datetime(df.purchase_date.astype(str).str[0:7]+'-01')
1 loop, best of 3: 214 ms per loop
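
The numbers above come from IPython's %timeit and an older pandas release, so they will differ on other setups. A minimal sketch for re-running a few of the candidates in a plain Python script might look like this. It assumes the standard timeit module, uses a Timedelta for the 20-hour step instead of the '20H' alias, and the dictionary of candidates is a hypothetical arrangement rather than anything from the answer.

```python
import timeit

import pandas as pd

# Same shape as the benchmark data above: 100000 timestamps, 20 hours apart.
rng = pd.date_range('1980-04-03 15:41:12', periods=100_000,
                    freq=pd.Timedelta(hours=20))
s = pd.Series(rng)

# Hypothetical selection of the approaches shown in the answers.
candidates = {
    'numpy cast': lambda: s.values.astype('datetime64[M]'),
    'floor - MonthBegin': lambda: s.dt.floor('D') - pd.offsets.MonthBegin(1),
    'to_period': lambda: s.dt.to_period('M').dt.to_timestamp(),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs, one call per run, reported in milliseconds.
    best = min(timeit.repeat(func, number=1, repeat=3))
    timings[name] = best * 1000
    print(f'{name}: {timings[name]:.1f} ms')
```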

Answered by MaxU

We can use a date offset in conjunction with Series.dt.normalize:

In [60]: df['month'] = df['purchase_date'].dt.normalize() - pd.offsets.MonthBegin(1)

In [61]: df
Out[61]:
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01

Or a much nicer solution from @BradSolomon:

In [95]: df['month'] = df['purchase_date'] - pd.offsets.MonthBegin(1, normalize=True)

In [96]: df
Out[96]:
   user_id       purchase_date      month
0        1 2015-01-23 14:05:21 2015-01-01
1        2 2015-02-05 05:07:30 2015-02-01
2        3 2015-02-18 17:08:51 2015-02-01
3        4 2015-03-21 17:07:30 2015-03-01
4        5 2015-03-11 18:32:56 2015-03-01
5        6 2015-03-03 11:02:30 2015-03-01

Answered by YOBEN_S

Try this:

df['month']=pd.to_datetime(df.purchase_date.astype(str).str[0:7]+'-01')

Out[187]: 
   user_id        purchase_date       month
0        1  2015-01-23 14:05:21  2015-01-01
1        2  2015-02-05 05:07:30  2015-02-01
2        3  2015-02-18 17:08:51  2015-02-01
3        4  2015-03-21 17:07:30  2015-03-01
4        5  2015-03-11 18:32:56  2015-03-01
5        6  2015-03-03 11:02:30  2015-03-01

Answered by kadee

Most proposed solutions don't work for the first day of the month.

The following solution works for any day of the month:

df['month'] = df['purchase_date'] + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(normalize=True)
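
To see the first-of-month issue described above, here is a small sketch comparing the offset subtraction used in earlier answers with the expression from this answer. The two sample dates are made up for illustration.

```python
import pandas as pd

# One date already on the 1st of the month, one in the middle of the month.
dates = pd.Series(pd.to_datetime(['2015-03-01 08:00:00', '2015-03-21 17:07:30']))

# Subtracting MonthBegin(1) from a date on the 1st steps back a full month.
naive = dates.dt.normalize() - pd.offsets.MonthBegin(1)

# Rolling forward to the month end first keeps the result in the same month.
robust = dates + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(normalize=True)

print(naive.tolist())
print(robust.tolist())
```

For the first row, the naive version should give 2015-02-01 while the robust version gives 2015-03-01; both agree on the second row.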

[EDIT]

Another, more readable solution is:

from pandas.tseries.offsets import MonthBegin
df['month'] = df['purchase_date'].dt.normalize().map(MonthBegin().rollback)

Be careful not to use:

df['month'] = df['purchase_date'].map(MonthBegin(normalize=True).rollback)

because that gives incorrect results for the first day due to a bug: https://github.com/pandas-dev/pandas/issues/32616
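
For readers unfamiliar with rollback, the sketch below shows how it behaves on two single Timestamps when MonthBegin is created without normalize, which is the form the recommended solution relies on. The sample values are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

mid_month = pd.Timestamp('2015-03-21 17:07:30')
first = pd.Timestamp('2015-03-01 08:00:00')

# rollback moves back to the month start only if the date is not already on it.
rolled_mid = MonthBegin().rollback(mid_month)
rolled_first = MonthBegin().rollback(first)
print(rolled_mid)
print(rolled_first)

# The time of day survives rollback, so normalizing first gives midnight.
rolled_midnight = MonthBegin().rollback(first.normalize())
print(rolled_midnight)
```

This is why the readable solution calls dt.normalize() before mapping rollback over the column.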

Answered by pomber

For me, df['purchase_date'] - pd.offsets.MonthBegin(1) didn't work (it fails for the first day of the month), so I'm subtracting the days of the month like this:

df['purchase_date'] - pd.to_timedelta(df['purchase_date'].dt.day - 1, unit='d')
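
A short sketch of that subtraction on made-up sample dates follows. Note that the expression keeps the time of day, so the final normalize() call is an addition assumed here for cases where midnight is wanted; it is not part of the answer.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-03-01 08:00:00', '2015-03-21 17:07:30']))

# Step back (day - 1) days; a date on the 1st is shifted by zero days.
shifted = dates - pd.to_timedelta(dates.dt.day - 1, unit='D')
print(shifted.tolist())

# The subtraction keeps the time of day, so normalize to get midnight.
first_days = shifted.dt.normalize()
print(first_days.tolist())
```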

Answered by mfeyx

To extract the first day of every month, you could write a small helper function that also works if the provided date is already the first of the month. The function looks like this:

def first_of_month(date):
    return date + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)

You can apply this function on a pd.Series:

df['month'] = df['purchase_date'].apply(first_of_month)

With that you will get the month column as a Timestamp. If you need a specific format, you can convert it with the strftime() method.

df['month_str'] = df['month'].dt.strftime('%Y-%m-%d')
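
As a possible variation, the same helper can be called on the whole Series at once, since pandas offsets can be added to a datetime Series directly. The sketch below uses made-up dates and adds a normalize() step, which is an assumption for cases where the time of day should be dropped; the helper itself keeps it.

```python
import pandas as pd

def first_of_month(date):
    # Same arithmetic as the helper above: previous month end plus one day.
    return date + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)

dates = pd.Series(pd.to_datetime(['2015-03-01 08:00:00', '2015-03-31 23:59:59']))

# Offsets can be added to a datetime Series, so apply() is optional here.
result = first_of_month(dates)
print(result.tolist())

# Drop the time of day if only the date matters.
print(result.dt.normalize().tolist())
```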

Answered by igorkf

How about this easy solution?
As purchase_date is already in datetime64[ns] format, you can use strftime to format the date so that it always shows the first day of the month.

df['date'] = df['purchase_date'].apply(lambda x: x.strftime('%Y-%m-01'))

print(df)
   user_id       purchase_date        date
0        1 2015-01-23 14:05:21  2015-01-01
1        2 2015-02-05 05:07:30  2015-02-01
2        3 2015-02-18 17:08:51  2015-02-01
3        4 2015-03-21 17:07:30  2015-03-01
4        5 2015-03-11 18:32:56  2015-03-01
5        6 2015-03-03 11:02:30  2015-03-01

Because we used strftime, the date column is now of object (string) type:

print(df.dtypes)
user_id                   int64
purchase_date    datetime64[ns]
date                     object
dtype: object

Now if you want it to be in datetime64[ns], just use pd.to_datetime():

df['date'] = pd.to_datetime(df['date'])

print(df.dtypes)
user_id                   int64
purchase_date    datetime64[ns]
date             datetime64[ns]
dtype: object
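
The same two steps can also be written without a lambda by using the .dt.strftime accessor, which formats the whole column in one call. This is only a sketch on a two-row frame built for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'purchase_date': pd.to_datetime(['2015-01-23 14:05:21', '2015-02-05 05:07:30'])
})

# The .dt accessor formats the whole column without a Python-level lambda.
as_text = df['purchase_date'].dt.strftime('%Y-%m-01')

# Parsing the strings turns them back into a datetime column.
df['date'] = pd.to_datetime(as_text)
print(df)
```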

Answered by Shibu VM

@Eyal: This is what I did to get the first day of the month using pd.offsets.MonthBegin while handling the case where the date is already the first day of the month.

import pandas as pd

from_date= pd.to_datetime('2018-12-01')

from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not from_date.is_month_start else from_date

from_date

result: Timestamp('2018-12-01 00:00:00')

from_date= pd.to_datetime('2018-12-05')

from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not from_date.is_month_start else from_date

from_date

result: Timestamp('2018-12-01 00:00:00')
