pandas: how do I change a date to the first day of its month?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/42285130/

Time: 2020-09-14 03:00:17  Source: igfitidea

How to floor a date to the first day of that month?

python date pandas numpy

Asked by John Hass

I have a pandas DataFrame whose index column is date.

Input:

            value
date    
1986-01-31  22.93
1986-02-28  15.46

I want to floor each date to the first day of its month.

Output:

            value
date    
1986-01-01  22.93
1986-02-01  15.46

What I tried:

df.index.floor('M')
ValueError: <MonthEnd> is a non-fixed frequency

This may be because df was generated by df = df.resample("M").sum() (the output of that code is the input shown at the beginning of the question).

I also tried df = df.resample("M", convention='start').sum(), but that does not work either.

I know in R, it is easy to just call floor(date, 'M').
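
For context, here is a minimal sketch of the resampling situation described above, using hypothetical daily data (the `daily` frame and its values are assumptions, not taken from the question). If the month-end labels come from resampling, then resampling with the month-start alias "MS" should label each bin with the first day of the month directly:

```python
import pandas as pd

# Hypothetical daily data covering January and February 1986.
days = pd.date_range("1986-01-01", "1986-02-28", freq="D")
daily = pd.DataFrame({"value": 1.0}, index=days)

# "MS" (month start) labels each monthly bin with its first day.
monthly = daily.resample("MS").sum()
print(monthly)
```

This only helps when the resampling step is under your control; the answers below floor dates that already exist.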

Answered by Deo Leung

There is a pandas issue about this floor problem.

The suggested way is:

import pandas as pd
pd.to_datetime(df.date).dt.to_period('M').dt.to_timestamp()
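
Because the question keeps the dates in the index rather than in a column, the same period round trip can be applied to the index. A minimal sketch, assuming a DatetimeIndex and rebuilding the question's sample data:

```python
import pandas as pd

# Sample data from the question, with the dates as the index.
df = pd.DataFrame(
    {"value": [22.93, 15.46]},
    index=pd.to_datetime(["1986-01-31", "1986-02-28"]),
)
df.index.name = "date"

# Convert to monthly periods, then back to timestamps at the period start.
df.index = df.index.to_period("M").to_timestamp()
print(df)
```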

Answered by Vaishali

You can use the time series offset MonthBegin:

import pandas as pd
from pandas.tseries.offsets import MonthBegin
df['date'] = pd.to_datetime(df['date']) - MonthBegin(1)

Edit: The above solution does not handle dates that are already at the beginning of the month: they are moved back to the first day of the previous month. Here is an alternative solution.

Here is a dataframe with additional test cases:

            value
date    
1986-01-31  22.93
1986-02-28  15.46
2018-01-01  20.00
2018-02-02  25.00

Using the timedelta method:

df.index = pd.to_datetime(df.index)
df.index = df.index - pd.to_timedelta(df.index.day - 1, unit='d')


            value
date    
1986-01-01  22.93
1986-02-01  15.46
2018-01-01  20.00
2018-02-01  25.00
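
To make the difference between the two approaches in this answer concrete, here is a small sketch that runs both on the four test dates above. The variable names are illustrative only.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

dates = pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01", "2018-02-02"])

# Subtracting MonthBegin(1) moves a date that is already the first of
# the month back to the first of the previous month.
by_offset = dates - MonthBegin(1)

# Subtracting (day - 1) days leaves first-of-month dates unchanged.
by_timedelta = dates - pd.to_timedelta(dates.day - 1, unit="D")

print(by_offset)
print(by_timedelta)
```

Only the third date differs between the two results: the offset version sends 2018-01-01 to 2017-12-01, while the timedelta version keeps it at 2018-01-01.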

Answered by aldanor

Here's another 'pandonic' way to do it:

df.date - pd.Timedelta('1 day') * (df.date.dt.day - 1)

Answered by Grr

This will do the trick, with no imports necessary. NumPy has a datetime64 dtype, which pandas sets to [ns] by default, as you can see by checking the dtype. You can change this to month precision, which starts on the first of the month, by accessing the underlying NumPy array and changing its type.

df.date = pd.to_datetime(df.date.values.astype('datetime64[M]'))

It would be nice if pandas implemented this in its own astype() method, but unfortunately it does not.

The above works for data stored as datetime values or strings. If your data already has the datetime64[ns] type, you can omit the pd.to_datetime() and just do:

df.date = df.date.values.astype('datetime64[M]')
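
A minimal sketch of what the cast does, using the question's sample dates in a hypothetical `date` column. The result is wrapped in pd.to_datetime here so that the column stays a pandas datetime column:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "1986-02-28"])})

# Casting the underlying NumPy array to month precision drops the day.
months = df["date"].values.astype("datetime64[M]")
print(months.dtype)  # datetime64[M]

# Converting back gives timestamps on the first day of each month.
df["date"] = pd.to_datetime(months)
print(df)
```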

Answered by W.Li

dt_1 = "2016-02-01"

def first_day(dt):
    # Split a "YYYY-MM-DD" string and replace the day part with "01".
    lt_split = dt.split("-")
    return "-".join([lt_split[0], lt_split[1], "01"])

print(first_day(dt_1))

For a pandas DataFrame dt, you can use dt["col_name_date"].apply(first_day).

Answered by Mikhail Venkov

You can also use string datetime formatting:

df['month'] = df['date'].dt.strftime('%Y-%m-01')
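
Note that strftime produces strings rather than datetimes. A small sketch, assuming a datetime `date` column (the `month_start` column name is hypothetical), that parses the formatted strings again when datetime values are needed:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "2018-02-02"])})

# strftime returns text such as "1986-01-01", not datetime values.
df["month"] = df["date"].dt.strftime("%Y-%m-01")

# Parse the strings again if a datetime column is needed.
df["month_start"] = pd.to_datetime(df["month"])
print(df)
```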

Answered by Yuca

From August 2019:

This should work:

[x.replace(day=1).date() for x in df['date']]

The only requirement is that date is a datetime, which we can guarantee with a call to pd.to_datetime(df['date']).
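
A minimal sketch of this approach end to end, assuming a `date` column that starts out as strings (the `month_start` column name is hypothetical). The comprehension yields plain datetime.date objects, so the resulting column holds Python objects rather than datetime64 values:

```python
import pandas as pd

df = pd.DataFrame({"date": ["1986-01-31", "2018-02-02"]})

# Make sure the column holds datetimes before calling replace().
df["date"] = pd.to_datetime(df["date"])

# Replace the day with 1 and keep only the date part.
df["month_start"] = [x.replace(day=1).date() for x in df["date"]]
print(df["month_start"].tolist())
```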