Original URL: http://stackoverflow.com/questions/18233107/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas: convert datetime to end-of-month
Asked by Anne
I have written a function to convert pandas datetime dates to month-end:
import pandas
import numpy
import datetime
from pandas.tseries.offsets import Day, MonthEnd

def get_month_end(d):
    # Step back one day first, because 31/March + MonthEnd() returns 30/April
    month_end = d - Day() + MonthEnd()
    if month_end.month == d.month:
        return month_end
    # Sanity check failed: the result landed in a different month
    raise ValueError("Something went wrong while converting dates to EOM: "
                     "{} was converted to {}".format(d, month_end))
This function seems to be quite slow, and I was wondering whether there is a faster alternative. I noticed the slowdown because I am running it on a dataframe column with 50,000 dates, and the code has been much slower since I introduced the function (that is, compared with before I was converting dates to end-of-month).
df = pandas.read_csv(inpath, na_values = nas, converters = {open_date: read_as_date})
df[open_date] = df[open_date].apply(get_month_end)
I am not sure if that's relevant, but I am reading the dates in as follows:
def read_as_date(x):
    return datetime.datetime.strptime(x, fmt)
Accepted answer by Jeff
Revised: converting to a period and then back to a timestamp does the trick.
In [104]: df = DataFrame(dict(date = [Timestamp('20130101'),Timestamp('20130131'),Timestamp('20130331'),Timestamp('20130330')],value=randn(4))).set_index('date')
In [105]: df
Out[105]:
               value
date
2013-01-01 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-30  2.545073
In [106]: df.index = df.index.to_period('M').to_timestamp('M')
In [107]: df
Out[107]:
               value
2013-01-31 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-31  2.545073
Note that this type of conversion can also be done by adding a MonthEnd(0) offset, as follows; the period-based approach above would be slightly faster, though.
In [85]: df.index + pd.offsets.MonthEnd(0)
Out[85]: DatetimeIndex(['2013-01-31', '2013-01-31', '2013-03-31', '2013-03-31'], dtype='datetime64[ns]', name=u'date', freq=None, tz=None)
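For readers on a more recent pandas, the two approaches above can be compared in a self-contained sketch. The sample index is hypothetical and simply mirrors the dates used in this answer; exact behaviour and reprs may vary between pandas versions.

```python
import pandas as pd

# Hypothetical sample data mirroring the dates used in the answer above
idx = pd.DatetimeIndex(
    ["2013-01-01", "2013-01-31", "2013-03-31", "2013-03-30"], name="date"
)

# Approach 1: round-trip through a monthly period
via_period = idx.to_period("M").to_timestamp("M")

# Approach 2: MonthEnd(0) rolls forward to the month end and leaves
# dates that already fall on a month end unchanged
via_offset = idx + pd.offsets.MonthEnd(0)

print(via_period)
print(via_offset)
```

Using MonthEnd(0) rather than MonthEnd(1) matters here: with n=1, a date that is already a month end would be pushed to the end of the following month, which is the problem the question works around by subtracting a day first.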
Answer by Piyush Jena
import pandas as pd
import numpy as np
import datetime as dt
df0['Calendar day'] = pd.to_datetime(df0['Calendar day'], format='%m/%d/%Y')
# pd.datetools is not available in current pandas; .dt.normalize() also sets the time to midnight
df0['Calendar day'] = df0['Calendar day'].dt.normalize()
df0['Month Start Date'] = df0['Calendar day'].dt.to_period('M').apply(lambda r: r.start_time)
This code should work. Calendar day is a column in which the date is given in the format %m/%d/%Y; for example, 12/28/2014 is 28 December 2014. The output is the month start, 2014-12-01, with type 'pandas.tslib.Timestamp'.
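Since the question asks for the month end rather than the month start, the same period-based idea can be sketched for both. This is a minimal example under assumptions: the frame and the "Month End Date" column name are hypothetical, and only the "Calendar day" and "Month Start Date" names come from the answer.

```python
import pandas as pd

# Hypothetical frame; the "Calendar day" column name follows the answer above
df0 = pd.DataFrame({"Calendar day": ["12/28/2014", "02/10/2015"]})
df0["Calendar day"] = pd.to_datetime(df0["Calendar day"], format="%m/%d/%Y")

# Convert once to monthly periods, then read off both boundaries
periods = df0["Calendar day"].dt.to_period("M")
df0["Month Start Date"] = periods.dt.start_time

# end_time carries a time of day at the very end of the period,
# so normalize() is used to reset it to midnight
df0["Month End Date"] = periods.dt.end_time.dt.normalize()

print(df0.to_string())
```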
Answer by Matias Thayer
You can also use numpy to do it faster:
import numpy as np
date_array = np.array(['2013-01-01', '2013-01-15', '2013-01-30']).astype('datetime64[ns]')
month_start_date = date_array.astype('datetime64[M]')
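The snippet above yields the start of each month. One way to get the month end with the same numpy approach, sketched here with hypothetical sample dates, is to step to the next month, switch to day precision, and go back one day:

```python
import numpy as np

# Hypothetical sample dates
date_array = np.array(["2013-01-01", "2013-01-15", "2013-02-10"]).astype("datetime64[ns]")

# Truncating to month precision gives the first day of each month
month_start = date_array.astype("datetime64[M]")

# Add one month, convert to day precision (first day of the next month),
# then subtract one day to land on the last day of the original month
month_end = (month_start + 1).astype("datetime64[D]") - 1

print(month_end)
```

The result has day precision (datetime64[D]); cast it back with .astype("datetime64[ns]") if nanosecond precision is needed downstream.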
Answer by Tony
In case the date is not in the index but in another column (works for Pandas 0.25.0):
import pandas as pd
import numpy as np
df = pd.DataFrame(dict(date=[pd.Timestamp('20130101'),
                             pd.Timestamp('20130201'),
                             pd.Timestamp('20130301'),
                             pd.Timestamp('20130401')],
                       value=np.random.rand(4)))
print(df.to_string())
df.date = df.date.dt.to_period('M').dt.to_timestamp('M')
print(df.to_string())
Output:
        date     value
0 2013-01-01  0.295791
1 2013-02-01  0.278883
2 2013-03-01  0.708943
3 2013-04-01  0.483467
        date     value
0 2013-01-31  0.295791
1 2013-02-28  0.278883
2 2013-03-31  0.708943
3 2013-04-30  0.483467