Original question: http://stackoverflow.com/questions/53898482/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow (not me).
Clean way to convert quarterly periods to datetime in pandas
Asked by Sander van den Oord
I'm looking for a nice, readable and understandable way (one that you can remember for next time) to convert 'Q3 1996' to a pandas datetime, for example '1996-07-01' in this case. So far I have found this, but it's mighty ugly:
import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})

# Reverse 'Q3 1996' into '1996Q3', which pd.to_datetime can parse
df['date'] = pd.to_datetime(
    df['Quarter'].str.split(' ').apply(lambda x: ''.join(x[::-1]))
)

print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
I was hoping the following would work, because it's readable, but unfortunately it doesn't:
df['date'] = pd.to_datetime(df['Quarter'], format='%q %Y')
Part of the problem is that the quarter and year are apparently in the wrong order for pandas to parse them directly.
Can anyone help me find a cleaner way of converting 'Q3 1996' to a pandas datetime?
Answered by cs95
You can (and should) use pd.PeriodIndex as a first step, then convert to timestamps using PeriodIndex.to_timestamp:
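# Swap 'Q3 1996' into '1996-Q3'; regex=True is needed on pandas 2.0+,
# where str.replace treats the pattern as a literal string by default
qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
qs
0    1996-Q3
1    1996-Q4
2    1997-Q1
Name: Quarter, dtype: object

df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01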
The initial replace step is necessary because PeriodIndex expects your periods in the %Y-%q format (for example 1996-Q3).
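For readers who want to run this end to end, here is a minimal, self-contained sketch of the PeriodIndex route. The helper name quarters_to_timestamps is made up for illustration and is not part of pandas, and the sketch assumes every value looks like 'Q3 1996'.

```python
import pandas as pd


def quarters_to_timestamps(quarters, how='start'):
    """Hypothetical helper: convert strings like 'Q3 1996' to timestamps.

    Assumes every value matches 'Q<digit> <year>'; other layouts are not handled.
    """
    # Reorder 'Q3 1996' -> '1996-Q3' so PeriodIndex can parse it.
    reordered = quarters.str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
    periods = pd.PeriodIndex(reordered, freq='Q')
    # how='start' returns the first day of each quarter.
    return periods.to_timestamp(how=how)


df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})
df['date'] = quarters_to_timestamps(df['Quarter'])
print(df)
```

Wrapping the two steps in one function keeps the regex in a single place, which may help if the input layout ever changes.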
Another option is to use pd.to_datetime after performing the same string replacement as before:
df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True),
    errors='coerce')
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
If performance is important, you can split and join instead, and still do it cleanly:
df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])
df
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
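The performance remark can be checked locally. The sketch below times only the string-reordering step (regex replacement versus split and join) on synthetic data; the data size, the function names and the repeat count are assumptions chosen for illustration, and no timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Synthetic data (an assumption for illustration): 30,000 quarter labels.
quarters = pd.Series(['Q3 1996', 'Q4 1996', 'Q1 1997'] * 10_000)


def with_str_replace(s):
    # Vectorized regex replacement through the .str accessor.
    return s.str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True).tolist()


def with_split_join(s):
    # Plain Python split/reverse/join in a list comprehension.
    return ['-'.join(x.split()[::-1]) for x in s]


# Both approaches should produce identical strings such as '1996-Q3'.
same_result = with_str_replace(quarters) == with_split_join(quarters)

t_replace = timeit.timeit(lambda: with_str_replace(quarters), number=5)
t_join = timeit.timeit(lambda: with_split_join(quarters), number=5)
print(f'same result: {same_result}')
print(f'str.replace: {t_replace:.3f}s, split/join: {t_join:.3f}s')
```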
Answered by jezrael
Slice out the last 4 characters (the year) and the first 2 (the quarter), concatenate them, and convert to datetimes:
df['date'] = pd.to_datetime(df['Quarter'].str[-4:] + df['Quarter'].str[:2])
String operations in pandas are slow, so if missing values are not possible, use a list comprehension:
# Python 3.6+
df['date'] = pd.to_datetime([f'{x[-4:]}{x[:2]}' for x in df['Quarter']])
# Older Python versions
# df['date'] = pd.to_datetime(['{}{}'.format(x[-4:], x[:2]) for x in df['Quarter']])
print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
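The caveat about missing values can be illustrated with a small sketch. It only builds the reordered strings and assumes a missing entry is stored as NaN; the guarded comprehension at the end is one possible workaround, not something taken from the answer.

```python
import numpy as np
import pandas as pd

quarters = pd.Series(['Q3 1996', np.nan, 'Q1 1997'])

# Vectorized .str slicing propagates the missing value instead of failing.
sliced = quarters.str[-4:] + quarters.str[:2]

# The plain list comprehension fails, because NaN is a float and
# floats cannot be sliced.
try:
    [f'{x[-4:]}{x[:2]}' for x in quarters]
    comprehension_failed = False
except TypeError:
    comprehension_failed = True

# A guarded comprehension (an illustrative workaround) skips non-strings.
guarded = [
    f'{x[-4:]}{x[:2]}' if isinstance(x, str) else None
    for x in quarters
]
print(sliced.tolist(), comprehension_failed, guarded)
```

Either sliced or guarded could then be handed to pd.to_datetime as in the answer, at which point the missing entry would be expected to become NaT.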
Answered by ifly6
Given a quarter format like 2018-Q1, one can use the built-in pd.to_datetime function. A general answer would have to deal with the plethora of ways one can store a quarter-year observation (e.g. 2018:1, 2018:Q1, 20181, Q1:2018, etc.), so coercing the data into the format above is outside of my answer's scope.
But given a formatted series:
formatted_series = formatted_series_supplier() ...
df['date'] = pd.to_datetime(formatted_series)
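The answer leaves the normalization step out of scope. Purely as a rough sketch, a couple of regular expressions can cover the layouts listed above. The name normalize_quarter and both patterns are hypothetical; they assume a four-digit year and a quarter digit from 1 to 4, and they will not handle every layout found in real data. The conversion here goes through pd.PeriodIndex rather than pd.to_datetime.

```python
import re

import pandas as pd

# Hypothetical patterns, not part of pandas or the original answer.
_YEAR_FIRST = re.compile(r'^\s*(\d{4})\D*?Q?([1-4])\s*$', re.IGNORECASE)
_QUARTER_FIRST = re.compile(r'^\s*Q?([1-4])\D+(\d{4})\s*$', re.IGNORECASE)


def normalize_quarter(text):
    """Return text rewritten as 'YYYY-Qn', or raise ValueError."""
    match = _YEAR_FIRST.match(text)
    if match:
        return f'{match.group(1)}-Q{match.group(2)}'
    match = _QUARTER_FIRST.match(text)
    if match:
        return f'{match.group(2)}-Q{match.group(1)}'
    raise ValueError(f'unrecognized quarter label: {text!r}')


raw = ['2018:1', '2018:Q1', '20181', 'Q1:2018', '2018-Q1']
normalized = [normalize_quarter(value) for value in raw]
dates = pd.PeriodIndex(normalized, freq='Q').to_timestamp()
print(normalized)
print(dates)
```

Raising on unrecognized labels is a deliberate choice in this sketch, so that an unexpected layout surfaces early instead of silently turning into a wrong date.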
And if you're dealing with regulatory data, which almost always reflects the end of the quarter rather than its start (i.e. instead of 2019-01-01, you want 2019-03-31), you can use offsets like below:
df['date'] = df['date'] + pd.offsets.QuarterEnd(0)
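A small self-contained sketch of the offset in action follows. The variable names are illustrative, and it assumes calendar quarters ending in March, June, September and December, which is the QuarterEnd default.

```python
import pandas as pd

# Quarter-start dates such as those produced earlier in this thread.
starts = pd.Series(pd.to_datetime(['1996-07-01', '1996-10-01', '1997-01-01']))

# QuarterEnd(0) rolls each date forward to the end of its quarter;
# a date that already is a quarter end is left unchanged.
ends = starts + pd.offsets.QuarterEnd(0)
already_end = pd.Timestamp('2019-03-31') + pd.offsets.QuarterEnd(0)

print(ends)
print(already_end)
```

Using 0 rather than 1 as the argument matters for dates that already sit on a quarter end: with 1 they would move on to the following quarter's end.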