Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/53898482/

Time: 2020-09-14 06:13:37  Source: igfitidea

Clean way to convert quarterly periods to datetime in pandas

python pandas date datetime period

Asked by Sander van den Oord

I'm looking for a nice, readable and understandable way (one that you can remember for the next time) to convert 'Q3 1996' to a pandas datetime, for example '1996-07-01' in this case. Until now I found this, but it's mighty ugly:

import pandas as pd

df = pd.DataFrame({'Quarter':['Q3 1996', 'Q4 1996', 'Q1 1997']})

df['date'] = (
    pd.to_datetime(
        df['Quarter'].str.split(' ').apply(lambda x: ''.join(x[::-1]))
))

print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

I was hoping the following would work, because it's readable, but unfortunately it doesn't:

df['date'] = pd.to_datetime(df['Quarter'], format='%q %Y')

The problem is also that quarter and year are apparently in the wrong order for pandas to do simple processing.

Can anyone help me find a cleaner way of converting 'Q3 1996' to a pandas datetime?

Answered by cs95

You can (and should) use pd.PeriodIndex as a first step, then convert to timestamps using PeriodIndex.to_timestamp:

qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
qs

0    1996-Q3
1    1996-Q4
2    1997-Q1
Name: Quarter, dtype: object

df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

The initial replace step is necessary because PeriodIndex expects your periods in the %Y-%q format.
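
For reuse, the two steps above can be wrapped in a small helper. This is only a sketch: the function name quarter_strings_to_timestamps is made up for illustration, and it assumes every value looks like 'Q3 1996' (a quarter, one space, a four-digit year). regex=True is passed explicitly because str.replace treats the pattern as a literal string by default in pandas 2.0 and later.

```python
import pandas as pd


def quarter_strings_to_timestamps(quarters: pd.Series) -> pd.Series:
    """Convert strings such as 'Q3 1996' to the first day of that quarter.

    Hypothetical helper, not part of pandas.
    """
    # Reorder 'Q3 1996' -> '1996-Q3', a layout the period parser understands.
    reordered = quarters.str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)
    # Parse as quarterly periods, then take the start of each period.
    starts = pd.PeriodIndex(reordered, freq='Q').to_timestamp()
    # Return a Series that keeps the caller's index.
    return pd.Series(starts.to_numpy(), index=quarters.index, name='date')


df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})
df['date'] = quarter_strings_to_timestamps(df['Quarter'])
```

Returning a Series with the caller's index keeps the result aligned when the frame does not use the default 0..n-1 index.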



Another option is to use pd.to_datetime after performing string replacement in the same way as before.

df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True), errors='coerce')
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01


If performance is important, you can split and join, but you can do it cleanly:

df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])

df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

Answered by jezrael

Slice the last 4 characters and the first 2, concatenate them, and convert to datetimes:

df['date'] = pd.to_datetime(df['Quarter'].str[-4:] + df['Quarter'].str[:2])

String operations in pandas are slow, so if missing values are not possible, use a list comprehension:

# Python 3.6+
df['date'] = pd.to_datetime([f'{x[-4:]}{x[:2]}' for x in df['Quarter']])
# Python below 3.6
#df['date'] = pd.to_datetime(['{}{}'.format(x[-4:], x[:2]) for x in df['Quarter']])
print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
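
The caveat about missing values can be seen in a small sketch. The sample data with a NaN is made up for illustration, and the isinstance guard is just one possible workaround, not something from the answer: the .str accessor passes missing values through, while a bare list comprehension tries to slice a float and raises TypeError.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing quarter.
quarters = pd.Series(['Q3 1996', np.nan, 'Q1 1997'])

# .str slicing propagates the missing value instead of raising.
via_str = quarters.str[-4:] + quarters.str[:2]

# A bare comprehension fails: the missing value is a float, which cannot be sliced.
try:
    [x[-4:] + x[:2] for x in quarters]
    bare_comprehension_ok = True
except TypeError:
    bare_comprehension_ok = False

# Guarded comprehension: keep None where the value is not a string.
guarded = [x[-4:] + x[:2] if isinstance(x, str) else None for x in quarters]
```

The guarded list can then be handed to the same conversion step as before; None entries should come out as NaT.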

Answered by ifly6

Given a quarter format like 2018-Q1, one can use the built-in pd.to_datetime function. Because a general answer would have to deal with the plethora of ways one can store a quarter-year observation (e.g. 2018:1, 2018:Q1, 20181, Q1:2018, etc.), coercing the data into the format above is outside of my answer's scope.

But given a formatted series:

formatted_series = formatted_series_supplier() ...
df['date'] = pd.to_datetime(formatted_series)

And if you're dealing with regulatory data, which almost always reflects the end of the quarter rather than its start (i.e. instead of 2019-01-01, you want 2019-03-31), you can use offsets like below:

df['date'] = df['date'] + pd.offsets.QuarterEnd(0)
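
A self-contained sketch of the offset step follows. The sample quarters come from the question, and building the start dates with PeriodIndex is an assumption about how df['date'] was produced. QuarterEnd(0) rolls each date forward to the end of its quarter and leaves dates that already fall on a quarter end unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Quarter': ['1996-Q3', '1996-Q4', '1997-Q1']})

# Quarter start dates: 1996-07-01, 1996-10-01, 1997-01-01.
df['date'] = pd.PeriodIndex(df['Quarter'], freq='Q').to_timestamp()

# Roll forward to the last day of each quarter.
df['quarter_end'] = df['date'] + pd.offsets.QuarterEnd(0)
```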