Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/40368677/

Date: 2020-09-14 02:19:56  Source: igfitidea

Converting month to quarter in python dataframe

python pandas

Question by Pranav Kansara

I have a column in my data frame denoting the month (in the form yyyy-mm). I want to convert it to a quarter using pd.Period. I tried the apply function in the form below, but it runs too slowly. Is there a better way to do this? I am using:

hp2['Qtr'] = hp2.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)
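
A minimal reproducible version of the question's setup, assuming hp2 holds yyyy-mm strings in a column named Mth (the four sample months are hypothetical, borrowed from the first answer). It shows where the cost comes from: apply(axis=1) hands the lambda one row at a time as a Series, so pd.Period is constructed in Python once per row.

```python
import pandas as pd

# Hypothetical stand-in for the question's hp2 frame: yyyy-mm strings in 'Mth'.
hp2 = pd.DataFrame({'Mth': ['2016-11', '2011-01', '2015-07', '2012-09']})

# The question's row-wise approach: apply(axis=1) builds a Series for every
# row and calls the lambda on it, creating one pd.Period per row.
hp2['Qtr'] = hp2.apply(lambda x: pd.Period(x['Mth'], 'Q'), axis=1)

print(hp2)
```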

Answer by MaxU

I would use the to_datetime() method in a "vectorized" manner:

In [76]: x
Out[76]:
     Month
0  2016-11
1  2011-01
2  2015-07
3  2012-09

In [77]: x['Qtr'] = pd.to_datetime(x.Month).dt.quarter

In [78]: x
Out[78]:
     Month  Qtr
0  2016-11    4
1  2011-01    1
2  2015-07    3
3  2012-09    3
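
A self-contained version of the session above. The format='%Y-%m' argument is an addition not in the answer; it spells out the yyyy-mm layout so pandas does not have to infer it.

```python
import pandas as pd

x = pd.DataFrame({'Month': ['2016-11', '2011-01', '2015-07', '2012-09']})

# One vectorized parse of the whole column (format= is an addition that
# states the yyyy-mm layout explicitly).
months = pd.to_datetime(x['Month'], format='%Y-%m')

# .dt.quarter returns the quarter number (1-4) for every row in a single call.
x['Qtr'] = months.dt.quarter
print(x)
```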

Or, if you want it in 2016Q4 format (as @root mentioned), use PeriodIndex():

In [114]: x['Qtr'] = pd.PeriodIndex(pd.to_datetime(x.Mth), freq='Q')

In [115]: x
Out[115]:
       Mth    Qtr
0  2016-11 2016Q4
1  2011-01 2011Q1
2  2015-07 2015Q3
3  2012-09 2012Q3
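
The same 2016Q4-style result can also be produced with Series.dt.to_period('Q'), a swapped-in alternative to the PeriodIndex constructor above that keeps the result as a period-dtype column. A sketch with the same sample months:

```python
import pandas as pd

x = pd.DataFrame({'Mth': ['2016-11', '2011-01', '2015-07', '2012-09']})

# Parse once, then map every timestamp to the calendar quarter containing it.
x['Qtr'] = pd.to_datetime(x['Mth'], format='%Y-%m').dt.to_period('Q')

# Period values display as labels like 2016Q4; astype(str) turns them into text.
print(x['Qtr'].astype(str).tolist())
```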

Answer by neuromusic

Since you don't need the whole row, is it faster if you map the values from the column alone?

hp2['Qtr'] = hp2['Mth'].map(lambda x: pd.Period(x,'Q'))
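
A small check of this suggestion on hypothetical sample data. map() avoids building a Series for every row, but it still constructs one pd.Period per value in Python, so it is likely to land between apply(axis=1) and the fully vectorized conversions in speed (consistent with the timings in the last answer). The sketch confirms both produce the same periods:

```python
import pandas as pd

# Hypothetical sample data in the question's yyyy-mm format.
hp2 = pd.DataFrame({'Mth': ['2016-11', '2011-01', '2015-07', '2012-09']})

# Column-only version: each string goes straight to pd.Period, skipping the
# per-row Series that apply(axis=1) would build.
by_map = hp2['Mth'].map(lambda m: pd.Period(m, 'Q'))

# Fully vectorized conversion, for comparison.
vectorized = pd.to_datetime(hp2['Mth'], format='%Y-%m').dt.to_period('Q')

print(list(by_map) == list(vectorized))  # True: same quarterly periods
```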

Answer by J. Khoury

I happen to be working on a df that contains 9994 rows, so I tested your code against what I've used in the past and posted the results below. Here is a sample of the df. The values are not exactly YYYY-MM, but that doesn't matter because the code works on either (for the .dt approach, yyyy-mm strings must first be converted with pd.to_datetime):

hp2['Mth'][:10]
Out[11]: 
0   2016-06-26
1   2016-06-26
2   2016-06-26
3   2016-06-26
4   2016-06-26
5   2016-06-26
6   2016-06-26
7   2016-06-26
8   2016-06-26
9   2016-06-26
Name: Mth, dtype: datetime64[ns]

I ran your code on my df:

%timeit hp2['Qtr_Period']= hp2.apply(lambda x: pd.Period(x['Mth'],'Q'), axis=1)
hp2['Qtr_Period'][:10]
1 loop, best of 3: 2.28 s per loop
Out[13]: 
0   2016Q2
1   2016Q2
2   2016Q2
3   2016Q2
4   2016Q2
5   2016Q2
6   2016Q2
7   2016Q2
8   2016Q2
9   2016Q2
Name: Qtr_Period, dtype: object

Then I tested it using this:

%timeit hp2['Qtr_dt'] = hp2['Mth'].dt.year.astype(str) + 'Q' + hp2['Mth'].dt.quarter.astype(str)
hp2['Qtr_dt'][:10]
10 loops, best of 3: 67.6 ms per loop
Out[14]: 
0    2016Q2
1    2016Q2
2    2016Q2
3    2016Q2
4    2016Q2
5    2016Q2
6    2016Q2
7    2016Q2
8    2016Q2
9    2016Q2
Name: Qtr_dt, dtype: object

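The results are clear: on this 9994-row df, the vectorized .dt version (67.6 ms per loop) is more than 30 times faster than the row-wise apply (2.28 s per loop). Hope that helps. You can find more information in the pandas.Series.dt documentation.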
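
A self-contained sketch of this string-building approach, assuming hypothetical datetime values in Mth. Note that the result is plain text rather than Period objects, so it does not support period arithmetic such as adding one quarter.

```python
import pandas as pd

# Hypothetical datetime values standing in for the answer's 9994-row frame.
hp2 = pd.DataFrame({'Mth': pd.to_datetime(['2016-06-26', '2016-11-03', '2017-01-15'])})

# .dt.year and .dt.quarter are computed for the whole column at once;
# astype(str) and + then concatenate them element-wise into 'YYYYQn' labels.
hp2['Qtr_dt'] = (hp2['Mth'].dt.year.astype(str)
                 + 'Q'
                 + hp2['Mth'].dt.quarter.astype(str))

print(hp2)
```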

Answer by Moses Njenga

import pandas as pd

month = ['2016-11', '2011-01', '2015-06', '2012-09']
x = pd.DataFrame(month, columns=["month"])
x.month = pd.to_datetime(x.month)
# Build a monthly Period for each row and read off its quarter (1-4)
x['quarter'] = [pd.Period(x.month[i], freq='M').quarter for i in range(len(x))]
x

    month     quarter
0   2016-11-01  4
1   2011-01-01  1
2   2015-06-01  2
3   2012-09-01  3
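
The per-row list comprehension can be replaced by the vectorized .dt.quarter accessor from the first answer. A sketch with the same four months that computes both and compares them:

```python
import pandas as pd

month = ['2016-11', '2011-01', '2015-06', '2012-09']
x = pd.DataFrame(month, columns=['month'])
x['month'] = pd.to_datetime(x['month'], format='%Y-%m')

# This answer's approach: one Period object per row, built in a Python loop.
per_row = [pd.Period(m, freq='M').quarter for m in x['month']]

# Vectorized equivalent: read the quarter for the whole column at once.
x['quarter'] = x['month'].dt.quarter

print(x['quarter'].tolist() == per_row)  # True; both are [4, 1, 2, 3]
```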

Answer by root

Same idea as @MaxU but using astype:

hp2['Qtr'] = pd.to_datetime(hp2['Mth'].values, format='%Y-%m').astype('period[Q]')

The resulting output:

        Mth    Qtr
0   2014-01 2014Q1
1   2017-02 2017Q1
2   2016-03 2016Q1
3   2017-04 2017Q2
4   2016-05 2016Q2
5   2016-06 2016Q2
6   2017-07 2017Q3
7   2016-08 2016Q3
8   2017-09 2017Q3
9   2015-10 2015Q4
10  2017-11 2017Q4
11  2015-12 2015Q4
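
To see that the astype('period[Q]') cast is the same conversion as DatetimeIndex.to_period('Q') (a method not used in this answer), here is a small sketch with the first four sample months:

```python
import pandas as pd

hp2 = pd.DataFrame({'Mth': ['2014-01', '2017-02', '2016-03', '2017-04']})

# Parse the raw values into a DatetimeIndex once.
dates = pd.to_datetime(hp2['Mth'].values, format='%Y-%m')

# root's cast to a quarterly period dtype, and the to_period equivalent.
via_astype = dates.astype('period[Q]')
via_to_period = dates.to_period('Q')

hp2['Qtr'] = via_astype
print(via_astype.equals(via_to_period))  # True
print(hp2)
```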

Timings

Using the following setup to produce a large sample dataset:

import numpy as np
import pandas as pd

# 10**5 random yyyy-mm strings between 2010-01 and 2020-12
n = 10**5
yrs = np.random.choice(range(2010, 2021), n)
mths = np.random.choice(range(1, 13), n)
df = pd.DataFrame({'Mth': ['{0}-{1:02d}'.format(*p) for p in zip(yrs, mths)]})

I get the following timings:

%timeit pd.to_datetime(df['Mth'].values, format='%Y-%m').astype('period[Q]')
10 loops, best of 3: 33.4 ms per loop

%timeit pd.PeriodIndex(pd.to_datetime(df.Mth), freq='Q')
1 loop, best of 3: 2.68 s per loop

%timeit df['Mth'].map(lambda x: pd.Period(x,'Q'))
1 loop, best of 3: 6.26 s per loop

%timeit df.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)
1 loop, best of 3: 9.49 s per loop
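
The map and apply timings are dominated by calling pd.Period once per row, even though this sample contains at most 132 distinct months. A sketch of a variation not taken from the thread, with hypothetical helper names quarters_vectorized and quarters_via_uniques: call pd.Period once per distinct value, then broadcast with map. It uses Python's timeit module instead of the IPython %timeit magic so it runs as a plain script; absolute timings will depend on hardware and pandas version, and the trick only pays off when distinct values are few relative to the row count.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild a sample like the setup above (seeded so runs are repeatable).
n = 10**5
rng = np.random.default_rng(0)
yrs = rng.integers(2010, 2021, n).tolist()   # years 2010-2020
mths = rng.integers(1, 13, n).tolist()       # months 1-12
df = pd.DataFrame({'Mth': [f'{y}-{m:02d}' for y, m in zip(yrs, mths)]})


def quarters_vectorized(s):
    # Parse the whole column, then convert every timestamp to its quarter.
    return pd.to_datetime(s, format='%Y-%m').dt.to_period('Q')


def quarters_via_uniques(s):
    # Only 11 years x 12 months = 132 distinct strings can occur here,
    # so build one pd.Period per distinct value...
    lookup = {m: pd.Period(m, 'Q') for m in s.unique()}
    # ...and broadcast the results back to every row with a dict lookup.
    return s.map(lookup)


for fn in (quarters_vectorized, quarters_via_uniques):
    # Best of 3 single runs, similar in spirit to %timeit's "best of 3".
    best = min(timeit.repeat(lambda: fn(df['Mth']), number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```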