pandas: converting month to quarter in a Python data frame

Question

Asked by Pranav Kansara

I have a column in my data frame denoting the month (in the form yyyy-mm). I want to convert it to a quarter using pd.Period. I tried the apply function in the form below, but it runs too slowly. Is there a better way to do this? I am using:

hp2['Qtr'] = hp2.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)

Answer 1

Answered by MaxU

I would use the to_datetime() method in a "vectorized" manner:

In [76]: x
Out[76]:
     Month
0  2016-11
1  2011-01
2  2015-07
3  2012-09

In [77]: x['Qtr'] = pd.to_datetime(x.Month).dt.quarter

In [78]: x
Out[78]:
     Month  Qtr
0  2016-11    4
1  2011-01    1
2  2015-07    3
3  2012-09    3

Or, if you want it in 2016Q4 format (as @root mentioned), use PeriodIndex():

In [114]: x['Qtr'] = pd.PeriodIndex(pd.to_datetime(x.Mth), freq='Q')

In [115]: x
Out[115]:
       Mth    Qtr
0  2016-11 2016Q4
1  2011-01 2011Q1
2  2015-07 2015Q3
3  2012-09 2012Q3
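
As a self-contained sketch of the two transcripts above, the following builds a small frame and derives both the integer quarter and the 2016Q4-style period. The sample data and the column names QtrNum and Qtr are illustrative assumptions, and it uses Series.dt.to_period in place of the PeriodIndex() call shown in the answer:

```python
import pandas as pd

# Hypothetical sample data mirroring the transcript above.
x = pd.DataFrame({"Month": ["2016-11", "2011-01", "2015-07", "2012-09"]})

# Parse the yyyy-mm strings once; an explicit format skips per-value inference.
dates = pd.to_datetime(x["Month"], format="%Y-%m")

# Integer quarter (1 to 4), computed for the whole column at once.
x["QtrNum"] = dates.dt.quarter

# Quarterly periods such as 2016Q4 (swapped in for PeriodIndex here).
x["Qtr"] = dates.dt.to_period("Q")

print(x)
```

Both columns come from column-wide operations, so no Python-level lambda runs per row.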

Answer 2

Answered by neuromusic

Since you don't need the whole row, is it faster if you map the values from the column alone?

hp2['Qtr'] = hp2['Mth'].map(lambda x: pd.Period(x,'Q'))
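
One likely reason the column-only version is cheaper is that apply with axis=1 hands the lambda a whole row wrapped in a Series, while map passes each cell value directly. The sketch below makes that visible on a tiny frame; the frame contents and the extra column Other are assumptions for illustration:

```python
import pandas as pd

# Hypothetical two-column frame; "Other" exists only to make the rows wider.
hp2 = pd.DataFrame({"Mth": ["2016-11", "2011-01"], "Other": [1, 2]})

# With axis=1 the lambda receives each row as a Series object.
row_types = hp2.apply(lambda row: type(row).__name__, axis=1).tolist()

# Series.map passes each cell value (here a plain string) to the lambda.
value_types = hp2["Mth"].map(lambda value: type(value).__name__).tolist()

print(row_types)
print(value_types)

# The conversion itself, applied to the single column.
hp2["Qtr"] = hp2["Mth"].map(lambda value: pd.Period(value, "Q"))
print(hp2)
```

This still calls pd.Period once per value, so it is an improvement over the row-wise apply rather than a vectorized solution.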

Answer 3

Answered by J. Khoury

I happen to be working on a df that contains 9994 rows, so I tested your code against what I've used in the past and posted the results below. Here is a sample of the df. It is not exactly YYYY-MM, but that doesn't matter because the code works on either:

hp2['Mth'][:10]
Out[11]: 
0   2016-06-26
1   2016-06-26
2   2016-06-26
3   2016-06-26
4   2016-06-26
5   2016-06-26
6   2016-06-26
7   2016-06-26
8   2016-06-26
9   2016-06-26
Name: Mth, dtype: datetime64[ns]

I ran your code on my df:

%timeit hp2['Qtr_Period']= hp2.apply(lambda x: pd.Period(x['Mth'],'Q'), axis=1)
hp2['Qtr_Period'][:10]
1 loop, best of 3: 2.28 s per loop
Out[13]: 
0   2016Q2
1   2016Q2
2   2016Q2
3   2016Q2
4   2016Q2
5   2016Q2
6   2016Q2
7   2016Q2
8   2016Q2
9   2016Q2
Name: Qtr_Period, dtype: object

Then I tested it using this:

%timeit hp2['Qtr_dt'] = hp2['Mth'].dt.year.astype(str) + 'Q' + hp2['Mth'].dt.quarter.astype(str)
hp2['Qtr_dt'][:10]
10 loops, best of 3: 67.6 ms per loop
Out[14]: 
0    2016Q2
1    2016Q2
2    2016Q2
3    2016Q2
4    2016Q2
5    2016Q2
6    2016Q2
7    2016Q2
8    2016Q2
9    2016Q2
Name: Qtr_dt, dtype: object

The results speak for themselves: 67.6 ms per loop for the .dt version versus 2.28 s per loop for apply, roughly 34 times faster on this data. Hope that helps. You can find more information in the pandas.Series.dt documentation.
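
For readers without the original data, here is a minimal sketch of the string-building approach on a made-up datetime column. The three dates are assumptions, and the comparison against dt.to_period is added only as a sanity check:

```python
import pandas as pd

# Hypothetical datetime64 column standing in for the 9994-row df.
hp2 = pd.DataFrame(
    {"Mth": pd.to_datetime(["2016-06-26", "2016-11-03", "2017-01-15"])}
)

# Build "YYYYQn" labels from the vectorized .dt accessors.
hp2["Qtr_dt"] = (
    hp2["Mth"].dt.year.astype(str) + "Q" + hp2["Mth"].dt.quarter.astype(str)
)

# Sanity check: quarterly periods render to the same labels.
period_labels = hp2["Mth"].dt.to_period("Q").astype(str).tolist()

print(hp2)
print(period_labels == hp2["Qtr_dt"].tolist())
```

Note that Qtr_dt holds plain strings, whereas to_period yields a period column that still supports date arithmetic.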

Answer 4

Answered by Moses Njenga

month = ['2016-11', '2011-01', '2015-06', '2012-09']
x = pd.DataFrame(month, columns=["month"])
x.month = pd.to_datetime(x.month)
x['quarter'] = [pd.Period(x.month[i], freq='M').quarter for i in range(len(x))]
x

       month  quarter
0 2016-11-01        4
1 2011-01-01        1
2 2015-06-01        2
3 2012-09-01        3
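
The list comprehension above visits one row at a time. As a hedged alternative, assuming the column has already been converted with to_datetime as in this answer, the same quarter numbers can be read from the .dt accessor; the variable name looped is introduced here only for the comparison:

```python
import pandas as pd

month = ["2016-11", "2011-01", "2015-06", "2012-09"]
x = pd.DataFrame(month, columns=["month"])
x["month"] = pd.to_datetime(x["month"])

# Row-by-row version from the answer, kept for comparison.
looped = [pd.Period(x["month"][i], freq="M").quarter for i in range(len(x))]

# Column-wide version using the datetime accessor.
x["quarter"] = x["month"].dt.quarter

print(x)
print(looped == x["quarter"].tolist())
```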

Answer 5

Answered by root

Same idea as @MaxU but using astype:

hp2['Qtr'] = pd.to_datetime(hp2['Mth'].values, format='%Y-%m').astype('period[Q]')

The resulting output:

        Mth    Qtr
0   2014-01 2014Q1
1   2017-02 2017Q1
2   2016-03 2016Q1
3   2017-04 2017Q2
4   2016-05 2016Q2
5   2016-06 2016Q2
6   2017-07 2017Q3
7   2016-08 2016Q3
8   2017-09 2017Q3
9   2015-10 2015Q4
10  2017-11 2017Q4
11  2015-12 2015Q4

Timings

Using the following setup to produce a large sample dataset:

import numpy as np
import pandas as pd

n = 10**5
yrs = np.random.choice(range(2010, 2021), n)
mths = np.random.choice(range(1, 13), n)
df = pd.DataFrame({'Mth': ['{0}-{1:02d}'.format(*p) for p in zip(yrs, mths)]})

I get the following timings:

%timeit pd.to_datetime(df['Mth'].values, format='%Y-%m').astype('period[Q]')
10 loops, best of 3: 33.4 ms per loop

%timeit pd.PeriodIndex(pd.to_datetime(df.Mth), freq='Q')
1 loop, best of 3: 2.68 s per loop

%timeit df['Mth'].map(lambda x: pd.Period(x,'Q'))
1 loop, best of 3: 6.26 s per loop

%timeit df.apply(lambda x: pd.Period(x['Mth'],'Q'),axis=1)
1 loop, best of 3: 9.49 s per loop
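
To rerun this comparison outside IPython, a rough sketch follows. It assumes a smaller n than above to keep the run short, uses a single time.perf_counter measurement per method (cruder than %timeit), and introduces the names candidates and labels purely for illustration. It also checks that all four methods produce the same quarter labels:

```python
import time

import numpy as np
import pandas as pd

n = 10**4  # assumption: smaller than the 10**5 used above
rng = np.random.default_rng(0)
yrs = rng.integers(2010, 2021, n)
mths = rng.integers(1, 13, n)
df = pd.DataFrame({"Mth": [f"{y}-{m:02d}" for y, m in zip(yrs, mths)]})

# The four approaches from the answers, wrapped as zero-argument callables.
candidates = {
    "to_datetime + astype": lambda: pd.to_datetime(
        df["Mth"].values, format="%Y-%m"
    ).astype("period[Q]"),
    "PeriodIndex": lambda: pd.PeriodIndex(pd.to_datetime(df["Mth"]), freq="Q"),
    "map": lambda: df["Mth"].map(lambda v: pd.Period(v, "Q")),
    "apply(axis=1)": lambda: df.apply(lambda r: pd.Period(r["Mth"], "Q"), axis=1),
}

labels = {}
for name, func in candidates.items():
    start = time.perf_counter()
    out = func()
    elapsed = time.perf_counter() - start
    # Normalize every result to a list of strings such as "2016Q4".
    labels[name] = [str(p) for p in out]
    print(f"{name:<22} {elapsed:.4f} s")

reference = labels["to_datetime + astype"]
print(all(result == reference for result in labels.values()))
```

Absolute numbers will differ by machine and pandas version, so only the relative ordering is worth comparing with the figures above.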
