Pandas Find Sequence or Pattern in Column

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/42555031/

Tags: python, pandas, dataframe, sequences

Asked by Python_Learner_DK

Here's some example data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1

The Sales_Growth column tells me whether there was sales growth in the quarter. 0 = no growth, 1 = growth.

First, I'm trying to return the first Quarter of the first run of two consecutive quarters with no sales growth.

With the data above this answer would be 2001q1.

Then, I want to return the 2nd quarter of consecutive sales growth that occurs AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I've searched and searched but the closest answer I can find I can't get to work: https://stackoverflow.com/a/26539166/3225420

Thanks in advance for helping a Pandas newbie, I'm hacking away as best I can but stuck on this one.

Answered by John Zwinck

You're doing subsequence matching. This is a bit strange, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

That gives you:

'00100011'

Then:

growth.index('0011')

Gives you 4 (obviously you'd add a constant 3 to get the index of the last row matched by the pattern).

I feel this approach starts off a bit ugly, but the end result is really usable--you can search for any fixed pattern with no additional coding.
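
A minimal, self-contained sketch of the string-matching idea above, applied to both parts of the question. The helper name find_pattern is hypothetical, and the sketch assumes every value in the column renders as exactly one character (true for 0/1 flags, not for multi-digit numbers). It uses str.find rather than str.index so that a missing pattern can be handled without an exception.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def find_pattern(series, pattern):
    """Return the position where `pattern` first starts, or None if absent.

    Hypothetical helper; assumes each value renders as a single character.
    """
    encoded = "".join(series.astype(str))
    start = encoded.find(pattern)  # str.find returns -1 instead of raising
    return None if start == -1 else start


# Q1: first quarter of the first two consecutive no-growth quarters.
q1_start = find_pattern(df["Sales_Growth"], "00")
q1_quarter = df["Quarter"].iloc[q1_start]

# Q2: last row of the first "0011" match, i.e. start + len(pattern) - 1.
q2_start = find_pattern(df["Sales_Growth"], "0011")
q2_quarter = df["Quarter"].iloc[q2_start + len("0011") - 1]

print(q1_quarter, q2_quarter)
```

Returning None for a missing pattern matters here because passing -1 straight to iloc would silently select the last row.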

Answered by languitar

For Q1:

temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)

For Q2:

df[(df.Sales_Growth == 1) & (df.Sales_Growth.shift(1) == 1) & (df.Sales_Growth.shift(2) == 0) & (df.Sales_Growth.shift(3) == 0)].head(1)
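
The shift comparisons above can be generalized to a pattern of any length. The following is only a sketch: match_ending_here is a hypothetical helper, not part of pandas, and it marks the last row of each match, the same convention as the Q2 expression above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def match_ending_here(series, pattern):
    """Hypothetical helper: True where the rows ending at this one equal `pattern`."""
    n = len(pattern)
    mask = pd.Series(True, index=series.index)
    for offset, value in enumerate(pattern):
        # pattern[offset] must sit (n - 1 - offset) rows above the current row.
        # Rows shifted in from outside the data are NaN and never compare equal.
        mask &= series.shift(n - 1 - offset) == value
    return mask


q1_mask = match_ending_here(df["Sales_Growth"], [0, 0])
q2_mask = match_ending_here(df["Sales_Growth"], [0, 0, 1, 1])

# Q1 asks for the first row of the match, so look one row back from the match end.
q1_quarter = df["Quarter"].shift(1)[q1_mask].iloc[0]
# Q2 asks for the last row of the match, which is where the mask is True.
q2_quarter = df["Quarter"][q2_mask].iloc[0]

print(q1_quarter, q2_quarter)
```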

Answered by Bill G

Building on the earlier answers. Q1:

temp = df.Sales_Growth.rolling(window=2, min_periods=2).apply(
    lambda x, pattern: (x == pattern).all(), raw=True, kwargs={'pattern': [0, 0]})
# temp is 1 on the last row of every window that equals the pattern
print(df[temp == 1].head())

In the rolling(...).apply(...) call, window and min_periods must match the length of the pattern list passed to the function. The match is flagged on the last row of each window, so the first row of a match is window - 1 rows earlier.

Q2: Same approach, different pattern:

temp = df.Sales_Growth.rolling(window=4, min_periods=4).apply(
    lambda x, pattern: (x == pattern).all(), raw=True, kwargs={'pattern': [0, 0, 1, 1]})
print(df[temp == 1].head())
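
For reference, here is a self-contained sketch of the rolling-window approach that also maps the flagged rows back to the quarters asked for in the question. The helper name window_end_positions is hypothetical, and the sketch assumes a pandas version where Rolling.apply accepts raw=True.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def window_end_positions(series, pattern):
    """Hypothetical helper: positions of the last row of every window equal to `pattern`."""
    n = len(pattern)
    target = np.asarray(pattern, dtype=float)
    hits = series.rolling(window=n, min_periods=n).apply(
        lambda x: float(np.array_equal(x, target)), raw=True
    )
    # The first n - 1 rows are NaN (incomplete window), so they never equal 1.
    return np.flatnonzero((hits == 1).to_numpy())


q1_ends = window_end_positions(df["Sales_Growth"], [0, 0])
q2_ends = window_end_positions(df["Sales_Growth"], [0, 0, 1, 1])

# Q1 wants the first row of the first match: step back window - 1 rows.
q1_quarter = df["Quarter"].iloc[q1_ends[0] - (2 - 1)]
# Q2 wants the last row of the first match, which is the flagged row itself.
q2_quarter = df["Quarter"].iloc[q2_ends[0]]

print(q1_quarter, q2_quarter)
```

Indexing q1_ends[0] raises an IndexError when the pattern never occurs, so real code would check that the array is non-empty first.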