Python pandas: conditional rolling count

Question

Asked by justinlevol

I have a Series that looks like the following:

   col
0  B
1  B
2  A
3  A
4  A
5  B

It's a time series, therefore the index is ordered by time.

For each row, I'd like to count how many times the value has appeared consecutively, i.e.:

Output:

   col count
0  B   1
1  B   2
2  A   1 # Value does not match previous row => reset counter to 1
3  A   2
4  A   3
5  B   1 # Value does not match previous row => reset counter to 1

I found two related questions, but I can't figure out how to write that information as a new column in the DataFrame for each row (as above). Using rolling_apply does not work well.

Counting consecutive events on pandas dataframe by their index

Finding consecutive segments in a pandas data frame

Answer 1

Answered by chrisb

Based on the second answer you linked, assuming s is your Series:

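import pandas as pd
df = pd.DataFrame(s)
# Start a new block id whenever the value differs from the previous row.
df['block'] = (df['col'] != df['col'].shift(1)).astype(int).cumsum()
# Number the rows within each block, starting at 1.
df['count'] = df.groupby('block').cumcount() + 1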


In [88]: df
Out[88]: 
  col  block  count
0   B      1      1
1   B      1      2
2   A      2      1
3   A      2      2
4   A      2      3
5   B      3      1
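
To see why the block column works, it may help to look at the intermediate values. The sketch below rebuilds the sample data from the question (how s was actually created is an assumption) and keeps the comparison result in a hypothetical helper column named changed. The final numbering step here uses groupby(...).cumcount().

```python
import pandas as pd

# Assumed reconstruction of the Series from the question.
s = pd.Series(list("BBAAAB"), name="col")
df = pd.DataFrame(s)

# True wherever the value differs from the row above. The first row is
# compared with NaN, so it is always True.
df["changed"] = df["col"] != df["col"].shift(1)
# changed: [True, False, True, False, False, True]

# The running sum of the flags gives every run its own id.
df["block"] = df["changed"].cumsum()
# block: [1, 1, 2, 2, 2, 3]

# Position of each row within its run, starting at 1.
df["count"] = df.groupby("block").cumcount() + 1
# count: [1, 2, 1, 2, 3, 1]
```

Because the run id only increases when a value changes, two separate runs of B end up in different blocks even though the value is the same.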

Answer 2

Answered by ZJS

I like the answer by @chrisb but wanted to share my own solution, since some people might find it more readable and easier to adapt to similar problems.

1) Create a function that keeps its state in function attributes (used here like static variables)

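def rolling_count(val):
    # State lives in attributes on the function object itself.
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

# Initialise the state; reset both before reusing the function on another Series.
rolling_count.count = 0  # static variable
rolling_count.previous = None  # static variable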

2) Apply it to your Series after converting it to a DataFrame

df = pd.DataFrame(s)
df['count'] = df['col'].apply(rolling_count)  # new column in the DataFrame

Output of df:

  col  count
0   B      1
1   B      2
2   A      1
3   A      2
4   A      3
5   B      1
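
One thing to watch with function attributes is that the state survives between calls, so applying rolling_count to a second Series continues from wherever the first one stopped. A possible variation, sketched below, wraps the counter in a factory so each apply gets fresh state. The name make_rolling_counter is hypothetical, and the sample Series is an assumed reconstruction of the data in the question.

```python
import pandas as pd

def make_rolling_counter():
    """Return a new counter function with its own private state."""
    previous = object()  # sentinel that never equals a real value
    count = 0

    def rolling_count(val):
        nonlocal previous, count
        if val == previous:
            count += 1
        else:
            previous = val
            count = 1
        return count

    return rolling_count

# Assumed reconstruction of the Series from the question.
s = pd.Series(list("BBAAAB"), name="col")
df = pd.DataFrame(s)
df["count"] = df["col"].apply(make_rolling_counter())

# A second pass with a fresh counter gives the same result,
# because no state is shared between the two counters.
second_pass = df["col"].apply(make_rolling_counter())
```

Like the original, this relies on apply visiting the rows in order, and it runs Python code per element, so it will likely be slower than a vectorised approach on large data.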

Answer 3

Answered by CodeShaman

One-liner:

df['count'] = df.groupby('col').cumcount()

or

df['count'] = df.groupby('col').cumcount() + 1

if you want the counts to begin at 1.
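
Note that grouping by the value alone numbers every occurrence of that value across the whole column, not only the current run. A quick check on the sample data (rebuilt here under the assumption that it matches the question) shows where the two differ:

```python
import pandas as pd

# Assumed reconstruction of the data from the question.
df = pd.DataFrame({"col": list("BBAAAB")})

# Counts all earlier occurrences of the same value, consecutive or not.
df["count"] = df.groupby("col").cumcount() + 1
# count: [1, 2, 1, 2, 3, 3]
```

The last row gets 3 because it is the third B overall, whereas the output requested in the question resets it to 1.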

Answer 4

Answered by P.Tillmann

I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, CodeShaman's solution counts total occurrences rather than consecutive ones).

  df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount()+1

  col  count
0   B      1
1   B      2
2   A      1
3   A      2
4   A      3
5   B      1

Answer 5

Answered by Benjamin Breton

If you wish to do the same thing but track two (or more) columns, so that the counter resets whenever any of them changes, you can use this.

def count_consecutive_items_n_cols(df, col_name_list, output_col):
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist() for col_name in col_name_list
    ]
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df

   col_a col_b  count
0      1     B      1
1      1     B      2
2      1     A      1
3      2     A      1
4      2     A      2
5      2     B      1
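
As a possible alternative, the same result can be reached without building string keys by comparing all tracked columns with the previous row at once. This is only a sketch: the helper name count_consecutive_rows is hypothetical, and the sample frame is reconstructed from the table above.

```python
import pandas as pd

def count_consecutive_rows(df, cols):
    """Count consecutive rows in which every column in `cols` is unchanged.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True wherever at least one tracked column differs from the row above.
    changed = (df[cols] != df[cols].shift(1)).any(axis=1)
    # The running sum of the flags labels each run; number the rows within it.
    return df.groupby(changed.cumsum()).cumcount() + 1

# Sample data reconstructed from the table above.
df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 2, 2],
    "col_b": ["B", "B", "A", "A", "A", "B"],
})
df["count"] = count_consecutive_rows(df, ["col_a", "col_b"])
# count: [1, 2, 1, 1, 2, 1]
```

Unlike the function above, this version returns the counts instead of modifying df in place, which leaves the choice of output column to the caller.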
