Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/40348541/

Date: 2020-09-14 02:19:32  Source: igfitidea

Pandas "diff()" with string

Tags: python, pandas

Asked by guilhermecgs

How can I flag a row in a dataframe every time a column changes its string value?

Example:

Input:

ColumnA   ColumnB
1            Blue
2            Blue
3            Red
4            Red
5            Yellow


# diff() won't work here with strings; it only works on numeric values
dataframe['changed'] = dataframe['ColumnB'].diff()

Desired output:

ColumnA   ColumnB      changed
1            Blue         0
2            Blue         0
3            Red          1
4            Red          0
5            Yellow       1

Accepted answer by root

I get better performance with ne instead of using the actual != comparison:

df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
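As a minimal, self-contained sketch of the one-liner above: the frame below is rebuilt from the table in the question (how df is constructed is an assumption, since the question only shows the data), and the intermediate name previous is introduced only for illustration.

```python
import pandas as pd

# Sample data from the question; building it this way is an assumption.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# shift() moves every value down one row and leaves NaN in the first row;
# bfill() fills that NaN with the first real value, so row 0 is compared
# with its own value and is not flagged.
previous = df["ColumnB"].shift().bfill()

# ne() is the method form of !=; astype(int) turns True/False into 1/0.
df["changed"] = df["ColumnB"].ne(previous).astype(int)

print(df)
```

With the sample data this should flag only the rows where Red and Yellow first appear.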

Timings

Using the following setup to produce a larger dataframe:

df = pd.concat([df]*10**5, ignore_index=True) 

I get the following timings:

%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
10 loops, best of 3: 38.1 ms per loop

%timeit (df.ColumnB != df.ColumnB.shift()).astype(int)
10 loops, best of 3: 77.7 ms per loop

%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB'])
10 loops, best of 3: 99.6 ms per loop

%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
10 loops, best of 3: 19.3 ms per loop
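The %timeit lines above are IPython magics. Outside IPython, a comparable measurement could be sketched with the standard timeit module. The helper names with_ne and with_operator and the repeat counts are assumptions made for this sketch, and the absolute numbers (and possibly the ranking) will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Sample frame from the question, enlarged as in the setup above.
small = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})
df = pd.concat([small] * 10**5, ignore_index=True)


def with_ne(frame):
    # Method form of the comparison.
    return frame["ColumnB"].ne(frame["ColumnB"].shift()).astype(int)


def with_operator(frame):
    # Operator form of the same comparison.
    return (frame["ColumnB"] != frame["ColumnB"].shift()).astype(int)


# Both variants should flag exactly the same rows.
assert with_ne(df).equals(with_operator(df))

for func in (with_ne, with_operator):
    # Best of 3 runs of 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(df), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1000:.1f} ms per call")
```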

Answer by Kartik

Use .shift and compare:

dataframe['changed'] = dataframe['ColumnB'] == dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])
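Note that == marks the rows whose value is the same as in the previous row, which is the opposite of the flag asked for in the question. A minimal sketch, assuming the sample frame from the question, of the same shift and fillna approach with != and an integer cast to obtain the 0/1 column shown there (the names previous and unchanged are introduced only for illustration):

```python
import pandas as pd

# Sample data from the question; building it this way is an assumption.
dataframe = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Previous row's value; the NaN in the first row is filled with the row's
# own value, so the first row never counts as a change.
previous = dataframe["ColumnB"].shift(1).fillna(dataframe["ColumnB"])

# True where the value repeats the previous row.
unchanged = dataframe["ColumnB"] == previous

# 1 where the value differs from the previous row, 0 otherwise.
dataframe["changed"] = (dataframe["ColumnB"] != previous).astype(int)

print(unchanged.tolist())
print(dataframe["changed"].tolist())
```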

Answer by jezrael

Comparing with shift works for me. The first row is then set to 0, because it has no previous value and would otherwise be compared against NaN:

df['diff'] = (df.ColumnB != df.ColumnB.shift()).astype(int)
# .ix was removed in pandas 1.0; use label-based .loc for the first row instead
df.loc[df.index[0], 'diff'] = 0
print(df)
   ColumnA ColumnB  diff
0        1    Blue     0
1        2    Blue     0
2        3     Red     1
3        4     Red     0
4        5  Yellow     1
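The reset of the first row is needed because shift leaves NaN there, and NaN compares as not equal to any string. A short sketch of that behaviour, assuming the sample frame from the question and using label-based loc for the assignment (the names shifted and first_flag_before are introduced only for illustration):

```python
import pandas as pd

# Sample data from the question; building it this way is an assumption.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Only the first row has no predecessor, so only it becomes NaN.
shifted = df["ColumnB"].shift()
print(shifted.isna().tolist())

# NaN != "Blue" evaluates to True, so the first row is flagged at first.
df["diff"] = (df["ColumnB"] != shifted).astype(int)
first_flag_before = df.loc[df.index[0], "diff"]

# Reset the first row by label; df.index[0] works for any index.
df.loc[df.index[0], "diff"] = 0

print(first_flag_before, df["diff"].tolist())
```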

Edit, based on the timings in another answer: the fastest option is to use ne:

df['diff'] = (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
# .ix was removed in pandas 1.0; use label-based .loc for the first row instead
df.loc[df.index[0], 'diff'] = 0