python:pandas np.where 与 df.loc 具有多种条件

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/44569879/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:47:37  来源:igfitidea点击:

python: pandas np.where vs. df.loc with multiple conditions

pythonpandasnumpytypeerror

提问by jeangelj

Np.where has been giving me a lot of errors, so I am looking for a solution with df.loc instead.

Np.where 给了我很多错误,所以我正在寻找 df.loc 的解决方案。

This is the np.where error I have been getting:

这是我遇到的 np.where 错误:

C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

I am working with the following dataframe df:

我正在使用以下数据框 df:

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],'checked': ['0','0','1','0'],'duplicate': ['True','True','False','False']})

    Column_A    checked   duplicate
0   AAA             0      True
1   AAA             0      True
2   ABC             1      False
3   CDE             0      False

I want to create an additional flag, if checked is 0 and duplicate is True.

我想创建一个额外的标志,如果检查为 0 并且重复为 True。

I tried this and it didn't work:

我试过这个,但没有用:

df['flag'] = (np.where((df['checked'] == 'Y') &(df['duplicate'] == 'True'), 'Y', '0'))

TypeError: invalid type comparison

I tried it with df.loc:

我用 df.loc 试了一下:

df['flag'] = (df.loc[df['checked'] == 'Y']& df.loc[df['duplicate'] == 'True'], 'Y','0')

TypeError: invalid type comparison

and I get the same error!

我得到了同样的错误!

回答by jezrael

I think your booleanare not strings, so need remove ':

我认为你boolean不是strings,所以需要删除'

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                  'checked': ['0','0','1','0'],
                  'duplicate': [True, True, False, False]})

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Or if compare with booleancolumn, == Truecan be omited:

或者如果与boolean列比较,== True可以省略:

df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate']), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Also if need check checkedneed 'because strings:

此外,如果需要检查checked需要,'因为strings

df['flag'] = np.where((df['checked'] == '0') &(df['duplicate'] == True), 'Y', '0')
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0

EDIT:

编辑:

Solution with loc:

解决方案loc

df['flag'] = '0'
mask = (df['checked'] == '0') &(df['duplicate'])
df.loc[mask, 'flag'] = 'Y'
print (df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0