Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39998850/
pandas table subsets giving invalid type comparison error
Asked by dreab
I am using pandas and want to select subsets of data and apply it to other columns. e.g.
- if there is data in column A; &
- if there is NO data in column B;
- then, apply the data in column A to column D
I have this working fine for now using .isnull() and .notnull(), e.g.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : pd.Series(np.random.randn(4)),
                   'B' : pd.Series(np.nan),
                   'C' : pd.Series(['yes','yes','no','maybe'])})
df['D']=''
df
Out[44]:
          A   B      C D
0  0.516752 NaN    yes
1 -0.513194 NaN    yes
2  0.861617 NaN     no
3 -0.026287 NaN  maybe
# Now try the first conditional expression
df['D'][df['A'].notnull() & df['B'].isnull()] \
= df['A'][df['A'].notnull() & df['B'].isnull()]
df
Out[46]:
          A   B      C          D
0  0.516752 NaN    yes   0.516752
1 -0.513194 NaN    yes  -0.513194
2  0.861617 NaN     no   0.861617
3 -0.026287 NaN  maybe -0.0262874
When one adds a third condition, to also check whether data in column C matches a particular string, we get the error:
df['D'][df['A'].notnull() & df['B'].isnull() & df['C']=='yes'] \
= df['A'][df['A'].notnull() & df['B'].isnull() & df['C']=='yes']
File "C:\Anaconda2\Lib\site-packages\pandas\core\ops.py", line 763, in wrapper
res = na_op(values, other)
File "C:\Anaconda2\Lib\site-packages\pandas\core\ops.py", line 718, in na_op
raise TypeError("invalid type comparison")
TypeError: invalid type comparison
I have read that this occurs because of the different datatypes, and I can get it working if I change all the strings in column C to integers or booleans. We also know that a string comparison on its own works, e.g. df['A'][df['B']=='yes'] gives a boolean list.
So any ideas how/why this is not working when combining these datatypes in this conditional expression? What are the more pythonic ways to do what appears to be quite long-winded?
Thanks
Accepted answer by jezrael
I think you need to add parentheses () around the conditions. It is also better to use ix (loc in pandas 1.0 and later, where ix was removed) to select the column with a boolean mask, which can be assigned to a variable mask:
mask = (df['A'].notnull()) & (df['B'].isnull()) & (df['C']=='yes')
print (mask)
0 True
1 True
2 False
3 False
dtype: bool
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
df.loc[mask, 'D'] = df.loc[mask, 'A']
print (df)
          A   B      C         D
0 -0.681771 NaN    yes -0.681771
1 -0.871787 NaN    yes -0.871787
2 -0.805301 NaN     no
3  1.264103 NaN  maybe
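The parentheses matter because of Python operator precedence: & binds more tightly than ==, so the unparenthesized condition is parsed as (A.notnull() & B.isnull() & C) == 'yes', and it is the & between a boolean Series and the string column that fails. Below is a minimal sketch of the parenthesized version, assuming a current pandas release where .loc is used instead of .ix. The fixed sample values and the NaN-initialized D column are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Illustrative data: fixed values instead of np.random.randn(4) so the result is reproducible.
df = pd.DataFrame({'A': [0.5, -0.5, 0.8, -0.1],
                   'B': [np.nan] * 4,
                   'C': ['yes', 'yes', 'no', 'maybe']})
# Assumption: D starts as NaN (not '') so the column stays numeric.
df['D'] = np.nan

# Unparenthesized, `a & b & df['C'] == 'yes'` parses as `(a & b & df['C']) == 'yes'`,
# i.e. a boolean Series is AND-ed with the string column before any comparison happens.
# Parenthesizing the comparison makes every operand of `&` a boolean Series.
mask = df['A'].notnull() & df['B'].isnull() & (df['C'] == 'yes')

# A single .loc call selects rows and column together, avoiding chained indexing.
df.loc[mask, 'D'] = df.loc[mask, 'A']
print(df)
```

Only the comparison against 'yes' strictly needs parentheses here, since .notnull() and .isnull() are method calls; wrapping every condition, as in the answer above, is a harmless habit that keeps the expression safe when more comparisons are added.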
Answer by Luke Olson
In case this solution doesn't work for anyone, another situation that happened to me was that even though I was reading all data in as dtype=str (and therefore doing any string comparison should be OK [i.e. df[col] == "some string"]), I had a column of all nulls, which becomes type float, which will give an error when comparing to a string.
To get around that, you can use .astype(str) to ensure that a string-to-string comparison is performed.
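A small sketch of that situation, under the assumption of a hand-built frame rather than data read from a file. The column names label and empty are hypothetical. It shows that a column containing only missing values is stored as float, and that casting it with .astype(str) before comparing keeps the comparison between strings.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'empty' holds only missing values, 'label' holds strings.
df = pd.DataFrame({'label': ['a', 'b', 'c'],
                   'empty': [np.nan, np.nan, np.nan]})

# An all-NaN column is stored as floating point, not as strings.
print(df['empty'].dtype)

# Casting first makes this a string-to-string comparison.
# Caveat: depending on the pandas version, NaN may turn into the literal string 'nan'.
matches = df['empty'].astype(str) == 'some string'
print(matches.tolist())
```

Whether the uncast comparison actually raises seems to depend on the pandas version, so checking the column dtype first is a quick way to confirm that this is the cause.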