pandas table subsets giving invalid type comparison error

Question

Asked by dreab

I am using pandas and want to select subsets of data and apply them to other columns, e.g.:

if there is data in column A; &
if there is NO data in column B;
then, apply the data in column A to column D

I have this working fine for now using .isnull() and .notnull(), e.g.:

df = pd.DataFrame({'A' : pd.Series(np.random.randn(4)),
                       'B' : pd.Series(np.nan),
                       'C' : pd.Series(['yes','yes','no','maybe'])})
df['D']=''

df
Out[44]: 
          A   B      C D
0  0.516752 NaN    yes  
1 -0.513194 NaN    yes  
2  0.861617 NaN     no  
3 -0.026287 NaN  maybe  

# Now try the first conditional expression
df['D'][df['A'].notnull() & df['B'].isnull()] \
=  df['A'][df['A'].notnull() & df['B'].isnull()]   
df
Out[46]: 
          A   B      C          D
0  0.516752 NaN    yes   0.516752
1 -0.513194 NaN    yes  -0.513194
2  0.861617 NaN     no   0.861617
3 -0.026287 NaN  maybe -0.0262874
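For reference, the same two-condition assignment can also be written without chained indexing (df['D'][...] = ...), which the pandas documentation discourages and which may not modify df at all in newer pandas versions that use copy-on-write. A minimal sketch, assuming a recent pandas; unlike the question, it uses fixed values instead of random ones and initialises D with NaN rather than '' (an assumption, so that float values can be assigned into the column regardless of pandas version):

```python
import numpy as np
import pandas as pd

# Same layout as the question, with fixed values so the result is reproducible
df = pd.DataFrame({
    "A": [0.516752, -0.513194, 0.861617, -0.026287],
    "B": np.nan,                      # scalar is broadcast: a column of all NaN
    "C": ["yes", "yes", "no", "maybe"],
})
df["D"] = np.nan                      # float column instead of '' (assumption)

# Build the boolean mask once, then assign through a single .loc call
mask = df["A"].notnull() & df["B"].isnull()
df.loc[mask, "D"] = df.loc[mask, "A"]
print(df)
```

Because every row has data in A and none in B, all four values of A are copied into D here.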

When I add a third condition, to also check whether the data in column C matches a particular string, I get the error:

df['D'][df['A'].notnull() & df['B'].isnull() & df['C']=='yes'] \
=  df['A'][df['A'].notnull() & df['B'].isnull() & df['C']=='yes']   


  File "C:\Anaconda2\Lib\site-packages\pandas\core\ops.py", line 763, in wrapper
    res = na_op(values, other)

  File "C:\Anaconda2\Lib\site-packages\pandas\core\ops.py", line 718, in na_op
    raise TypeError("invalid type comparison")

TypeError: invalid type comparison

I have read that this occurs because of the different datatypes, and I can get it working if I change all the strings in column C to integers or booleans. We also know that a string comparison on its own works, e.g. df['B']=='yes' gives a boolean Series that can be used in df['A'][df['B']=='yes'].

So, any ideas how/why this does not work when combining these datatypes in one conditional expression? And what are more pythonic ways to do something that seems quite long-winded?

Thanks

Answer 1

Accepted answer by jezrael

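I think you need to add parentheses () around each condition. In Python, & binds more tightly than ==, so without them df['A'].notnull() & df['B'].isnull() & df['C']=='yes' is evaluated as (df['A'].notnull() & df['B'].isnull() & df['C']) == 'yes', i.e. the == 'yes' is applied to the combined result instead of to df['C'] alone. It is also better to use loc (ix in older pandas versions; ix was removed in pandas 1.0) to select the column with a boolean mask, which can be assigned to a variable mask: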

mask = (df['A'].notnull()) & (df['B'].isnull()) & (df['C']=='yes')
print (mask)
0     True
1     True
2    False
3    False
dtype: bool

df.loc[mask, 'D'] = df.loc[mask, 'A']

print (df)
          A   B      C         D
0 -0.681771 NaN    yes -0.681771
1 -0.871787 NaN    yes -0.871787
2 -0.805301 NaN     no          
3  1.264103 NaN  maybe
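The parentheses matter because of Python's operator precedence, not because of anything specific to pandas. A minimal sketch using the standard-library ast module (ast.unparse needs Python 3.9+) shows how Python parses the condition with and without parentheses; the names a, b and c are placeholders standing in for the three Series expressions:

```python
import ast

# Without parentheses: & binds more tightly than ==
unparenthesized = ast.parse("a & b & c == 'yes'", mode="eval").body
# With parentheses around the string comparison
parenthesized = ast.parse("a & b & (c == 'yes')", mode="eval").body

# The whole expression is one comparison whose left side is (a & b & c)
print(type(unparenthesized).__name__)      # Compare
print(ast.unparse(unparenthesized.left))   # a & b & c

# Here the top-level node is a & operation, and the comparison is its right operand
print(type(parenthesized).__name__)        # BinOp
print(ast.unparse(parenthesized.right))    # c == 'yes'
```

So in the unparenthesized version, 'yes' is compared against the result of combining the two boolean masks with column C, which is what triggers the type error.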

Answer 2

Answer by Luke Olson

In case this solution doesn't work for anyone: another situation that happened to me was that, even though I was reading all data in as dtype=str (so any string comparison, i.e. df[col] == "some string", should have been fine), I had a column of all nulls, which became type float and gave an error when compared to a string.

To get around that, you can use .astype(str) to ensure that a string-to-string comparison is performed.
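A minimal sketch of that situation, assuming the all-null column has already ended up as float64 (built directly with np.nan here rather than read from a file; the column names are hypothetical). Depending on the pandas version, astype(str) either turns the missing values into the string 'nan' or keeps them as missing values of a string dtype; in both cases they simply compare unequal to the target string:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'flag' holds only nulls, so pandas stores it as float64
df = pd.DataFrame({"name": ["foo", "bar", "baz"], "flag": [np.nan, np.nan, np.nan]})
print(df["flag"].dtype)  # float64

# Cast to string first so the comparison is string-to-string
mask = df["flag"].astype(str) == "yes"
print(mask.tolist())  # [False, False, False]
```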
