pandas DataFrame: set value on boolean mask

Question

Asked by Michael K

I'm trying to set a number of different values in a pandas DataFrame all to the same value. I thought I understood boolean indexing in pandas, but I haven't found any resources on this specific error.

import pandas as pd 
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df[mask] = 30
Traceback (most recent call last):
...
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Above, I want to replace all of the True entries in the mask with the value 30.

I could do df.replace instead, but masking feels a bit more efficient and intuitive here. Can someone explain the error, and provide an efficient way to set all of the values?

Answer 1

Answered by EdChum

Unfortunately, you can't use a boolean mask for this on mixed dtypes. You can use pandas where to set the values instead:

In [59]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df = df.where(mask, other=30)
df

Out[59]:
    A   B
0   1   a
1  30  30
2   3  30

Note that the above will fail if you pass inplace=True to the where method, so df.where(mask, other=30, inplace=True) will raise:

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

EDIT

OK, after a little misunderstanding: you can still use where by just inverting the mask, so that the True entries of the original mask are the ones replaced:

In [2]:    
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df.where(~mask, other=30)

Out[2]:
    A   B
0  30  30
1   2   b
2  30   f
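
As a small sketch of the same idea, DataFrame.mask is the counterpart of where: it replaces entries where the condition is True, so the mask does not need to be inverted. The variable name result is only illustrative, and this assumes the same df and mask as above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

# where() keeps values where the condition is True and replaces the rest;
# mask() does the opposite, replacing values where the condition is True.
result = df.mask(mask, other=30)
print(result)
```

Like where, this returns a new DataFrame rather than modifying df in place, so assign it back (df = df.mask(mask, other=30)) if that is what you want.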

Answer 2

Answered by toto_tico

If you want to use different columns to create your mask, you need to call the values property of the dataframe, so that the mask is applied by position instead of being aligned on column labels.

Example

Let's say we want to replace values in A_1 and A_2 according to a mask built from B_1 and B_2. For example, replace with 999 those values in the A columns that correspond to nulls in the B columns.

The original dataframe:

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2    5    n  NaN
2    3    6  NaN  NaN

The desired dataframe:

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2  999    n  NaN
2  999  999  NaN  NaN

The code:

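import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A_1': [1, 2, 3],
    'A_2': [4, 5, 6],
    'B_1': ['y', 'n', np.nan],
    'B_2': ['n', np.nan, np.nan]})

# .values turns the mask into a plain array, so it is applied by position
_mask = df[['B_1', 'B_2']].notnull().values
df[['A_1', 'A_2']] = df[['A_1', 'A_2']].where(_mask, other=999)

The updated A columns:
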
   A_1  A_2
0    1    4
1    2  999
2  999  999
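
A hedged sketch of two ways to get the same result, assuming the same example data. The first passes the mask as a plain array (to_numpy() plays the role of values here). The second relabels the mask's columns with set_axis so that label alignment works. The names a_cols, b_cols, positional and relabelled are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A_1': [1, 2, 3],
    'A_2': [4, 5, 6],
    'B_1': ['y', 'n', np.nan],
    'B_2': ['n', np.nan, np.nan]})

a_cols = ['A_1', 'A_2']
b_cols = ['B_1', 'B_2']

# Option 1: a plain boolean array is applied purely by position.
positional = df[a_cols].where(df[b_cols].notna().to_numpy(), other=999)

# Option 2: rename the mask's columns so they line up with the A columns by label.
labelled_mask = df[b_cols].notna().set_axis(a_cols, axis=1)
relabelled = df[a_cols].where(labelled_mask, other=999)

print(positional)
print(relabelled)
```

Either result can then be assigned back with df[a_cols] = positional.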

Answer 3

Answered by JohnE

I'm not 100% sure, but I suspect the error message relates to the fact that missing data is not treated identically across dtypes. Only float has NaN, but integers can be automatically converted to floats, so it's not a problem there. Mixing numeric dtypes and object dtypes, however, does not appear to work so easily.

Regardless of that, you could get around it pretty easily with np.where:

df[:] = np.where(mask, 30, df)

    A   B
0  30  30
1   2   b
2  30   f
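
A minimal sketch of a variant of this approach, assuming the df and mask from the question: instead of assigning into df[:], build a new DataFrame from the np.where result. np.where returns a single object-dtype array here, so every column starts out as object; infer_objects() is one way to recover a numeric dtype for A. The names values and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

# np.where works on plain arrays; mixing ints and strings yields an object array.
values = np.where(mask.to_numpy(), 30, df.to_numpy())

# Rebuild the frame with the original labels, then let pandas re-infer
# better dtypes for the object columns where it can.
out = pd.DataFrame(values, index=df.index, columns=df.columns).infer_objects()
print(out)
print(out.dtypes)
```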

Answer 4

Answered by Paul Jtheitroademan

pandas uses NaN to mark invalid or missing data, and NaN can be used across types. Since your DataFrame has mixed int and string data types, it will not accept the assignment of a single typed value (other than NaN), as this would create a mixed type (int and str) in B through an in-place assignment.

@JohnE's method using np.where creates a new DataFrame in which the type of column B is object, not string as in the initial example.
