Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30519140/

Time: 2020-09-13 23:24:32  Source: igfitidea

pandas DataFrame set value on boolean mask

python pandas

Asked by Michael K

I'm trying to set a number of different values in a pandas DataFrame all to the same value. I thought I understood boolean indexing in pandas, but I haven't found any resources on this specific error.

import pandas as pd 
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df[mask] = 30
Traceback (most recent call last):
...
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Above, I want to replace all of the True entries in the mask with the value 30.

I could use df.replace instead, but masking feels a bit more efficient and intuitive here. Can someone explain the error and suggest an efficient way to set all of the values?

Answered by EdChum

Unfortunately, you can't use the boolean mask on mixed dtypes for this. You can use pandas where to set the values:

In [59]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df = df.where(mask, other=30)
df

Out[59]:
    A   B
0   1   a
1  30  30
2   3  30

Note that the above will fail if you pass inplace=True to the where method, so df.where(mask, other=30, inplace=True) will raise:

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

EDIT

OK, after a little misunderstanding, you can still use where by just inverting the mask:

In [2]:    
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df.where(~mask, other=30)

Out[2]:
    A   B
0  30  30
1   2   b
2  30   f
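
As a minimal sketch of the same idea, DataFrame.mask is the counterpart of where: it replaces the entries where the condition is True, so the mask does not need to be inverted. The variable name result is only illustrative, and this assumes the non-inplace form, which returns a new DataFrame rather than modifying df.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

# mask() replaces entries where the condition is True,
# which is the same as where() with the inverted condition.
result = df.mask(mask, other=30)
print(result)
```

Assigning the result back (df = df.mask(mask, other=30)) gives the effect the question asks for without an in-place operation.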

Answered by toto_tico

If you want to use different columns to create your mask, you need to use the values property of the DataFrame.

Example

Let's say we want to replace values in A_1 and A_2 according to a mask in B_1 and B_2. For example, replace those values in A (with 999) that correspond to nulls in B.

The original dataframe:

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2    5    n  NaN
2    3    6  NaN  NaN

The desired dataframe:

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2  999    n  NaN
2  999  999  NaN  NaN

The code:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A_1': [1, 2, 3],
    'A_2': [4, 5, 6],
    'B_1': ['y', 'n', np.nan],
    'B_2': ['n', np.nan, np.nan]})

# .values drops the B_* labels so the mask applies by position
_mask = df[['B_1', 'B_2']].notnull().values
df[['A_1', 'A_2']] = df[['A_1', 'A_2']].where(_mask, other=999)

   A_1  A_2
0    1    4
1    2  999
2  999  999
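
The same approach can be sketched with DataFrame.to_numpy(), which also returns a plain array without column labels. This is only an illustration of the positional-mask idea above; the variable name not_null is an assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A_1': [1, 2, 3],
    'A_2': [4, 5, 6],
    'B_1': ['y', 'n', np.nan],
    'B_2': ['n', np.nan, np.nan]})

# Positional boolean array: True where the B columns are not null.
not_null = df[['B_1', 'B_2']].notnull().to_numpy()

# Keep A values where the matching B value is present, else use 999.
df[['A_1', 'A_2']] = df[['A_1', 'A_2']].where(not_null, other=999)
print(df)
```

Passing an array rather than a DataFrame matters here because where aligns a DataFrame condition on its labels, and the B_* column names do not match the A_* ones.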

Answered by JohnE

I'm not 100% sure, but I suspect the error message relates to the fact that missing data is not treated identically across different dtypes. Only float has NaN, but integers can be automatically converted to floats, so it's not a problem there. But it appears that mixing numeric dtypes and object dtypes does not work so easily...

Regardless of that, you could get around it pretty easily with np.where:

import numpy as np

df[:] = np.where(mask, 30, df)

    A   B
0  30  30
1   2   b
2  30   f

Answered by Paul Jtheitroademan

pandas uses NaN to mark invalid or missing data, and NaN can be used across types. Since your DataFrame has mixed int and string data types, it will not accept the assignment of a single non-NaN value, as this would create a mixed type (int and str) in B through an in-place assignment.

@JohnE's method using np.where creates a new DataFrame in which the type of column B is object, not string as in the initial example.
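
To see what happens to the column types, here is a small sketch that wraps the np.where result in a new DataFrame and prints its dtypes. The names replaced and restored are illustrative, and the use of infer_objects() to recover a numeric dtype is a suggestion of mine, not part of the answers above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

# np.where returns a plain 2-D array, so wrap it to keep the labels.
replaced = pd.DataFrame(np.where(mask, 30, df),
                        index=df.index, columns=df.columns)
print(replaced.dtypes)

# infer_objects() tries to find a more specific dtype for object columns.
restored = replaced.infer_objects()
print(restored.dtypes)
```

Column B now holds both an int and strings, so it can only be stored as a general-purpose object column.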