pandas: summing booleans in a DataFrame

Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/38829702/

Time: 2020-09-14 01:45:20  Source: igfitidea

Summing Booleans in a Dataframe

python pandas dataframe

Asked by hms

I have a non-indexed Pandas dataframe where each row consists of numeric and boolean values with some NaNs. An example row in my dataframe might look like this (with variables above):

X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN

I would like to add a new variable to my dataframe, call it X_13, which is the number of True values in each row. So in the above case, I would like to obtain:

X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12 X_13
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN  2

I have tried df[X_13] = df[X_2] + df[X_4] + df[X_8] + df[X_10] and that gives me what I want unless the row contains a NaN in a location where a Boolean is expected. For those rows, X_13 has the value NaN.
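
The behaviour described here can be reproduced with a small sketch. The column names are written as strings and all three rows are made-up data (only the first follows the example row); the columns are forced to object dtype, which is how pandas stores booleans mixed with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: boolean-like columns held as object dtype,
# as happens when a column mixes True/False with NaN.
df = pd.DataFrame({
    "X_2":  [True,  np.nan, False],
    "X_4":  [False, True,   np.nan],
    "X_8":  [True,  True,   True],
    "X_10": [False, False,  np.nan],
}, dtype=object)

# Plain addition counts the True values only when no operand is NaN;
# a single NaN in the row makes the whole result NaN.
total = df["X_2"] + df["X_4"] + df["X_8"] + df["X_10"]
print(total)
```

Row 0 has no missing values, so it yields a count; rows 1 and 2 each contain a NaN, and the NaN propagates through the addition.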

Sorry -- this feels like it should be absurdly simple. Any suggestions?

Answered by ayhan

Select boolean columns and then sum:

df.select_dtypes(include=['bool']).sum(axis=1)

If you have NaNs, first fill them with False:

df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
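
As a self-contained illustration, here is a minimal sketch applied to data shaped like the question's. It assumes string column names and that the boolean columns are known in advance (`bool_cols` is a name introduced here); the second row is made up. Instead of relying on select_dtypes, it casts explicitly with astype(bool), since some pandas versions may leave a filled object column as object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the first row follows the question's example.
df = pd.DataFrame({
    "X_1":  [24.4, 3.2],
    "X_2":  [True, np.nan],
    "X_4":  [False, False],
    "X_8":  [True, True],
    "X_10": [False, np.nan],
    "X_11": [np.nan, 1.0],
})

# Assumption: the boolean columns are known by name.
bool_cols = ["X_2", "X_4", "X_8", "X_10"]

# Fill first: astype(bool) on its own would turn NaN into True.
flags = df[bool_cols].fillna(False).astype(bool)

# Each True counts as 1 when summing across the row.
df["X_13"] = flags.sum(axis=1)
print(df["X_13"].tolist())
```

Restricting the fill to the boolean columns also leaves genuine missing values in the numeric columns untouched.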


Consider this DataFrame:

df
Out: 
       a      b  c     d
0   True  False  1  True
1  False   True  2   NaN

df == True returns True for (0, c) as well, because the integer 1 compares equal to True:

df == True
Out: 
       a      b      c      d
0   True  False   True   True
1  False   True  False  False
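
The match at (0, c) comes from Python itself, where bool is a subclass of int. The sketch below rebuilds the example frame by hand (the original construction is not shown, so this is an assumption about how it was made) and shows the resulting overcount.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example frame.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

# bool is a subclass of int, so the integer 1 equals True.
print(isinstance(True, int), 1 == True)

# Comparing the whole frame with True also matches the 1 in column c,
# so row 0 is counted as 3 rather than 2.
naive = (df == True).sum(axis=1)
print(naive.tolist())
```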

So if you sum across the rows, you will get 3 for row 0 instead of 2. Another important point is that boolean arrays cannot contain NaNs, so a column that mixes booleans with NaN is stored as object. If you check the dtypes, you will see:

df.dtypes
Out: 
a      bool
b      bool
c     int64
d    object
dtype: object
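
The dtype difference can be checked on two tiny Series; the names `clean` and `with_nan` are only illustrative.

```python
import numpy as np
import pandas as pd

clean = pd.Series([True, False])     # only booleans
with_nan = pd.Series([True, np.nan]) # a boolean plus a missing value

print(clean.dtype)     # bool
print(with_nan.dtype)  # object
```

This is why select_dtypes(include=['bool']) skips a column such as d until its NaNs have been replaced.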

By filling the NaNs with False, you can get a boolean column:

df.fillna(False).dtypes
Out: 
a     bool
b     bool
c    int64
d     bool
dtype: object

Now you can safely sum by selecting the boolean columns.

df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Out: 
0    2
1    1
dtype: int64
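
Depending on the pandas version, fillna may leave column d as object dtype rather than converting it to bool, in which case select_dtypes would skip it. A hedged variant is to call infer_objects() after filling so the conversion is explicit. The frame below is a hand-built copy of the example, and `n_true` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example frame.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

counts = (
    df.fillna(False)                    # replace NaN with False
      .infer_objects()                  # turn all-boolean object columns into bool
      .select_dtypes(include=["bool"])  # keep a, b, d; drop the integer column c
      .sum(axis=1)                      # count the True values per row
)

# Store the per-row count as a new column.
df["n_true"] = counts
print(df["n_true"].tolist())
```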