Original URL: http://stackoverflow.com/questions/38829702/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Summing Booleans in a Dataframe
Asked by hms
I have a non-indexed pandas DataFrame in which each row contains numeric and boolean values, along with some NaNs. An example row might look like this (column names shown above the values):
 X_1   X_2  X_3    X_4   X_5  X_6   X_7   X_8    X_9   X_10  X_11  X_12
24.4  True  5.1  False  22.4   55  33.4  True  18.04  False   NaN   NaN
I would like to add a new variable to my dataframe, call it X_13, which is the number of True values in each row. So in the above case, I would like to obtain:
 X_1   X_2  X_3    X_4   X_5  X_6   X_7   X_8    X_9   X_10  X_11  X_12  X_13
24.4  True  5.1  False  22.4   55  33.4  True  18.04  False   NaN   NaN     2
I have tried df['X_13'] = df['X_2'] + df['X_4'] + df['X_8'] + df['X_10'], and that gives me what I want unless the row contains a NaN in a location where a Boolean is expected. For those rows, X_13 has the value NaN.
Sorry, this feels like it should be absurdly simple. Any suggestions?
Answer by ayhan
Select boolean columns and then sum:
df.select_dtypes(include=['bool']).sum(axis=1)
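A minimal, self-contained sketch of this one-liner, using a small hypothetical frame with no missing values (the column names and data are made up for illustration):

```python
import pandas as pd

# Hypothetical data: three bool columns (a, b, d) and one integer column (c).
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, False],
})

# Keep only the bool-dtype columns, then count True values per row.
# Summing booleans treats True as 1 and False as 0.
true_counts = df.select_dtypes(include=["bool"]).sum(axis=1)
print(true_counts.tolist())
```

Because column c is excluded by its dtype, its value 1 is never counted as a True.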
If you have NaNs, fill them with False first:
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Consider this DataFrame:
df
Out:
       a      b  c     d
0   True  False  1  True
1  False   True  2   NaN
df == True returns True for (0, c) as well, because the integer 1 compares equal to True:
df == True
Out:
       a      b      c      d
0   True  False   True   True
1  False   True  False  False
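The overcounting can be reproduced with a short sketch that rebuilds the DataFrame above. The construction code is an assumption; the answer only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; column d mixes True and NaN, so it gets object dtype.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

# Element-wise comparison also matches the integer 1 in column c, since 1 == True.
naive_counts = (df == True).sum(axis=1)  # noqa: E712
print(naive_counts.tolist())
```

Row 0 comes out as 3 rather than 2, which is why selecting columns by dtype is safer than comparing values to True.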
So if you sum the first row, you will get 3 instead of 2. Another important point is that a bool column cannot contain NaN: a column that mixes booleans with NaN is stored with object dtype instead. So if you check the dtypes, you will see:
df.dtypes
Out:
a bool
b bool
c int64
d object
dtype: object
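A quick way to confirm that a single missing value forces a non-bool dtype is to compare two small Series (hypothetical data):

```python
import numpy as np
import pandas as pd

clean = pd.Series([True, False])
with_nan = pd.Series([True, np.nan])

# A plain NumPy bool array has no slot for NaN, so pandas falls back to object.
print(clean.dtype)     # bool
print(with_nan.dtype)  # object
```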
By filling the NaNs with False, you can get a boolean column:
df.fillna(False).dtypes
Out:
a bool
b bool
c int64
d bool
dtype: object
Now you can safely sum by selecting the boolean columns.
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Out:
0 2
1 1
dtype: int64
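A sketch of the full recipe that should also hold up on newer pandas. It assumes that recent releases may stop fillna from silently downcasting an object column to bool (pandas 2.2 deprecates this and emits a FutureWarning). Calling infer_objects() afterwards makes the object-to-bool conversion explicit, so column d is still picked up by select_dtypes. The column name true_count is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

# Fill missing values with False, then re-infer column dtypes so that an
# object column holding only True/False becomes a real bool column.
filled = df.fillna(False).infer_objects()

# Count True values per row across the bool columns only (a, b and d here).
df["true_count"] = filled.select_dtypes(include=["bool"]).sum(axis=1)
print(df)
```

On older pandas the infer_objects() call is simply a no-op, because fillna has already produced a bool column.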