pandas: Count occurrences of True/False in column of dataframe
Original question: http://stackoverflow.com/questions/53415751/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Luca Giorgi
Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?
Doing something like
df[df["boolean_column"]==False]["boolean_column"].sum()
Will not work because False has a value of 0, hence a sum of zeroes will always return 0.
Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.
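To make the pitfall concrete, here is a minimal sketch with a hypothetical five-row column: the expression from the question adds up only zeros, while taking len() of the same filtered selection gives the number of False rows.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})

# The question's approach: the filtered values are all False (0), so the sum is 0
false_sum = df[df["boolean_column"] == False]["boolean_column"].sum()

# Counting the rows of the filtered selection gives the number of False values
false_count = len(df[df["boolean_column"] == False])

print(false_sum, false_count)
```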
Answer by user3471881
>> import pandas as pd
>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>> df['boolean_column'].value_counts()
True 3
False 2
Name: boolean_column, dtype: int64
If you want to count False and True separately, you can use pd.Series.sum() together with ~:
>> df['boolean_column'].values.sum() # True
3
>> (~df['boolean_column']).values.sum() # False
2
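A self-contained version of this approach, using the same hypothetical five-row column. It also illustrates a caveat that is easy to miss: ~ acts as logical negation only on a bool-dtype column; on an integer 0/1 column it is bitwise NOT, so such a column should be converted with astype(bool) before negating.

```python
import pandas as pd

df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})
col = df["boolean_column"]

counts = col.value_counts()   # Series of counts indexed by True/False
n_true = col.sum()            # summing booleans counts the True values
n_false = (~col).sum()        # ~ negates each value, so this counts the False values

# Caveat: on an integer 0/1 column, ~ is bitwise NOT (~1 == -2, ~0 == -1),
# so convert to bool before negating
ints = pd.Series([1, 0, 1])
bitwise = (~ints).tolist()                  # [-2, -1, -2], not a count of zeros
n_false_ints = (~ints.astype(bool)).sum()   # 1

print(counts.to_dict(), n_true, n_false, n_false_ints)
```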
Answer by jpp
With Pandas, the natural way is using value_counts:
import pandas as pd

df = pd.DataFrame({'A': [True, False, True, False, True]})
print(df['A'].value_counts())
# True 3
# False 2
# Name: A, dtype: int64
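One thing to keep in mind with value_counts is that it only lists values that actually occur. A minimal sketch, assuming a hypothetical column in which False never appears, shows two ways to get a 0 for the missing label: Series.get with a default, or reindex with fill_value.

```python
import pandas as pd

# Hypothetical column in which False never occurs
df = pd.DataFrame({"A": [True, True, True]})

counts = df["A"].value_counts()   # only the True label is present
n_false = counts.get(False, 0)    # returns the default 0 instead of raising KeyError

# reindex adds any missing label with a count of 0
full = counts.reindex([True, False], fill_value=0)
print(full.to_dict())
```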
To calculate True or False values separately, don't compare against True / False explicitly; just sum, and take the reverse Boolean via ~ to count False values:
print(df['A'].sum()) # 3
print((~df['A']).sum()) # 2
This works because bool is a subclass of int, and the behaviour also holds true for Pandas series / NumPy arrays.
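A quick check of this claim, using small hypothetical arrays: plain Python booleans add up like 0 and 1, and summing a NumPy array or pandas Series of booleans counts the True entries.

```python
import numpy as np
import pandas as pd

# bool is a subclass of int: True acts as 1 and False as 0 in arithmetic
is_int_subclass = issubclass(bool, int)   # True
plain_total = True + True + False         # 2

# The same holds element-wise when summing NumPy arrays and pandas Series of bools
arr_total = np.array([True, False, True]).sum()
series_total = pd.Series([True, False, True]).sum()

print(is_int_subclass, plain_total, arr_total, series_total)
```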
Alternatively, you can calculate counts using NumPy:
import numpy as np

print(np.unique(df['A'], return_counts=True))
# (array([False, True], dtype=bool), array([2, 3], dtype=int64))
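The tuple returned by np.unique holds the sorted distinct values and their counts as two parallel arrays. A small sketch, with the same hypothetical column, that zips them into a plain dict for lookup:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [True, False, True, False, True]})

# With return_counts=True, np.unique returns (sorted unique values, counts)
values, counts = np.unique(df["A"], return_counts=True)

# Pair each value with its count; tolist() converts to plain Python bools/ints
count_map = dict(zip(values.tolist(), counts.tolist()))
print(count_map)
```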
Answer by FMarazzi
You could simply sum:
sum(df["boolean_column"])
This will find the number of "True" elements.
len(df["boolean_column"]) - sum(df["boolean_column"])
Will yield the number of "False" elements.
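Python's built-in sum() iterates over the Series one element at a time in the interpreter; Series.sum() returns the same count through pandas' vectorized reduction, which is typically faster on large columns. A sketch with a hypothetical column showing the results agree:

```python
import pandas as pd

col = pd.Series([True, False, True, False, True])  # hypothetical column

# Built-in sum() loops over the values in Python
n_true_builtin = sum(col)
n_false_builtin = len(col) - sum(col)

# Series.sum() gives the same number of True values, computed by pandas
n_true_vectorized = col.sum()

print(n_true_builtin, n_false_builtin, n_true_vectorized)
```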
Answer by turbojet780
df.isnull() returns a boolean DataFrame of the same shape, where True indicates a missing value.
df.isnull().sum() returns the column-wise sum of True values, i.e. the number of missing values in each column.
df.isnull().sum().sum() returns the total number of NA elements.
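The three steps can be seen together on a small hypothetical frame with a few missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, "x"]})

mask = df.isnull()          # boolean DataFrame, True where a value is missing
per_column = mask.sum()     # number of missing values in each column
total = mask.sum().sum()    # total number of missing values in the frame

print(per_column.to_dict(), total)
```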
Answer by Jakob
This alternative works for multiple columns and/or rows as well:
df[df==True].count(axis=0)
This gives the number of True values per column. For a row-wise count, set axis=1.
df[df==True].count().sum()
Adding .sum() at the end gives the total number of True values in the entire DataFrame.
Answer by Andrea Grianti
In case you have a column in a DataFrame with boolean values, or, more interestingly, in case you do not have one but want to find the number of values in a column that satisfy a certain condition, you can try something like this (as an example I used <=):
(df['col']<=value).value_counts()
The expression in parentheses produces a boolean Series, and value_counts() returns the number of True/False values as a Series indexed by True and False. You can use it in other calculations as well, accessing the counts by label ([False] for the False count, [True] for the True count) without creating an additional variable. Note that the lookup raises a KeyError if one of the two outcomes never occurs:
(df['col']<=value).value_counts()[False] # for falses
(df['col']<=value).value_counts()[True] # for trues
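A self-contained sketch of this idea, assuming a hypothetical numeric column and threshold. Summing the condition (and its negation) gives the two counts directly; with value_counts, Series.get with a default of 0 avoids the KeyError when an outcome is absent.

```python
import pandas as pd

# Hypothetical column and threshold
df = pd.DataFrame({"col": [1, 5, 3, 8, 2]})
value = 3

cond = df["col"] <= value    # boolean Series: True where the condition holds
n_true = cond.sum()          # rows satisfying the condition
n_false = (~cond).sum()      # rows not satisfying it

# value_counts gives both at once; .get returns 0 if an outcome never occurs
counts = cond.value_counts()
n_true_vc = counts.get(True, 0)
n_false_vc = counts.get(False, 0)

# If the condition holds for every row, False is absent and .get falls back to 0
n_false_all_true = (df["col"] <= 100).value_counts().get(False, 0)

print(n_true, n_false, n_true_vc, n_false_vc, n_false_all_true)
```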