Original question: http://stackoverflow.com/questions/53415751/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Count occurrences of True/False in a column of a DataFrame
Question by Luca Giorgi
Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?
Doing something like
df[df["boolean_column"]==False]["boolean_column"].sum()
Will not work, because False has a value of 0, so summing only False values always returns 0.
Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.
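A minimal sketch of why the expression in the question always returns 0, and two loop-free ways to get the False count instead: count the filtered rows rather than summing them, or invert the column and sum. The sample column is an assumption made for illustration.

```python
import pandas as pd

# Sample data (assumed for illustration)
df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})

# Filtering keeps only the False values; summing them adds 0 + 0 = 0
broken = df[df["boolean_column"] == False]["boolean_column"].sum()

# Counting the filtered rows (instead of summing them) gives the number of False values
false_count = len(df[df["boolean_column"] == False])

# Same result without an intermediate frame: invert the Booleans, then sum
false_count_inverted = int((~df["boolean_column"]).sum())

print(broken, false_count, false_count_inverted)  # 0 2 2
```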
Answer by user3471881
>>> import pandas as pd
>>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>>> df['boolean_column'].value_counts()
True     3
False    2
Name: boolean_column, dtype: int64
If you want to count False and True separately, you can use sum() together with ~:
>>> df['boolean_column'].values.sum()  # True
3
>>> (~df['boolean_column']).values.sum()  # False
2
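value_counts() only lists values that actually occur, so a column that contains no False values has no False entry at all. A small sketch, assuming you always want both keys present, reindexes the result with a fill value of 0 (the all-True sample column is hypothetical):

```python
import pandas as pd

# Assumed sample column that happens to contain no False values
s = pd.Series([True, True, True], name="boolean_column")

# value_counts() has no False row here; reindex adds it with a count of 0
counts = s.value_counts().reindex([True, False], fill_value=0)

print(counts.loc[True], counts.loc[False])  # 3 0
```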
Answer by jpp
With Pandas, the natural way is using value_counts:
import pandas as pd
df = pd.DataFrame({'A': [True, False, True, False, True]})
print(df['A'].value_counts())
# True 3
# False 2
# Name: A, dtype: int64
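If the column also contains missing values, pandas stores it with object dtype and value_counts() skips the missing entries by default. Passing dropna=False keeps them as their own row. A sketch on assumed sample data:

```python
import pandas as pd

# Assumed sample: Booleans mixed with a missing value (inferred as object dtype)
df = pd.DataFrame({"A": [True, False, True, None, True]})

with_na_dropped = df["A"].value_counts()             # counts True and False only
with_na_kept = df["A"].value_counts(dropna=False)    # also counts the missing entry

print(with_na_dropped.sum(), with_na_kept.sum())  # 4 5
```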
To count True or False values separately, don't compare against True/False explicitly: just sum the column, and take the inverse Boolean via ~ to count False values:
print(df['A'].sum()) # 3
print((~df['A']).sum()) # 2
This works because bool is a subclass of int, and the same behaviour holds for pandas Series and NumPy arrays.
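A short sketch of the property this answer relies on, using assumed sample values. One caveat: on a plain Python bool, ~ is integer bitwise inversion (~True is -2), so the ~ trick applies to Boolean arrays and Series, not to scalar bools.

```python
import numpy as np
import pandas as pd

# Plain Python: bool is a subclass of int, so True counts as 1 and False as 0
print(issubclass(bool, int), True + True + False)  # True 2

# NumPy arrays and pandas Series follow the same rule when summing Booleans
arr = np.array([True, False, True, False, True])
s = pd.Series(arr)
print(arr.sum(), s.sum())  # 3 3

# On Boolean arrays/Series, ~ is element-wise logical NOT, so summing it counts False
print((~arr).sum(), (~s).sum())  # 2 2
```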
Alternatively, you can calculate counts using NumPy:
import numpy as np
print(np.unique(df['A'], return_counts=True))
# (array([False, True], dtype=bool), array([2, 3], dtype=int64))
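np.unique returns the sorted unique values and their counts as two separate arrays. One way to pair them up, sketched here on assumed sample data, is to zip them into a dict keyed by the Boolean value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [True, False, True, False, True]})

# Values come back sorted (False before True); counts line up with them
values, counts = np.unique(df["A"], return_counts=True)
counts_by_value = {bool(v): int(c) for v, c in zip(values, counts)}

print(counts_by_value)  # {False: 2, True: 3}
```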
Answer by FMarazzi
You could simply sum:
sum(df["boolean_column"])
This will find the number of "True" elements.
len(df["boolean_column"]) - sum(df["boolean_column"])
This will yield the number of "False" elements.
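The built-in sum() iterates over the Series element by element in Python, so for large columns the vectorised Series.sum() is usually faster; both give the same count. This sketch assumes a column without missing values: with missing values, len() minus the sum would count them as False. The sample data is hypothetical.

```python
import pandas as pd

s = pd.Series([True, False, True, False, True], name="boolean_column")

true_builtin = sum(s)            # Python-level loop over the values
true_vectorised = int(s.sum())   # vectorised pandas/NumPy reduction
false_count = len(s) - true_vectorised

print(true_builtin, true_vectorised, false_count)  # 3 3 2
```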
Answer by turbojet780
df.isnull()
returns a Boolean DataFrame of the same shape; True indicates a missing value.
df.isnull().sum()
returns the column-wise sum of True values, i.e. the number of missing values in each column.
df.isnull().sum().sum()
returns the total number of NA elements.
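A small sketch of the three levels on an assumed sample frame: the Boolean mask from isnull(), the per-column counts, and the grand total.

```python
import numpy as np
import pandas as pd

# Assumed sample frame with some missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, "x"],
})

mask = df.isnull()              # Boolean DataFrame, True where a value is missing
per_column = mask.sum()         # number of missing values in each column
total = int(mask.sum().sum())   # number of missing values in the whole frame

print(per_column["a"], per_column["b"], total)  # 1 2 3
```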
Answer by Jakob
This alternative works for multiple columns and/or rows as well.
df[df==True].count(axis=0)
This gives the number of True values per column. For a row-wise count, set axis=1.
df[df==True].count().sum()
Adding sum() at the end gives the total number of True values in the entire DataFrame.
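df[df == True] keeps the True cells and replaces everything else with NaN, and count() then counts the non-missing cells. An equivalent sketch that sums the Boolean mask directly is shown on an assumed all-Boolean frame. Note that on numeric columns df == True also matches the value 1, because True == 1.

```python
import pandas as pd

# Assumed all-Boolean sample frame
df = pd.DataFrame({
    "x": [True, False, True],
    "y": [False, False, True],
})

mask = df.eq(True)               # same Boolean mask as df == True

per_column = mask.sum(axis=0)    # True count in each column
per_row = mask.sum(axis=1)       # True count in each row
total = int(mask.values.sum())   # True count in the whole frame

# Same per-column numbers as the masking-and-count approach from the answer
assert (df[df == True].count(axis=0) == per_column).all()

print(per_column.tolist(), per_row.tolist(), total)  # [2, 1] [1, 0, 2] 3
```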
Answer by Andrea Grianti
If you have a DataFrame column with Boolean values, or, more interestingly, if you don't have one but want to count the values in a column that satisfy a certain condition, you can try something like this (using <= as an example):
(df['col']<=value).value_counts()
The parentheses produce a Boolean Series, and value_counts() returns a Series of True/False counts that you can use in other calculations without creating an additional variable. Access the counts by label rather than by integer position: value_counts() sorts by frequency, so position [0] is not necessarily the False count.
(df['col']<=value).value_counts().get(False, 0) #for falses
(df['col']<=value).value_counts().get(True, 0) #for trues
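To see why positional access is fragile, here is a sketch on assumed data where True is the more common value: the first row of value_counts() is then the True count, while label-based access with Series.get returns the intended count regardless of order. The column values and the threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3, 4, 5]})
value = 4  # assumed threshold

counts = (df["col"] <= value).value_counts()

# The first row by position is the most frequent value (here True, 4 times)
print(counts.index[0], counts.iloc[0])  # True 4

# Label-based access returns the right count regardless of order,
# and the default covers a label that does not occur at all
false_count = counts.get(False, 0)
true_count = counts.get(True, 0)
print(false_count, true_count)  # 1 4
```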

