pandas: count occurrences of True/False in a DataFrame column

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/53415751/

Time: 2020-09-14 06:10:13  Source: igfitidea

Count occurrences of True/False in column of dataframe

python pandas boolean counter series

Asked by Luca Giorgi

Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?

Doing something like

df[df["boolean_column"]==False]["boolean_column"].sum()

Will not work because False has a value of 0, hence a sum of zeroes will always return 0.

Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.
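To make the problem concrete, here is a minimal sketch of why the attempt above returns 0, assuming a small example DataFrame (the data is illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})

# The attempt from the question: keep only the False rows, then sum them.
# Every remaining value is False (0), so the sum is always 0.
attempt = df[df["boolean_column"] == False]["boolean_column"].sum()

# Counting the rows that survive the filter gives the intended answer.
false_count = len(df[df["boolean_column"] == False])

print(attempt)      # 0
print(false_count)  # 2
```

The filter itself is fine; it is the sum over the filtered values that discards the information.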

Answered by user3471881

Use pd.Series.value_counts():

>>> import pandas as pd
>>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>>> df['boolean_column'].value_counts()
True     3
False    2
Name: boolean_column, dtype: int64

If you want to count False and True separately, you can use pd.Series.sum() together with ~:

>>> df['boolean_column'].values.sum()     # number of True values
3
>>> (~df['boolean_column']).values.sum()  # number of False values
2
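One case worth checking is a column that contains only one of the two values, since value_counts() only reports values that actually occur. The sketch below assumes an all-True Series and uses Series.get to fall back to 0 for the missing label:

```python
import pandas as pd

# Illustrative data: a column with no False values at all.
s = pd.Series([True, True, True], name="boolean_column")

counts = s.value_counts()

# Series.get returns the default instead of raising KeyError
# when a label is absent from the index.
true_count = int(counts.get(True, 0))
false_count = int(counts.get(False, 0))

print(true_count, false_count)  # 3 0
```

The sum-based approach above does not have this issue, because summing an all-True or all-False column still returns a number.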

Answered by jpp

With Pandas, the natural way is using value_counts:

import pandas as pd

df = pd.DataFrame({'A': [True, False, True, False, True]})

print(df['A'].value_counts())

# True     3
# False    2
# Name: A, dtype: int64

To calculate True or False values separately, don't compare against True / False explicitly; just sum, and take the reverse Boolean via ~ to count False values:

print(df['A'].sum())     # 3
print((~df['A']).sum())  # 2

This works because bool is a subclass of int, and the behaviour also holds true for Pandas series / NumPy arrays.
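The bool/int relationship can be checked directly. This is a small sketch with illustrative data, showing the same arithmetic in plain Python, on a Series, and on the underlying NumPy array:

```python
import pandas as pd

# Plain Python: bool is a subclass of int, so True behaves as 1 and False as 0.
is_subclass = issubclass(bool, int)        # True
python_total = sum([True, False, True])    # 2

# The same arithmetic applies element-wise to a boolean Series or NumPy array,
# and ~ acts there as an element-wise logical NOT.
s = pd.Series([True, False, True, False, True])
true_count = int(s.sum())                      # 3
false_count = int((~s).sum())                  # 2
false_count_np = int((~s.to_numpy()).sum())    # 2
```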

Alternatively, you can calculate counts using NumPy:

import numpy as np

print(np.unique(df['A'], return_counts=True))

# (array([False,  True], dtype=bool), array([2, 3], dtype=int64))
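Since np.unique returns two parallel arrays, it can be convenient to pair them up. The following is one possible way to do that, using the same example column; the dictionary name is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [True, False, True, False, True]})

# np.unique returns the sorted unique values and, with return_counts=True,
# how many times each one occurs.
values, counts = np.unique(df["A"], return_counts=True)

# Pair each unique value with its count so it can be looked up by value.
count_by_value = {bool(v): int(c) for v, c in zip(values, counts)}

print(count_by_value)  # {False: 2, True: 3}
```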

Answered by FMarazzi

You could simply sum:

sum(df["boolean_column"])

This will find the number of "True" elements.

len(df["boolean_column"]) - sum(df["boolean_column"])

Will yield the number of "False" elements.

Answered by turbojet780

df.isnull() 

returns a DataFrame of boolean values, where True indicates a missing value.

df.isnull().sum() 

returns the column-wise sum of True values.

df.isnull().sum().sum() 

returns the total number of NA elements.
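The three calls above can be seen together on a small example. The DataFrame below is made up for illustration and contains three missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, "x"]})

mask = df.isnull()              # boolean DataFrame, True where a value is missing
per_column = mask.sum()         # a: 1, b: 2
total = int(mask.sum().sum())   # 3

print(per_column)
print(total)
```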

Answered by Jakob

This alternative works for multiple columns and/or rows as well.

df[df==True].count(axis=0)

Will get you the total number of True values per column. For a row-wise count, set axis=1.

df[df==True].count().sum()

Adding a sum() at the end will get you the total number in the entire DataFrame.
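As a rough illustration of what this does, the sketch below applies the same expressions to a small all-boolean DataFrame (the column names and data are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]})

# df[df == True] keeps the True cells and replaces everything else with NaN;
# count() then counts the non-NaN cells along the chosen axis.
per_column = df[df == True].count(axis=0)   # x: 2, y: 1
per_row = df[df == True].count(axis=1)      # 1, 0, 2
total = int(df[df == True].count().sum())   # 3

# For an all-boolean frame, summing directly gives the same per-column counts.
assert (df.sum() == per_column).all()
```

For a frame that holds only booleans, df.sum() is the shorter route; the masking form is mainly useful when other value types are mixed in.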

Answered by Andrea Grianti

If you have a column of boolean values in a DataFrame or, more interestingly, if you do not have one but want to count the values in a column that satisfy a certain condition, you can try something like this (using <= as an example):

(df['col']<=value).value_counts()

The parentheses make the comparison evaluate first, producing a boolean Series; value_counts() then returns a Series of counts indexed by False and True (not a tuple). You can use those counts in other calculations without creating an additional variable. Index by label rather than by position, because value_counts() orders its result by frequency, so position 0 is not necessarily False:

(df['col']<=value).value_counts()[False]  # count of False
(df['col']<=value).value_counts()[True]   # count of True
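A minimal end-to-end sketch of counting by condition, assuming a numeric column named col and a hypothetical threshold value (both are placeholders, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 5, 3, 8, 2]})
value = 3  # hypothetical threshold for the example

mask = df["col"] <= value        # boolean Series: True, False, True, False, True
counts = mask.value_counts()     # counts indexed by the labels True and False

true_count = int(mask.sum())         # 3
false_count = int((~mask).sum())     # 2

# Label-based lookup on value_counts() agrees with the sum-based counts.
assert counts[True] == true_count and counts[False] == false_count
```

If only one of the two counts is needed, summing the mask avoids building the intermediate value_counts() result.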