pandas: counting occurrences of True/False in a DataFrame column

Question

Asked by Luca Giorgi

Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?

Doing something like

df[df["boolean_column"]==False]["boolean_column"].sum()

Will not work because False has a value of 0, hence a sum of zeroes will always return 0.
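
As a minimal sketch of the problem described above (the sample data is made up for illustration): summing the filtered column adds up zeros, while counting the filtered rows gives the number of False values.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})

# Keep only the rows where the column is False.
false_rows = df[df["boolean_column"] == False]["boolean_column"]

summed = false_rows.sum()      # adds up False values, i.e. zeros
counted = false_rows.count()   # counts the rows that survived the filter

print(summed)   # 0
print(counted)  # 2
```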

Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.

Answer 1

Answered by user3471881

Use pd.Series.value_counts():

>>> import pandas as pd
>>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>>> df['boolean_column'].value_counts()
True     3
False    2
Name: boolean_column, dtype: int64

If you want to count False and True separately, you can use pd.Series.sum() together with ~:

>>> df['boolean_column'].values.sum()  # number of True values
3
>>> (~df['boolean_column']).values.sum()  # number of False values
2

Answer 2

Answered by jpp

With Pandas, the natural way is using value_counts:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [True, False, True, False, True]})

print(df['A'].value_counts())

# True     3
# False    2
# Name: A, dtype: int64

To count True or False values separately, don't compare against True / False explicitly; just use sum, and invert the Boolean values with ~ to count False values:

print(df['A'].sum())     # 3
print((~df['A']).sum())  # 2

This works because bool is a subclass of int, and the behaviour also holds for Pandas Series and NumPy arrays.
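
A short sketch of that point, using a made-up list of flags: True counts as 1 and False as 0, so summing gives the number of True values whether the data lives in a plain list, a NumPy array, or a Series.

```python
import numpy as np
import pandas as pd

print(issubclass(bool, int))  # True
print(True + True)            # 2

# Hypothetical flags used only for illustration.
flags = [True, False, True, False, True]

py_count = sum(flags)                # plain Python list
np_count = np.array(flags).sum()     # NumPy boolean array
pd_count = pd.Series(flags).sum()    # pandas boolean Series

print(py_count, np_count, pd_count)  # 3 3 3
```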

Alternatively, you can calculate counts using NumPy:

print(np.unique(df['A'], return_counts=True))

# (array([False,  True], dtype=bool), array([2, 3], dtype=int64))
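
The tuple of arrays returned by np.unique can be awkward to read. One possible way to make it easier to use, sketched here with the same sample data, is to zip the two arrays into a dict; the name count_by_value is just an illustrative choice. np.count_nonzero is another NumPy call that counts True values directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [True, False, True, False, True]})

# np.unique returns the sorted unique values and their counts as two arrays.
values, counts = np.unique(df["A"].to_numpy(), return_counts=True)

# Pair each value with its count (count_by_value is a hypothetical name).
count_by_value = dict(zip(values.tolist(), counts.tolist()))
print(count_by_value)  # {False: 2, True: 3}

# Counting only the True values: nonzero entries of a boolean array.
n_true = np.count_nonzero(df["A"].to_numpy())
print(n_true)  # 3
```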

Answer 3

Answered by FMarazzi

You could simply sum:

sum(df["boolean_column"])

This will find the number of "True" elements.

len(df["boolean_column"]) - sum(df["boolean_column"])

Will yield the number of "False" elements.
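
The two expressions above can be wrapped in a small helper. This is only a sketch: count_true_false is a hypothetical name, it assumes a boolean column with no missing values, and it swaps the built-in sum for the Series.sum() method, which gives the same result here without iterating over the column in Python.

```python
import pandas as pd


def count_true_false(column):
    """Return (n_true, n_false) for a boolean Series without missing values."""
    n_true = int(column.sum())        # True counts as 1, False as 0
    n_false = len(column) - n_true    # everything that is not True
    return n_true, n_false


# Hypothetical sample data.
df = pd.DataFrame({"boolean_column": [True, False, True, False, True]})

n_true, n_false = count_true_false(df["boolean_column"])
print(n_true, n_false)  # 3 2
```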

Answer 4

Answered by turbojet780

df.isnull()

returns a DataFrame of boolean values, where True indicates a missing value.

df.isnull().sum()

returns the column-wise count of True values, i.e. the number of missing values in each column.

df.isnull().sum().sum()

returns the total number of NA elements.
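
Put together on a small made-up DataFrame, the three steps look roughly like this (the column names and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in "a" and two in "b".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})

mask = df.isnull()          # boolean DataFrame, True where a value is missing
per_column = mask.sum()     # missing values per column
total = mask.sum().sum()    # missing values in the whole DataFrame

print(per_column["a"], per_column["b"])  # 1 2
print(total)                             # 3
```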

Answer 5

Answered by Jakob

This alternative works for multiple columns and/or rows as well.

df[df==True].count(axis=0)

Will get you the total number of True values per column. For a row-wise count, set axis=1.

df[df==True].count().sum()

Adding a sum() at the end will get you the total number of True values in the entire DataFrame.
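
A worked sketch of this approach on a small all-boolean DataFrame (the data is invented for illustration). Selecting with df == True replaces every other cell with NaN, and count() then counts the non-missing cells. For a DataFrame that holds only booleans, df.sum() gives the same per-column numbers.

```python
import pandas as pd

# Hypothetical all-boolean DataFrame.
df = pd.DataFrame({"a": [True, False, True], "b": [False, False, True]})

only_true = df[df == True]             # cells that are not True become NaN

per_column = only_true.count(axis=0)   # True values in each column
per_row = only_true.count(axis=1)      # True values in each row
total = only_true.count().sum()        # True values in the whole DataFrame

print(per_column.tolist())  # [2, 1]
print(per_row.tolist())     # [1, 0, 2]
print(total)                # 3

# For comparison: summing the boolean columns directly.
print(df.sum().tolist())    # [2, 1]
```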

Answer 6

Answered by Andrea Grianti

If you have a column of boolean values in a DataFrame or, more interestingly, you don't have one but want to count the values in a column that satisfy a certain condition, you can try something like this (I use <= as an example):

(df['col']<=value).value_counts()

The parentheses make the comparison run first, producing a boolean Series; value_counts() then returns a Series (not a tuple) holding the number of True and False values, indexed by True and False. You can use it in other calculations without creating an additional variable. Index it by label, since integer keys such as [0] and [1] may be interpreted differently across pandas versions, and note that a label is missing from the result if that value never occurs:

(df['col']<=value).value_counts()[False] # count of False values
(df['col']<=value).value_counts()[True]  # count of True values
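
As a hedged sketch with invented data, the same counts can be obtained from the boolean mask itself with sum(). If a value_counts() result is preferred, reindex with fill_value=0 is one way to guarantee that both labels are present even when every row satisfies the condition. The names col and value follow the snippet above; the numbers are assumptions.

```python
import pandas as pd

# Hypothetical data and threshold.
df = pd.DataFrame({"col": [1, 5, 10, 20, 30]})
value = 5

mask = df["col"] <= value        # boolean Series: True where the condition holds
n_true = int(mask.sum())         # rows satisfying the condition
n_false = int((~mask).sum())     # rows not satisfying it
print(n_true, n_false)           # 2 3

# When every row satisfies the condition, value_counts() has no False entry;
# reindex adds the missing label with a count of 0.
all_true = df["col"] <= 100
counts = all_true.value_counts().reindex([False, True], fill_value=0)
print(counts.tolist())           # [0, 5]
```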
