pandas: counting negative values in a DataFrame

Question

Asked by Sanchit Aluna

I need the total count of negative values in a DataFrame. I can get it for an array, but I can't find a way to do it for a DataFrame. For an array I am using the code below; can anyone suggest how to get the count for the DataFrame that follows?

sum(n<0 for n in numbers)

Below is my DataFrame; the expected result is 4.

    a  b  c  d
   -3 -2 -1  1
   -2  2  3  4
    4  5  7  8

Answer 1

Accepted answer by bakkal

I am able to get for an array but unable to find for DataFrame

It's possible to flatten the DataFrame so you can use functions that operate on 1D arrays. So if you're okay with that (it is likely to be slower than EdChum's answer):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5], 'c': [-1, 3, 7], 'd': [1, 4, 8]})
>>> df.values
array([[-3, -2, -1,  1],
       [-2,  2,  3,  4],
       [ 4,  5,  7,  8]])
>>> df.values.flatten()
array([-3, -2, -1,  1, -2,  2,  3,  4,  4,  5,  7,  8])
>>> sum(n < 0 for n in df.values.flatten())
4

Answer 2

Answer by EdChum

You can call .lt to compare the df against a scalar value and then call sum twice (this is because the first sum adds down each column, giving one count per column):

In [66]:
df.lt(0).sum()

Out[66]:
a    2
b    1
c    1
d    0
dtype: int64

Call sum again to sum the Series:

In [58]:
df.lt(0).sum().sum()

Out[58]:
4
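
To make the two-step sum concrete, here is a small sketch using the question's DataFrame. The variable names (`mask`, `per_column`, `per_row`, `total`) are illustrative and do not come from the answer; the per-row count via `axis=1` is an extra shown only for contrast.

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

mask = df.lt(0)                 # boolean DataFrame, True where the value is < 0
per_column = mask.sum()         # default axis=0: one count per column
per_row = mask.sum(axis=1)      # axis=1: one count per row
total = int(per_column.sum())   # collapse the Series to a scalar

print(per_column.tolist())  # [2, 1, 1, 0]
print(per_row.tolist())     # [3, 1, 0]
print(total)                # 4
```

Summing a boolean object counts its True entries, which is why no explicit loop is needed.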

You can also convert the boolean df to a 1-D array and call np.sum:

In [62]:
import numpy as np
np.sum((df < 0).values.ravel())

Out[62]:
4
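
As a possible variation on the NumPy approach, the sketch below skips the explicit ravel and counts directly on the 2-D boolean array. It assumes a pandas version that provides `DataFrame.to_numpy()` (0.24 or later); `neg_mask`, `count_a`, and `count_b` are illustrative names, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

neg_mask = (df < 0).to_numpy()             # 2-D boolean NumPy array
count_a = int(neg_mask.sum())              # ndarray.sum() reduces over all axes by default
count_b = int(np.count_nonzero(neg_mask))  # counts True entries directly

print(count_a, count_b)  # 4 4
```

Both calls reduce over the whole array, so flattening first is not required to get a single total.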

Timings

For a 30K-row df:

In [70]:
%timeit sum(n < 0 for n in df.values.flatten())
%timeit df.lt(0).sum().sum()
%timeit np.sum((df < 0).values.ravel())

1 loops, best of 3: 405 ms per loop
100 loops, best of 3: 2.36 ms per loop
1000 loops, best of 3: 770 μs per loop

The np method wins easily here: it is ~525x faster than the loop method and ~3x faster than the pure pandas method.
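
The 30K-row DataFrame used for these timings is not shown in the answer, so the sketch below is only one way to reproduce the comparison outside IPython. It assumes 30,000 rows by 4 columns of random integers (the shape, value range, and seed are assumptions) and uses the standard-library `timeit` module instead of `%timeit`. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: the original benchmark frame is not given.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(-10, 10, size=(30_000, 4)),
                   columns=list('abcd'))

candidates = {
    'python loop': lambda: sum(n < 0 for n in big.values.flatten()),
    'pandas lt/sum': lambda: big.lt(0).sum().sum(),
    'numpy ravel': lambda: np.sum((big < 0).values.ravel()),
}

# Sanity check: all three approaches must agree before timing them.
results = {name: int(fn()) for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    # Best of 3 repeats, 3 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{name}: {seconds * 1e3:.3f} ms per call')
```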

Answer 3

Answer by Sid

I am using the following. It might not be the best way to go about it.

# Count the rows with a negative value in each column, then add the counts.
negatives = (len(df.loc[df.a < 0]) + len(df.loc[df.b < 0]) +
             len(df.loc[df.c < 0]) + len(df.loc[df.d < 0]))
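
The same idea can be written without naming every column by hand. This is a minimal sketch, assuming every column is numeric; it loops over `df.columns` and keeps the row-filtering approach from the snippet above.

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

# For each column, keep the rows where that column is negative and count them.
negatives = sum(len(df.loc[df[col] < 0]) for col in df.columns)

print(negatives)  # 4
```

This still filters the frame once per column, so it is mainly useful for readability rather than speed.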

Answer 4

Answer by Daniel Reeves

EdChum's solution is very good, but I'd like to add another simple and acceptable solution that uses the pd.DataFrame.agg method, which is very commonly used and should therefore be easy to remember:

# Set up dataframe
import pandas as pd
df = pd.DataFrame({'a': [-3, -2, 4],
                   'b': [-2, 2, 5],
                   'c': [-1, 3, 7],
                   'd': [1, 4, 8]})

The pd.DataFrame.agg method aggregates each column (the default) or each row into a single value and returns the results as a Series object. You can then aggregate that Series to get a scalar:

# Count all negative values in a dataframe.
df.agg(lambda x: sum(x < 0)).sum()

Output:

4
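
Since agg can work along either axis, the sketch below shows both directions on the question's DataFrame. The helper `count_negative` is a hypothetical name introduced here; it uses the vectorized `(s < 0).sum()` rather than the built-in `sum` from the answer, which should give the same counts.

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4],
                   'b': [-2, 2, 5],
                   'c': [-1, 3, 7],
                   'd': [1, 4, 8]})


def count_negative(s):
    """Count the negative entries in one Series (a column or a row)."""
    return (s < 0).sum()


per_column = df.agg(count_negative)          # axis=0 (default): one value per column
per_row = df.agg(count_negative, axis=1)     # axis=1: one value per row
total = int(per_column.sum())

print([int(v) for v in per_column])  # [2, 1, 1, 0]
print([int(v) for v in per_row])     # [3, 1, 0]
print(total)                         # 4
```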
