Python: Drop rows with all zeros in a pandas DataFrame

Question

Asked by user308827

I can use the pandas dropna() function to remove rows in which some or all columns are NA. Is there an equivalent function for dropping rows in which every column has the value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.

Thanks!

Answer 1

Accepted answer by U2EF1

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
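To see why the transpose is involved, the expression can be split into steps. This is only a sketch of the same idea on the same data; the names nonzero, keep, and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# df.T != 0 flags the non-zero cells. After the transpose, each column of
# this boolean frame corresponds to one row of the original frame.
nonzero = df.T != 0

# any() reduces over axis 0 by default, so it yields one boolean per
# original row: True if that row has at least one non-zero value.
keep = nonzero.any()

result = df[keep]
print(keep.tolist())  # [False, True, True, True]
print(result)
```

The transpose is there only because any() reduces down the columns by default.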

Answer 2

Answered by 8one6

You can use a quick lambda function to check whether all the values in a given row are 0. Then you can use the result of applying that lambda to select only the rows that match, or do not match, that condition:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3), 
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0

print(df)
print(df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)])

Yields:

              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
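As a rough check, the row-wise lambda can be compared with a comparison done on the whole frame at once. The small frame below is made up for illustration and is not the random data used above.

```python
import pandas as pd

# Hypothetical data: only row 'one' is entirely zero.
df = pd.DataFrame(
    {'a': [0.0, 2.2, 0.0], 'b': [0.0, 1.9, 0.0], 'c': [0.0, -1.0, 0.5]},
    index=['one', 'two', 'three'],
)

# Row-by-row check using apply and a lambda.
row_wise = df.apply(lambda row: (row == 0).all(), axis=1)

# The same check expressed on the whole frame in one step.
whole_frame = (df == 0).all(axis=1)

print((row_wise == whole_frame).all())
kept = df.loc[~row_wise]
print(kept)
```

Both masks flag the same rows; apply calls the lambda once per row, which tends to be slower on large frames.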

Answer 3

Answered by Akavall

import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

temp = df.abs().sum(axis=1) == 0   # True for rows whose entries are all zero
df = df[~temp]                     # drop() expects index labels, not a boolean mask

Result:

>>> df
   a  b
2  1 -1
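A note on drop(): it expects index labels rather than a boolean mask, which matters as soon as the index is not 0, 1, 2, and so on. The sketch below assumes string labels ('x', 'y', 'z', chosen only for illustration) and shows two ways to use the mask.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1], 'b': [0, 0, -1]}, index=['x', 'y', 'z'])

# True for rows whose absolute values sum to zero, i.e. all-zero rows.
all_zero = df.abs().sum(axis=1) == 0

# Option 1: convert the mask to index labels, then drop those labels.
dropped = df.drop(df.index[all_zero.to_numpy()])

# Option 2: boolean indexing with the inverted mask.
filtered = df[~all_zero]

print(dropped)
print(dropped.equals(filtered))
```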

Answer 4

Answered by 8one6

One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works:

df.loc[(df!=0).any(axis=1)]
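Applied to the data from the question, it might look like the sketch below. It assumes that P is an ordinary identifier column rather than the index; in that case P has to be left out of the zero check, because its values are never 0. The name value_cols is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'P': [1, 2, 3, 4, 5],
    'kt': [0, 0, 0, 0, 1.1],
    'b': [0, 0, 0, 0, 3],
    'tt': [0, 0, 0, 0, 4.5],
    'mky': [0, 0, 0, 0, 2.3],
    'depth': [0, 0, 0, 0, 9.0],
})

# Assumption: P only identifies the row, so exclude it from the check.
value_cols = df.columns.drop('P')

# Keep rows that have at least one non-zero value among the value columns.
result = df.loc[(df[value_cols] != 0).any(axis=1)]
print(result)
```

If P is already the index, the one-liner can be used on df directly.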

Answer 5

Answered by stackpopped

Replace the zeros with nan and then drop the rows in which all entries are nan. After that, replace nan with zeros.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
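This round trip has side effects worth knowing about. The sketch below uses made-up data that already contains missing values: a row holding only zeros and NaN is dropped as well, and a NaN that survives comes back as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values already present in column 'b'.
df = pd.DataFrame({'a': [0, 0, 1, 3], 'b': [0, np.nan, 2, np.nan]})

out = df.replace(0, np.nan)
out = out.dropna(how='all', axis=0)
out = out.replace(np.nan, 0)

# Row 1 (zero plus NaN) is dropped along with row 0.
# Row 3 is kept, but its original NaN in 'b' is now 0.
print(out)

# Integer columns may also be upcast to float along the way.
print(out.dtypes)
```

If the frame can contain real missing values, a mask such as (df != 0).any(axis=1) avoids rewriting them.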

Answer 6

Answered by The Unfun Cat

I look up this question about once a month and always have to dig the best answer out of the comments:

df.loc[(df != 0).any(axis=1)]  # axis passed by keyword; recent pandas versions reject any(1)

Thanks Dan Allan!

Answer 7

Answered by clocker

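A couple of solutions I found helpful while looking this up, especially for larger data sets. Note that both rely on the row sum, so they assume non-negative values: a row such as [1, -1] also sums to zero and would be dropped.

df[(df.sum(axis=1) != 0)]       # about 30% faster than the transpose version
df[df.values.sum(axis=1) != 0]  # about 3x faster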

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 μs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 μs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 μs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 μs per loop
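One caveat with the sum-based variants: a row whose positive and negative values cancel out sums to zero even though it is not all zeros. The sketch below uses made-up data to show the difference. Summing absolute values on the underlying array is one possible workaround and was not timed here.

```python
import numpy as np
import pandas as pd

# Row 0 is all zeros; row 1 sums to zero but is not all zeros.
df = pd.DataFrame({'a': [0, 1, 1], 'b': [0, -1, 1]})

by_sum = df[df.sum(axis=1) != 0]                      # also drops row 1
by_any = df[(df != 0).any(axis=1)]                    # keeps row 1
by_abs_sum = df[np.abs(df.values).sum(axis=1) != 0]   # keeps row 1

print(by_sum.index.tolist())      # [2]
print(by_any.index.tolist())      # [1, 2]
print(by_abs_sum.index.tolist())  # [1, 2]
```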

Answer 8

Answered by bmc

Another alternative:

# Is there anything in this row non-zero?
# df != 0 --> which entries are non-zero? (True/False per cell)
# (df != 0).any(axis=1) --> does each row contain any non-zero entry? (True/False per row)
# df.loc[nonzero_row_mask, :] --> keep only the rows that contain a non-zero entry.
# .shape confirms that the result is a subset.

nonzero_row_mask = (df != 0).any(axis=1)  # True for rows with at least one non-zero entry
df.loc[nonzero_row_mask, :].shape

Answer 9

Answered by Kumar Prasanna

df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]

Try this command; it works perfectly.

Answer 10

Answered by ikbel benabdessamad

I think this solution is the shortest, although it only checks a single column (here the placeholder 'ColName'):

df = df[df['ColName'] != 0]
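Filtering on one column is not the same as dropping all-zero rows. The sketch below uses made-up data, with 'other' as a hypothetical second column, to show where the two differ.

```python
import pandas as pd

df = pd.DataFrame({'ColName': [0, 0, 5], 'other': [0, 7, 0]})

# Single-column filter: row 1 is dropped although 'other' is non-zero there.
single = df[df['ColName'] != 0]

# All-columns check: only the entirely zero row 0 is dropped.
all_cols = df[(df != 0).any(axis=1)]

print(single.index.tolist())    # [2]
print(all_cols.index.tolist())  # [1, 2]
```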
