Python: Drop rows with all zeros in a pandas DataFrame
Original URL: http://stackoverflow.com/questions/22649693/
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Drop rows with all zeros in pandas data frame
Asked by user308827
I can use the pandas dropna() functionality to remove rows with some or all columns set as NA. Is there an equivalent function for dropping rows with all columns having value 0?
P kt b tt mky depth
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 1.1 3 4.5 2.3 9.0
In this example, we would like to drop the first 4 rows from the data frame.
thanks!
Accepted answer by U2EF1
It turns out this can be nicely expressed in a vectorized fashion:
> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
a b
1 0 1
2 1 0
3 1 1
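As a minimal sketch of why the transpose trick works, the snippet below applies it to the data from the question. The intermediate names `nonzero` and `keep` are illustrative and not part of the original answer, and the frame is rebuilt on the assumption that `P` is the index and the other five names are columns.

```python
import pandas as pd

# Data from the question: rows 1-4 are all zeros, row 5 is not.
df = pd.DataFrame(
    {
        "kt": [0, 0, 0, 0, 1.1],
        "b": [0, 0, 0, 0, 3],
        "tt": [0, 0, 0, 0, 4.5],
        "mky": [0, 0, 0, 0, 2.3],
        "depth": [0, 0, 0, 0, 9.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="P"),
)

nonzero = df.T != 0   # after the transpose, each original row is a column
keep = nonzero.any()  # any() reduces down each column: True if that row has a non-zero value
result = df[keep]     # boolean mask aligned on the original index
print(result)
```

Only the row labelled 5 survives the mask, which is the outcome the question asks for.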
Answer by 8one6
You can use a quick lambda function to check whether all the values in a given row are 0. Then you can use the result of applying that lambda to select only the rows that match or don't match that condition:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3),
index=['one', 'two', 'three', 'four', 'five'],
columns=list('abc'))
df.loc[['one', 'three']] = 0
print(df)
print(df.loc[~df.apply(lambda row: (row==0).all(), axis=1)])
Yields:
a b c
one 0.000000 0.000000 0.000000
two 2.240893 1.867558 -0.977278
three 0.000000 0.000000 0.000000
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[5 rows x 3 columns]
a b c
two 2.240893 1.867558 -0.977278
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[3 rows x 3 columns]
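Answer by Akavall
import pandas as pd
df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})
temp = df.abs().sum(axis=1) == 0   # True for rows whose values are all zero
df = df.drop(temp[temp].index)     # drop() expects index labels, not a boolean mask
Result: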
>>> df
a b
2 1 -1
Answer by 8one6
One-liner. No transpose needed:
df.loc[~(df==0).all(axis=1)]
And for those who like symmetry, this also works...
df.loc[(df!=0).any(axis=1)]
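The two one-liners above select the same rows. A small sketch that wraps the second form in a helper and checks it against the first; the function name `drop_all_zero_rows` is hypothetical and not a pandas API:

```python
import pandas as pd


def drop_all_zero_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `frame` that contain at least one non-zero value."""
    return frame.loc[(frame != 0).any(axis=1)]


df = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]})
cleaned = drop_all_zero_rows(df)

# The "not all zero" form selects the same rows.
same = cleaned.equals(df.loc[~(df == 0).all(axis=1)])
print(cleaned)
print(same)
```

Both forms treat NaN as non-zero, so a row that mixes zeros and NaN is kept by either one.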
Answer by stackpopped
Replace the zeros with nan and then drop the rows in which every entry is nan.
After that, replace nan with zeros.
import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
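One caveat with the round trip through nan: any nan that was already in the data is also turned into 0 at the end, and integer columns are typically upcast to float along the way. A minimal sketch with made-up data, comparing it with a boolean mask (the names `roundtrip` and `masked` are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is all zeros, row 2 has a genuine missing value.
df = pd.DataFrame({"a": [0, 1, np.nan], "b": [0, 2, 5]})

# The replace / dropna / replace round trip from the answer above.
roundtrip = df.replace(0, np.nan).dropna(how="all", axis=0).replace(np.nan, 0)

# Boolean-mask alternative that leaves the remaining values untouched.
masked = df[(df != 0).any(axis=1)]

print(roundtrip)  # the original NaN in column "a" has become 0
print(masked)     # the original NaN is preserved
```

If the data can contain real missing values, the mask-based form is probably the safer choice.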
Answer by The Unfun Cat
I look up this question about once a month and always have to dig out the best answer from the comments:
df.loc[(df!=0).any(axis=1)]
Thanks Dan Allan!
Answer by clocker
A couple of solutions I found helpful while looking this up, especially for larger data sets:
df[(df.sum(axis=1) != 0)] # 30% faster
df[df.values.sum(axis=1) != 0] # 3X faster
Continuing with the example from @U2EF1:
In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 μs per loop
In [92]: df[(df.sum(axis=1) != 0)]
Out[92]:
a b
1 0 1
2 1 0
3 1 1
In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 μs per loop
In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 μs per loop
On a larger dataset:
In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))
In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop
In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop
In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 μs per loop
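The sum-based filters are only equivalent to "drop all-zero rows" when values cannot cancel out, as with the non-negative data in these timings. A minimal sketch of the difference on made-up data with a negative value; the `(df.values != 0).any(axis=1)` variant is an assumption of this sketch, not part of the original answer, and it was not timed here:

```python
import pandas as pd

# Made-up data: row 1 is not all zeros, but its values sum to zero.
df = pd.DataFrame({"a": [0, 1, 1], "b": [0, -1, 1]})

by_sum = df[df.values.sum(axis=1) != 0]    # also drops row 1, since 1 + (-1) == 0
by_any = df[(df.values != 0).any(axis=1)]  # keeps row 1

print(by_sum)
print(by_any)
```

Taking `abs()` before summing, as in Akavall's answer, is another way to avoid the cancellation.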
Answer by bmc
Another alternative:
# Is there anything in this row non-zero?
# df != 0 --> which entries are non-zero? (True/False)
# (df != 0).any(axis=1) --> does each row contain any non-zero entry? (True/False per row)
# df.loc[nonzero_mask, :] --> keep only the rows that contain a non-zero entry.
# .shape confirms that a subset was selected.
nonzero_mask = (df != 0).any(axis=1)  # True for rows with at least one non-zero value
df.loc[nonzero_mask, :].shape
Answer by Kumar Prasanna
df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]
Try this command; it works perfectly.
Answer by ikbel benabdessamad
I think this solution is the shortest:
df = df[df['ColName'] != 0]
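Note that this filter looks at a single column only ('ColName' is a placeholder), so it also drops rows that are zero in that column but non-zero elsewhere. A minimal sketch of the difference on made-up data; the names `single` and `all_cols` are illustrative:

```python
import pandas as pd

# Made-up data: row 1 is zero in column "a" but not in column "b".
df = pd.DataFrame({"a": [0, 0, 1], "b": [0, 2, 3]})

single = df[df["a"] != 0]              # single-column filter: drops rows 0 and 1
all_cols = df[(df != 0).any(axis=1)]   # all-zero filter: drops only row 0

print(single)
print(all_cols)
```

The single-column form matches the question only when one column being zero implies that the whole row is zero.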

