Filter rows of pandas dataframe whose values are lower than 0
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34243194/
Asked by dooms
I have a pandas dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]], columns=['a', 'b'])
df
I want to be able to remove all rows that have a negative value in any of a list of columns, while keeping rows that contain NaN.
In my example there are only 2 columns, but I have more in my dataset, so I can't do it one by one.
Accepted answer by ComputerFellow
If you want to apply it to all columns, use df[df > 0] with dropna():
>>> df[df > 0].dropna()
    a   b
0  21   1
3   3  17
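To make the two steps above easier to follow, here is a minimal sketch that runs them separately on the question's sample data. The names `masked` and `result` are only illustrative. Note that, as far as I can tell, this approach also drops rows that already contained NaN and rows containing 0 (since 0 > 0 is False), and that masking an integer column with NaN converts it to float.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
    columns=["a", "b"],
)

# Step 1: indexing with a boolean DataFrame keeps the values where the
# condition holds and puts NaN everywhere else.
masked = df[df > 0]

# Step 2: dropna() removes every row that now contains at least one NaN.
result = masked.dropna()

print(masked)
print(result)
print(result.dtypes)  # column 'a' is no longer an integer column
```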
If you know which columns to apply it to, then do it for only those columns with df[df[cols] > 0]:
>>> cols = ['b']
>>> df[cols] = df[df[cols] > 0][cols]
>>> df.dropna()
    a   b
0  21   1
2  -4  14
3   3  17
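The question also asked to keep rows that contain NaN, which dropna() does not do. As a possible alternative (a sketch of a different technique, not the method from this answer), you could build a row mask from "is any selected column negative" and invert it. Because NaN < 0 evaluates to False, rows with NaN in the selected columns survive. The names `keep` and `result` are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
    columns=["a", "b"],
)

cols = ["b"]  # columns to check for negative values

# True for rows where none of the selected columns is negative.
# Comparisons with NaN are False, so NaN never counts as negative.
keep = ~(df[cols] < 0).any(axis=1)

# Row selection with a boolean Series; no values are replaced by NaN.
result = df[keep]
print(result)
```

Only row 1 (where b is -4) is removed here; row 4, whose b is NaN, is kept, and column a keeps its integer dtype because no NaN is written into it.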
Answered by Reid Gahan
I've found you can simplify the answer by just doing this:
>>> cols = ['b']
>>> df = df[df[cols] > 0]
dropna() is not an in-place method, so you have to store the result:
>>> df = df.dropna()
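A small sketch of what "not in-place" means in practice, using a made-up one-column frame (the data and the names `rows_before` and `rows_after` are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Calling dropna() without storing the result leaves df untouched.
df.dropna()
rows_before = len(df)

# Rebinding the name keeps the filtered copy.
df = df.dropna()
rows_after = len(df)

print(rows_before, rows_after)  # 3 2
```

dropna() also accepts inplace=True, but reassigning the result as above is the form used in this answer.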
Answered by Zev
I was looking for a solution for this that doesn't change the dtype (which will happen if NaNs are mixed in with ints, as suggested in the answers that use dropna). Since the questioner already had a NaN in their data, that may not be an issue for them. I went with this solution, which preserves the int64 dtype. Here it is with my sample data:
df = pd.DataFrame(data={'a':[0, 1, 2], 'b': [-1,0,1], 'c': [-2, -1, 0]})
columns = ['b', 'c']
filter_ = (df[columns] >= 0).all(axis=1)
df[filter_]
   a  b  c
2  2  1  0
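To check the dtype claim, here is a rough comparison of the row-mask approach above with a mask-then-dropna approach on the same sample data. The names `by_mask`, `masked` and `by_dropna` are only illustrative, and the second half is my own reconstruction of the dropna style, not code from the other answers.

```python
import pandas as pd

df = pd.DataFrame(data={"a": [0, 1, 2], "b": [-1, 0, 1], "c": [-2, -1, 0]})
columns = ["b", "c"]

# Row-mask approach: rows are selected, no value is replaced by NaN,
# so the integer dtypes are kept.
filter_ = (df[columns] >= 0).all(axis=1)
by_mask = df[filter_]

# Mask-then-dropna approach, for comparison: negative values become NaN
# first, which turns the affected columns into float columns.
masked = df.copy()
masked[columns] = df[columns].where(df[columns] >= 0)
by_dropna = masked.dropna()

print(by_mask.dtypes)
print(by_dropna.dtypes)
```

Both approaches select the same row here; the difference is only in the dtypes of columns b and c.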