Python: filter rows of a pandas DataFrame whose values are lower than 0

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/34243194/

Date: 2020-08-19 14:41:16  Source: igfitidea

python pandas

Question by dooms

I have a pandas DataFrame like this:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]], columns=['a', 'b'])
df

I want to remove all rows that have a negative value in any column of a given list of columns, while keeping rows that contain NaN.

In my example there are only 2 columns, but my dataset has many more, so I can't handle them one by one.
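
A minimal sketch of the requirement as stated, using a hypothetical helper name `drop_negative_rows` (not part of pandas): build one boolean flag per row with `DataFrame.any(axis=1)` over the listed columns. Any comparison with NaN evaluates to False, so `NaN < 0` never flags a row, and rows whose only issue is a NaN are kept.

```python
import numpy as np
import pandas as pd


def drop_negative_rows(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: drop rows with a negative value in any of `cols`.

    NaN < 0 evaluates to False, so rows containing NaN are kept.
    """
    has_negative = (df[cols] < 0).any(axis=1)  # one True/False per row
    return df[~has_negative]


df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
                  columns=['a', 'b'])

# Only column 'b' is checked: row 1 (b = -4) is dropped,
# while row 4 (b = NaN) is kept even though a = -7.
print(drop_negative_rows(df, ['b']))
```

Because rows are selected with a boolean Series rather than masked cell by cell, the integer column `a` keeps its dtype.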

Accepted answer by ComputerFellow

If you want to apply it to all columns, combine df[df > 0] with dropna():

>>> df[df > 0].dropna()
    a   b
0  21   1
3   3  17
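
A hedged comparison on a small illustrative frame (`df2` is made up for this sketch and is not from the question): `df[df > 0].dropna()` turns every value that is not strictly positive into NaN and then drops any row containing NaN, so it also removes rows holding a 0 or a pre-existing NaN. Testing for negatives directly keeps those rows.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 0 has a zero, row 1 has a NaN, row 2 has a negative.
df2 = pd.DataFrame({'a': [0, 2, -1], 'b': [5, np.nan, 3]})

# Approach from the answer: mask non-positive values, then drop NaN rows.
masked = df2[df2 > 0].dropna()

# Alternative: drop only rows that contain an actual negative value.
no_negatives = df2[~(df2 < 0).any(axis=1)]

print(masked)        # empty: every row contained a 0, a NaN, or a negative
print(no_negatives)  # rows 0 and 1 survive
```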

If you know which columns to apply it to, do it only for those columns with df[df[cols] > 0]:

>>> cols = ['b']
>>> df[cols] = df[df[cols] > 0][cols]
>>> df.dropna()
    a   b
0  21   1
2  -4  14
3   3  17
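
The assignment above overwrites column `b` of `df` with NaN wherever the condition fails, and the `dropna()` result is only displayed, not stored. A sketch of a non-mutating alternative with the same `cols = ['b']`, assuming you want the same rows as that output: collapse the boolean frame to one flag per row with `.all(axis=1)` and select with `.loc`. The NaN row is still dropped here, since `NaN > 0` is False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
                  columns=['a', 'b'])
cols = ['b']

# One True/False per row: every selected column must be strictly positive.
keep = (df[cols] > 0).all(axis=1)
result = df.loc[keep]

print(result)          # rows 0, 2 and 3
print(df.loc[1, 'b'])  # -4.0: the original frame is left unchanged
```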

Answer by Reid Gahan

I've found you can simplify the answer by just doing this:

>>> cols = ['b']
>>> df = df[df[cols] > 0]

dropna() is not an in-place method, so you have to store the result.

>>> df = df.dropna()
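
A sketch checking these two steps on the question's `df`, assuming pandas' alignment behavior for boolean-DataFrame indexing (it works like `DataFrame.where`, and columns missing from the mask are treated as False). With `cols = ['b']`, column `a` ends up entirely NaN, so the stored `dropna()` result is empty. The same sketch shows that `dropna(inplace=True)` is the in-place alternative to reassigning and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
                  columns=['a', 'b'])
cols = ['b']

# Indexing with a boolean DataFrame aligns the mask to df. Column 'a' is
# absent from the mask, so it is treated as False and set to NaN everywhere.
masked = df[df[cols] > 0]
print(masked['a'].isna().all())  # True

# dropna() returns a new frame and leaves `masked` untouched; because every
# row now has NaN in 'a', the stored result is empty.
result = masked.dropna()
print(len(masked), len(result))  # 5 0

# The in-place variant modifies the frame itself and returns None.
print(masked.dropna(inplace=True))  # None
```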

Answer by Zev

I was looking for a solution that doesn't change the dtype, which happens when NaN values are mixed in with ints, as in the answers that use dropna. Since the questioner already had a NaN in their data, that may not be an issue for them. I went with this solution, which preserves the int64 dtype. Here it is with my sample data:

import pandas as pd
df = pd.DataFrame(data={'a': [0, 1, 2], 'b': [-1, 0, 1], 'c': [-2, -1, 0]})
columns = ['b', 'c']
# Keep only rows where every listed column is >= 0
filter_ = (df[columns] >= 0).all(axis=1)
df[filter_]


   a  b  c
2  2  1  0
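
A sketch of the dtype point on the same sample data: masking with a boolean DataFrame inserts NaN into integer columns, which converts them to float64 even after `dropna()` has removed those rows, while selecting rows with the `.all(axis=1)` Series leaves every column as int64. The name `masked` is only for this comparison.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [0, 1, 2], 'b': [-1, 0, 1], 'c': [-2, -1, 0]})
columns = ['b', 'c']

# Row-wise boolean Series: True only where every listed column is >= 0.
filter_ = (df[columns] >= 0).all(axis=1)
kept = df[filter_]

# Masking approach for comparison: negatives become NaN, then rows are dropped.
masked = df[df >= 0].dropna()

print(kept.dtypes)    # a, b and c are all int64
print(masked.dtypes)  # at least b and c are now float64
```

Both approaches keep only row 2 here; they differ only in the resulting dtypes.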