Notice: this page reproduces a popular StackOverflow question and its answers, originally presented as a Chinese-English parallel translation, under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44552550/

Time: 2020-09-14 03:47:20  Source: igfitidea

Pandas drop rows with value less than a given value

python pandas

Asked by Jaswanth Kumar

I would like to delete the rows in which no value is less than 10 or greater than 25, that is, rows whose values all lie between 10 and 25. My sample dataframe looks like this:

a   b   c  
1   2   3  
4   5   16  
11  24  22  
26  50  65  

Expected Output:

a   b   c  
1   2   3  
4   5   16   
26  50  65  

So if a row contains any value less than 10 or greater than 25, the row stays in the dataframe; otherwise, it needs to be dropped.

Is there any way I can achieve this with Pandas instead of iterating through all the rows?

Answered by Rakesh Adhikesavan

You can call apply and store the results in a new column called 'keep'. You can then use this column to drop the rows that you don't need.

import pandas as pd

# Set up the sample dataframe
l = [[1, 2, 3], [4, 5, 6], [11, 24, 22], [26, 50, 65]]
df = pd.DataFrame(l, columns=['a', 'b', 'c'])

# 1 if the row has at least one value below 10 or above 25, otherwise 0
df['keep'] = df.apply(lambda row: int(any((x < 10) or (x > 25) for x in row)), axis=1)

The any() function returns True if at least one element of the iterable it receives is true, and False otherwise. Wrapping the result in int() converts that boolean to 1 or 0.

Check the documentation on how any() works. The apply function still iterates over all the rows like a for loop, but the code looks cleaner this way. I cannot think of a way to do this without iterating over all the rows.
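
To make the behaviour of any() concrete, here is a minimal sketch on plain Python lists, using values from the sample rows. The helper name has_out_of_range and its low/high parameters are hypothetical, introduced only for illustration.

```python
def has_out_of_range(values, low=10, high=25):
    """Return True if at least one value is below `low` or above `high`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # any() stops at the first true element and returns a single bool
    return any(x < low or x > high for x in values)


row_inside = [11, 24, 22]   # every value lies within [10, 25]
row_outside = [26, 50, 65]  # every value is greater than 25

print(has_out_of_range(row_inside))        # False
print(has_out_of_range(row_outside))       # True
print(int(has_out_of_range(row_outside)))  # 1
```

In the answer's code, the same check runs once per row through apply.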

Output:

    a   b   c  keep
0   1   2   3     1
1   4   5   6     1
2  11  24  22     0
3  26  50  65     1


df = df[df['keep'] == 1]  # Drop unwanted rows
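
As a variation on this answer, the helper column may not be needed at all: the result of apply can be used directly as a boolean mask. The following is a sketch under that assumption, rebuilding the same sample dataframe so it runs on its own; the names mask and filtered are illustrative.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [11, 24, 22], [26, 50, 65]],
    columns=["a", "b", "c"],
)

# Boolean Series: True for rows with at least one value below 10 or above 25
mask = df.apply(lambda row: any((x < 10) or (x > 25) for x in row), axis=1)

# Keep only the rows where the mask is True; no extra column is added to df
filtered = df[mask]
print(filtered)
```

This still evaluates the lambda once per row, so it is no faster than the 'keep' column version; it only avoids modifying the dataframe.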

Answered by Prageeth Jayathissa

You can use pandas boolean indexing

dropped_df = df.loc[((df < 10) | (df > 25)).any(axis=1)]
  • df < 10 returns a boolean dataframe
  • | is the element-wise OR operator
  • .any(axis=1) returns True for each row that contains at least one True element (see the documentation)
  • df.loc[] then filters the dataframe based on the resulting boolean Series
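
The boolean indexing steps above can be sketched end to end on the sample data from the question. This assumes a recent pandas version, where the axis is passed by keyword (axis=1); the variable names out_of_range, in_range, kept and kept_alt are illustrative. The second form expresses the same filter the other way round, by dropping rows whose values all lie within [10, 25].

```python
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({
    "a": [1, 4, 11, 26],
    "b": [2, 5, 24, 50],
    "c": [3, 16, 22, 65],
})

# Element-wise test: True where a value is below 10 or above 25
out_of_range = (df < 10) | (df > 25)

# Keep rows with at least one out-of-range value
kept = df.loc[out_of_range.any(axis=1)]

# Equivalent formulation: drop rows where every value is within [10, 25]
in_range = (df >= 10) & (df <= 25)
kept_alt = df.loc[~in_range.all(axis=1)]

print(kept)
```

Both forms are vectorized, so no Python-level loop over the rows is needed; which one reads better depends on whether the condition is easier to state as "keep if any" or "drop if all".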