Original question: http://stackoverflow.com/questions/44552550/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Pandas drop rows with value less than a given value
Asked by Jaswanth Kumar
I would like to delete rows in which every value is between 10 and 25, keeping only rows that have at least one value less than 10 or greater than 25. My sample dataframe looks like this:
a b c
1 2 3
4 5 16
11 24 22
26 50 65
Expected Output:
a b c
1 2 3
4 5 16
26 50 65
So if a row contains any value less than 10 or greater than 25, the row should stay in the dataframe; otherwise, it needs to be dropped.
Is there any way I can achieve this with Pandas instead of iterating through all the rows?
Answered by Rakesh Adhikesavan
You can call apply and store the results in a new column called 'keep'. You can then use this column to drop the rows that you don't need.
import pandas as pd
l = [[1,2,3],[4,5,6],[11,24,22],[26,50,65]]
df = pd.DataFrame(l, columns = ['a','b','c']) # Set up sample dataframe
# 1 if the row has any value below 10 or above 25, otherwise 0
df['keep'] = df.apply(lambda row: int(any((x < 10) or (x > 25) for x in row)), axis = 1)
The any() function returns True if at least one element of the iterable it is given is true, and False otherwise. Here it is given a generator expression that tests each value in the row, and int() converts the resulting boolean to 1 or 0.
See the Python documentation for details on how any() works.
The apply function still iterates over all the rows like a for loop, but the code looks cleaner this way. I cannot think of a way to do this without iterating over all the rows.
Output:
a b c keep
0 1 2 3 1
1 4 5 6 1
2 11 24 22 0
3 26 50 65 1
df = df[df['keep'] == 1] #Drop unwanted rows
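As a minimal sketch of the same idea without the helper column, the boolean result of apply can be used directly as a row mask. The helper name has_outlier and its low/high parameters are hypothetical, chosen only for illustration; the sample data is the one used in this answer.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [11, 24, 22], [26, 50, 65]],
                  columns=['a', 'b', 'c'])

def has_outlier(row, low=10, high=25):
    """Hypothetical helper: True if any value in the row is < low or > high."""
    return any(x < low or x > high for x in row)

# apply still visits every row, but the result is a boolean Series
# that can index the dataframe directly, so no extra column is needed
mask = df.apply(has_outlier, axis=1)
filtered = df[mask]
print(filtered)
```

With this data, the rows at index 0, 1 and 3 remain and the row 11 24 22 is dropped.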
Answered by Prageeth Jayathissa
You can use pandas boolean indexing
dropped_df = df.loc[((df<10) | (df>25)).any(axis=1)]
df<10 returns a boolean dataframe.
| is the element-wise OR operator.
.any(axis=1) returns True for each row that has at least one True element along axis 1 (across the columns); see the pandas documentation.
df.loc[] then filters the dataframe based on that boolean result.
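To check the boolean-indexing approach against the question's sample data, here is a minimal self-contained sketch. It also shows an equivalent formulation (by De Morgan's law) that drops the rows whose values all lie between 10 and 25. The variable names are chosen for illustration only, and the two forms are assumed to agree only when the dataframe has no missing values.

```python
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({'a': [1, 4, 11, 26],
                   'b': [2, 5, 24, 50],
                   'c': [3, 16, 22, 65]})

# Keep rows with at least one value below 10 or above 25
keep_mask = ((df < 10) | (df > 25)).any(axis=1)
kept = df.loc[keep_mask]

# Equivalent view: drop rows whose values all lie in [10, 25]
in_range = ((df >= 10) & (df <= 25)).all(axis=1)
kept_alt = df.loc[~in_range]

print(kept)
```

Both kept and kept_alt contain the rows at index 0, 1 and 3, which matches the expected output in the question. The comparisons are evaluated on whole columns at once, so no explicit Python-level loop over rows is needed.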