Original source: http://stackoverflow.com/questions/51589573/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Pandas filter data frame rows by function
Asked by Karl Adler
I want to filter a data frame using a more complex function based on different values in the row.
Is it possible to filter DataFrame rows with a boolean function, as you can with, for example, the ES6 filter function?
An extremely simplified example to illustrate the problem:
import pandas as pd

def filter_fn(row):
    # Workaround: return False for rows to drop, otherwise the row itself
    if row['Name'] == 'Alisa' and row['Age'] > 24:
        return False
    return row

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
# result_type='broadcast' replaces the broadcast=True argument,
# which was removed in pandas 1.0
df = df.apply(filter_fn, axis=1, result_type='broadcast')
print(df)
I found an approach using apply(), but with a boolean function it returns only rows filled with False/True, which is expected.
My workaround would be to return the row itself when the function result is True and to return False otherwise. But this would require additional filtering afterwards.
        Name    Age  Score
0      False  False  False
1      Bobby     24     63
2      jodha     23     55
3      Hyman     22     74
4      raghu     23     31
5   Cathrine     24     77
6      False  False  False
7      Bobby     24     63
8      kumar     22     42
9      Alisa     23     62
10      Alex     24     89
11  Cathrine     24     77
Accepted answer by jezrael
I don't think a function is necessary here. Boolean indexing is better and, above all, faster:
m = (df['Name'] == 'Alisa') & (df['Age'] > 24)
print(m)
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8     False
9     False
10    False
11    False
dtype: bool

# invert the mask with ~ to keep the rows that do not match
df1 = df[~m]
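To make the boolean indexing approach easy to try, here is a minimal, self-contained sketch using the sample data from the question. The names `drop_mask`, `keep_mask`, `kept_inverted` and `kept_direct` are only illustrative and are not from the original answer. It also shows that, instead of inverting the mask with `~`, the keep condition can be written directly by negating each comparison (De Morgan's law), assuming the columns contain no missing values.

```python
import pandas as pd

# Same sample data as in the question.
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Mask of rows to drop: Name is 'Alisa' AND Age is above 24.
drop_mask = (df['Name'] == 'Alisa') & (df['Age'] > 24)

# Option 1: invert the drop mask with ~.
kept_inverted = df[~drop_mask]

# Option 2: express the keep condition directly (De Morgan's law).
keep_mask = (df['Name'] != 'Alisa') | (df['Age'] <= 24)
kept_direct = df[keep_mask]

# Both options select the same rows; rows 0 and 6 are gone.
print(kept_inverted.equals(kept_direct))
print(kept_inverted.index.tolist())
```

Note the parentheses around each comparison: `&`, `|` and `~` bind more tightly than `==` and `>`, so leaving them out changes the meaning of the expression.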
Function solution: the function must return only a boolean for each row. This approach is the better fit when the filtering logic is more complicated:
def filter_fn(row):
    if row['Name'] == 'Alisa' and row['Age'] > 24:
        return False
    else:
        return True

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
m = df.apply(filter_fn, axis=1)
print(m)
0     False
1      True
2      True
3      True
4      True
5      True
6     False
7      True
8      True
9      True
10     True
11     True
dtype: bool

df1 = df[m]
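If you want something closer to the ES6 filter call from the question, the two steps (build the mask with apply, then index) can be wrapped in a small helper. This is only a sketch: `filter_rows` and `keep_row` are hypothetical names introduced here, not part of pandas or of the original answer.

```python
import pandas as pd


def filter_rows(frame, predicate):
    """Keep rows for which predicate(row) is truthy (hypothetical helper)."""
    # apply(..., axis=1) calls predicate once per row and returns a Series
    # of results aligned with the frame's index.
    mask = frame.apply(predicate, axis=1).astype(bool)
    return frame[mask]


def keep_row(row):
    # Drop rows where Name is 'Alisa' and Age is above 24; keep the rest.
    return not (row['Name'] == 'Alisa' and row['Age'] > 24)


# Same sample data as in the question.
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

result = filter_rows(df, keep_row)
print(result)
```

Because the predicate runs as plain Python once per row, this is slower than a vectorized mask on large frames, so it is best kept for conditions that are hard to express with column operations.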