Pandas filter data frame rows by function

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/51589573/

Date: 2020-09-14 05:51:32  Source: igfitidea


Tags: python-3.x, pandas, filter

Asked by Karl Adler

I want to filter a data frame with a more complex function that depends on several values in each row.

Is it possible to filter DF rows with a boolean function, the way you can with, for example, the ES6 filter function?

An extremely simplified example to illustrate the problem:

import pandas as pd

def filter_fn(row):
    if row['Name'] == 'Alisa' and row['Age'] > 24:
        return False

    return row

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],

    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# broadcast=True was removed in pandas 1.0; result_type='broadcast' is its replacement
df = df.apply(filter_fn, axis=1, result_type='broadcast')

print(df)

I found something using apply(), but with a bool function this only returns rows filled with False/True, which is expected.

My workaround would be to return the row itself when the function result is True and to return False otherwise. But this would require an additional filtering step afterwards.

        Name    Age  Score
0      False  False  False
1      Bobby     24     63
2      jodha     23     55
3      Hyman     22     74
4      raghu     23     31
5   Cathrine     24     77
6      False  False  False
7      Bobby     24     63
8      kumar     22     42
9      Alisa     23     62
10      Alex     24     89
11  Cathrine     24     77

Accepted answer by jezrael

I think a function is not necessary here; it is better, and above all faster, to use boolean indexing:

m = (df['Name'] == 'Alisa') & (df['Age'] > 24)
print(m)
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8     False
9     False
10    False
11    False
dtype: bool

# invert the mask with ~ to keep the rows that do not match
df1 = df[~m]
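
For reference, the boolean indexing approach above can be put together as one self-contained script. This is only a minimal sketch using the sample data from the question; the second variant (df2), which writes the "keep" condition directly instead of inverting a mask, is an addition for illustration and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Mask is True for the rows that should be dropped.
m = (df['Name'] == 'Alisa') & (df['Age'] > 24)

# ~m inverts the mask, so only the non-matching rows are kept.
df1 = df[~m]

# Same result with the "keep" condition written out (De Morgan's law).
# The two forms agree here because the data has no missing values.
df2 = df[(df['Name'] != 'Alisa') | (df['Age'] <= 24)]

print(df1)
```

Note that each comparison needs its own parentheses, because & and | bind more tightly than == and > in Python.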

Function solution: the function must return only a boolean for each row. This is the better option when the filtering logic is more complicated:

def filter_fn(row):
    if row['Name'] == 'Alisa' and row['Age'] > 24:
        return False
    else:
        return True

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
m = df.apply(filter_fn, axis=1)
print(m)
0     False
1      True
2      True
3      True
4      True
5      True
6     False
7      True
8      True
9      True
10     True
11     True
dtype: bool

df1 = df[m]
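
To get something closer to the ES6 filter style mentioned in the question, the apply-then-index pattern above could be wrapped in a small helper. This is a rough sketch only: filter_rows is a hypothetical name introduced here for illustration, not a pandas function, and it assumes the predicate returns a plain True/False for every row.

```python
import pandas as pd


def filter_rows(df, predicate):
    """Hypothetical helper (not part of pandas): keep rows where predicate(row) is True."""
    # apply with axis=1 calls predicate once per row and collects the results in a Series.
    mask = df.apply(predicate, axis=1).astype(bool)
    return df[mask]


# Sample data copied from the question.
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'Hyman', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Drop every row where Name is 'Alisa' and Age is above 24.
kept = filter_rows(df, lambda row: not (row['Name'] == 'Alisa' and row['Age'] > 24))

print(kept)
```

Because apply with axis=1 runs a Python function once per row, this is usually much slower than boolean indexing on large frames, so it is best kept for conditions that are awkward to express with column operations.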