pandas: Filter a pandas DataFrame using values from a dict

Question

Asked by Ivan

I need to filter a DataFrame using a dict whose keys are column names and whose values are the values I want to filter on:

filter_v = {'A':1, 'B':0, 'C':'This is right'}
# this would be the normal approach
df[(df['A'] == 1) & (df['B'] == 0) & (df['C'] == 'This is right')]

But I want to do something along the lines of:

for column, value in filter_v.items():
    df[df[column] == value]

But this filters the DataFrame several times, one value at a time, instead of applying all the filters at once. Is there a way to do this programmatically?

EDIT: an example:

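import numpy as np
import pandas as pd
df1 = pd.DataFrame({'A':[1,0,1,1, np.nan], 'B':[1,1,1,0,1], 'C':['right','right','wrong','right', 'right'],'D':[1,2,2,3,4]})
filter_v = {'A':1, 'B':0, 'C':'right'}
# list(...) so the dict views also work as column selectors / isin values in Python 3
df1.loc[df1[list(filter_v)].isin(list(filter_v.values())).all(axis=1), :]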

gives

     A  B      C  D
0    1  1  right  1
1    0  1  right  2
3    1  0  right  3

But the expected result was:

     A  B      C  D
3    1  0  right  3

Only the row with index 3 should be selected.
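The `isin` attempt selects rows 0 and 1 because `isin` receives one flat list of values and tests every cell against all of them, regardless of column: `B == 1` passes simply because `1` is among the pooled values `[1, 0, 'right']`. The sketch below, which reuses the example data, contrasts that pooled check with a per-column comparison; `pooled` and `per_column` are illustrative names only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# isin() with a flat list checks every cell against the POOLED values
# [1, 0, 'right'], so B == 1 passes even though the filter asks for B == 0.
pooled = df1[list(filter_v)].isin(list(filter_v.values()))
print(pooled.all(axis=1))

# Comparing each column only against its own value keeps the filters separate.
per_column = pd.DataFrame({col: df1[col] == val for col, val in filter_v.items()})
print(per_column.all(axis=1))
```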

Answer 1

Answered by DSM

IIUC, you should be able to do something like this:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3

This works by making a Series to compare against:

>>> pd.Series(filter_v)
A        1
B        0
C    right
dtype: object

Selecting the corresponding part of df1:

>>> df1[list(filter_v)]
    A      C  B
0   1  right  1
1   0  right  1
2   1  wrong  1
3   1  right  0
4 NaN  right  1

Finding where they match:

>>> df1[list(filter_v)] == pd.Series(filter_v)
       A      B      C
0   True  False   True
1  False  False   True
2   True  False  False
3   True   True   True
4  False  False   True

Finding where they all match:

>>> (df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)
0    False
1    False
2    False
3     True
4    False
dtype: bool

And finally using this to index into df1:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3
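If you need this in several places, the expression can be wrapped in a small function. This is a sketch, not part of DSM's answer: `filter_by_dict` is a hypothetical name, and it uses the `DataFrame.eq` method instead of `==`. Both compare column by column here, but `eq` aligns the Series index to the columns explicitly, whereas recent pandas versions may warn about or reject `==` between a DataFrame and a Series whose labels are not in exactly the same order.

```python
import numpy as np
import pandas as pd


def filter_by_dict(df, filters):
    """Return the rows of df where every column named in `filters`
    equals the corresponding value (hypothetical helper)."""
    # One Series holding the target value for each filtered column.
    targets = pd.Series(filters)
    # DataFrame.eq aligns the Series index with the columns, then all(axis=1)
    # keeps only the rows where every condition holds.
    mask = df[list(filters)].eq(targets).all(axis=1)
    return df.loc[mask]


df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
result = filter_by_dict(df1, {'A': 1, 'B': 0, 'C': 'right'})
print(result)
```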

Answer 2

Answered by Primer

Here is a way to do it:

df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]

UPDATE:

Since the same value can appear in more than one column, isin() on the pooled values is not enough. Instead, you could do something like this:

# Create your filtering function:

def filter_dict(df, dic):
    # Keep the rows whose values in dic's columns equal dic's values, checked row by row
    return df[df[list(dic)].apply(
            lambda x: x.equals(pd.Series(list(dic.values()), index=x.index, name=x.name)), axis=1)]

# Use it on your DataFrame:

filter_dict(df1, filter_v)

Which yields:

   A  B      C  D
3  1  0  right  3

If this is something you do frequently, you could even go as far as patching DataFrame for easy access to this filter:

pd.DataFrame.filter_dict_ = filter_dict

And then use this filter like this:

df1.filter_dict_(filter_v)

Which would yield the same result.

But this is clearly not the right way to do it. I would use DSM's approach.
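If you do want method-style access, pandas provides an official extension point, `pd.api.extensions.register_dataframe_accessor`, which adds a namespace to every DataFrame without assigning attributes onto the class by hand. A minimal sketch, assuming the hypothetical accessor name `dictfilter` and reusing the per-column comparison from DSM's answer:

```python
import numpy as np
import pandas as pd


# Register a custom accessor; "dictfilter" is an arbitrary, hypothetical name.
@pd.api.extensions.register_dataframe_accessor("dictfilter")
class DictFilterAccessor:
    def __init__(self, pandas_obj):
        self._df = pandas_obj

    def __call__(self, filters):
        # Same per-column comparison as DSM's answer, reduced with all(axis=1).
        mask = self._df[list(filters)].eq(pd.Series(filters)).all(axis=1)
        return self._df.loc[mask]


df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
result = df1.dictfilter({'A': 1, 'B': 0, 'C': 'right'})
print(result)
```

Making the accessor callable keeps the call site short; a named method such as `df1.dictfilter.match(...)` would work just as well if you prefer explicit method names.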

Answer 3

Answered by E. Zeytinci

In Python 2, @primer's answer works as written. In Python 3, however, you should be careful, because dict.keys() returns a dict_keys view rather than a list. For instance:

>>> df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
TypeError: unhashable type: 'dict_keys'

The correct way in Python 3:

df.loc[df[list(filter_v.keys())].isin(list(filter_v.values())).all(axis=1), :]

Answer 4

Answered by efajardo

Here's another way:

# Start with an all-True mask that shares df's index, then AND in one condition per column
filterSeries = pd.Series(True, index=df.index)
for column, value in filter_v.items():
    filterSeries = (df[column] == value) & filterSeries

This gives:

>>> df[filterSeries]
   A  B      C  D
3  1  0  right  3
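The same accumulation can be written without an explicit loop using `functools.reduce` with `operator.and_`; the all-True starting Series plays the role of `filterSeries`, and it also means an empty filter dict returns every row instead of raising. A sketch reusing the example data (named `df` here to match the loop above):

```python
import functools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                   'C': ['right', 'right', 'wrong', 'right', 'right'],
                   'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# One boolean mask per (column, value) pair, generated lazily.
masks = (df[column] == value for column, value in filter_v.items())
# AND the masks together, starting from a mask that keeps every row.
combined = functools.reduce(operator.and_, masks, pd.Series(True, index=df.index))
print(df[combined])
```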

Answer 5

Answered by Ben Saunders

Here is a generalization of the above for the case where each filter is a list of allowed values rather than a single value (analogous to pandas.Series.isin()). Using the same example:

df1 = pd.DataFrame({'A':[1,0,1,1, np.nan], 'B':[1,1,1,0,1], 'C':['right','right','wrong','right', 'right'],'D':[1,2,2,3,4]})
filter_v = {'A':[1], 'B':[1,0], 'C':['right']}
# Start with a mask that is True for every row (sharing df1's index)
ind = pd.Series(True, index=df1.index)

# Loop through the filters, keeping only rows whose value is in the allowed list
for col, vals in filter_v.items():
    ind = ind & df1[col].isin(vals)

# Return the filtered DataFrame
df1[ind]

# Returns:

     A  B      C  D
0  1.0  1  right  1
3  1.0  0  right  3
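For a dict of lists like this one, `DataFrame.isin` can do the per-column check directly: when given a dict, it tests each column only against the list stored under that column's name, and columns that are not keys of the dict come back all False. The sketch below therefore restricts the frame to the filtered columns before reducing with `all(axis=1)`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': [1], 'B': [1, 0], 'C': ['right']}

# With a dict argument, each column is checked only against its own list.
# Select just the filtered columns first: any other column would be all False.
mask = df1[list(filter_v)].isin(filter_v).all(axis=1)
print(df1[mask])
```

For the scalar dict in the question, wrapping each value in a one-element list, e.g. `{col: [val] for col, val in filter_v.items()}`, gives the same per-column behavior.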

Answer 6

Answered by Harunobu

To follow up on DSM's answer, you can also use any() to turn your query into an OR operation (instead of AND):

df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).any(axis=1)]
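A sketch contrasting the two reductions on the example data, using `.eq` as a variant of DSM's `==`. With this particular data the OR version keeps every row, since each row satisfies at least one of the three conditions, while the AND version keeps only row 3.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan], 'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Boolean frame: one column per filter, True where that condition holds.
matches = df1[list(filter_v)].eq(pd.Series(filter_v))
and_rows = df1.loc[matches.all(axis=1)]  # every condition holds
or_rows = df1.loc[matches.any(axis=1)]   # at least one condition holds
print(and_rows)
print(or_rows)
```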
