Pandas DataFrame: How to select rows that satisfy multiple conditions?

Question

Question by Juan Carlos

I'm trying to select rows of a DataFrame based on a list of conditions that must all be satisfied. Those conditions are stored in a dictionary of the form {column: max-value}.

This is an example: dict = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

I need to select all DataFrame rows where the corresponding attribute is less than or equal to the corresponding value in the dictionary.

I know that for selecting rows based on two or more conditions I can write:

rows = df[(df[column1] <= dict[column1]) & (df[column2] <= dict[column2])]

My question is: how can I select the rows that match the conditions in the dictionary in a Pythonic way? I tried this:

keys = dict.keys() 
rows = df[(df[kk] <= dict[kk]) for kk in keys]

but it gives me the error "[ expected", which doesn't go away even when I add the [ symbol.

Answer 1

Accepted answer by MaxU

We can use the DataFrame.query() method like this:

In [109]: dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

In [110]: qry = ' and '.join(['{} <= {}'.format(k,v) for k,v in dct.items()])

In [111]: qry
Out[111]: 'name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0'

In [112]: df.query(qry)
...
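
A self-contained sketch of the same idea on a small hypothetical DataFrame. It wraps each column name in backticks, which (assuming pandas 1.0 or later) lets query() accept column names that are not valid Python identifiers, and it formats each value with repr() so that string limits would stay quoted. Both tweaks go beyond the original answer and are assumptions about what a more robust query string might look like.

```python
import pandas as pd

# Hypothetical sample data; any DataFrame whose columns match the dict keys works.
df = pd.DataFrame({'name': [0, 5, 3],
                   'sex':  [0, 0, 1],
                   'city': [2, 1, 0],
                   'age':  [3, 2, 1]})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

# Backticks quote the column names (assumed pandas >= 1.0);
# !r applies repr() so numbers print as-is and strings would keep their quotes.
qry = ' and '.join(f'`{k}` <= {v!r}' for k, v in dct.items())
print(qry)  # `name` <= 4.0 and `sex` <= 0.0 and `city` <= 2 and `age` <= 3.0

# Only row 0 satisfies every condition here.
result = df.query(qry)
print(result)
```

Building the string from the dict means the conditions can change without touching the selection code.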

Answer 2

Answer by unutbu

You could take advantage of Pandas' automatic axis alignment. Given a DataFrame with columns ['age', 'city', 'name', 'sex'] and a Series whose index holds the same labels, you can compare every entry in the DataFrame against the corresponding value in the Series using

In [29]: df < pd.Series(dct)
Out[29]: 
      age   city   name    sex
0   False  False  False  False
1   False  False  False  False
2    True  False  False  False
3   False   True  False  False
4    True   True   True  False
...

Then you can find the rows in which every value is True using

mask = (df <= pd.Series(dct)).all(axis=1)

and select those rows with df.loc[mask, :]. For example,

import numpy as np
import pandas as pd
np.random.seed(2017)
N = 300
df = pd.DataFrame({'name':np.random.randint(10, size=N),
                   'sex':np.random.randint(2, size=N),
                   'city':np.random.randint(10, size=N),
                   'age':np.random.randint(10, size=N)})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

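# Compare each column with the Series entry of the same name,
# then keep only the rows where every comparison is True
mask = (df <= pd.Series(dct)).all(axis=1)
print(df.loc[mask, :])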

yields

     age  city  name  sex
7      3     2     0    0
10     1     2     4    0
150    1     2     4    0
188    2     2     2    0
198    3     2     3    0
229    1     2     0    0
254    1     2     2    0
275    3     2     1    0
276    0     1     4    0
299    3     1     2    0
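
One caveat, stated with some hedging: in recent pandas releases (2.0 and later, to the best of my knowledge), the operator form df <= pd.Series(dct) raises a ValueError instead of aligning when the DataFrame columns and the Series index differ in content or order, which happens as soon as the dict covers only some of the columns. A minimal sketch under that assumption selects the constrained columns in the Series' own order first and then compares with the flexible DataFrame.le method. The sample data, including the unconstrained 'extra' column, is hypothetical.

```python
import pandas as pd

# Hypothetical data: column order differs from the dict order,
# and 'extra' has no condition attached to it.
df = pd.DataFrame({'age':   [3, 1, 4],
                   'extra': [9, 9, 9],
                   'city':  [2, 2, 0],
                   'sex':   [0, 0, 0],
                   'name':  [4, 5, 1]})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

limits = pd.Series(dct)
# Pick only the constrained columns, in the same order as limits.index,
# so the labels match exactly before comparing.
mask = df[limits.index].le(limits).all(axis=1)
print(df.loc[mask, :])
```

Row 1 fails on 'name' and row 2 fails on 'age', so only row 0 is kept, and the 'extra' column does not affect the mask.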

Answer 3

Answer by Abdou

You can also try:

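import pandas as pd
import numpy as np

# Seed from the previous answer; the output below appears to come from the same data
np.random.seed(2017)
N = 300

df = pd.DataFrame({'name':np.random.randint(10, size=N),
                   'sex':np.random.randint(2, size=N),
                   'city':np.random.randint(10, size=N),
                   'age':np.random.randint(10, size=N)})

dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

# The product of the boolean masks is 1 only where every condition holds;
# astype(bool) turns it back into a mask for .loc
print(df.loc[np.prod([df[k] <= v for k, v in dct.items()], axis=0).astype(bool), :])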

#      age  city  name  sex
# 7      3     2     0    0
# 10     1     2     4    0
# 150    1     2     4    0
# 188    2     2     2    0
# 198    3     2     3    0
# 229    1     2     0    0
# 254    1     2     2    0
# 275    3     2     1    0
# 276    0     1     4    0
# 299    3     1     2    0
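
The list of boolean Series built in this answer (the same kind of comprehension the asker tried) can also be combined without NumPy by folding it with functools.reduce and operator.and_, which computes masks[0] & masks[1] & ... and keeps the DataFrame index. A minimal sketch on hypothetical data:

```python
import functools
import operator

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'name': [0, 5, 3, 4],
                   'sex':  [0, 0, 1, 0],
                   'city': [2, 1, 0, 1],
                   'age':  [3, 2, 1, 0]})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

# One boolean Series per condition.
masks = [df[k] <= v for k, v in dct.items()]
# Fold them pairwise with &, giving a single boolean Series.
mask = functools.reduce(operator.and_, masks)
print(df.loc[mask, :])
```

Rows 0 and 3 satisfy all four conditions. np.logical_and.reduce(masks) should give the same mask as a plain NumPy boolean array, which df.loc also accepts.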