Pandas DataFrame: How to select rows on multiple conditions?
Source: http://stackoverflow.com/questions/45590769/
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the source address and author information, and attribute it to the original authors (not me): StackOverflow
Asked by Juan Carlos
I'm trying to select rows of a DataFrame based on a list of conditions that must all be satisfied. The conditions are stored in a dictionary of the form {column: max-value}.
This is an example: dict = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
I need to select all DataFrame rows where the corresponding attribute is less than or equal to the corresponding value in the dictionary.
I know that for selecting rows based on two or more conditions I can write:
rows = df[(df[column1] <= dict[column1]) & (df[column2] <= dict[column2])]
My question is: how can I select the rows that match the conditions in a dictionary in a Pythonic way? I tried this:
keys = dict.keys()
rows = df[(df[kk] <= dict[kk]) for kk in keys]
but it gives me the error "[ expected", which does not go away even after I add the [ symbol.
Accepted answer by MaxU
We can use the DataFrame.query() method like this:
In [109]: dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
In [110]: qry = ' and '.join(['{} <= {}'.format(k,v) for k,v in dct.items()])
In [111]: qry
Out[111]: 'name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0'
In [112]: df.query(qry)
...
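As a self-contained sketch of the same idea, the snippet below builds the query string from the dictionary and applies it to a small made-up DataFrame. The sample data and the helper name build_query are assumptions for illustration and are not part of the original answer. The approach also assumes that the column names are valid Python identifiers and that the values are plain numbers.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({'name': [1, 5, 4, 0, 4],
                   'sex':  [0, 0, 1, 0, 0],
                   'city': [2, 1, 2, 3, 0],
                   'age':  [3, 2, 1, 0, 0]})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

def build_query(limits):
    """Hypothetical helper: join one 'column <= value' clause per key with 'and'."""
    return ' and '.join('{} <= {}'.format(k, v) for k, v in limits.items())

qry = build_query(dct)
rows = df.query(qry)  # keeps only the rows that satisfy every clause
print(qry)
print(rows)
```

Only rows 0 and 4 of the sample data satisfy all four conditions, so those are the rows that remain.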
Answer by unutbu
You could take advantage of Pandas' automatic axis alignment. Given a DataFrame with columns ['age', 'city', 'name', 'sex'] and a Series whose index holds the same labels, you can compare every entry in the DataFrame against the corresponding value in the Series using
In [29]: df < pd.Series(dct)
Out[29]:
age city name sex
0 False False False False
1 False False False False
2 True False False False
3 False True False False
4 True True True False
...
Then you can find the rows which are all True using
mask = (df <= pd.Series(dct)).all(axis=1)
and select those rows with df.loc[mask, :]. For example,
import numpy as np
import pandas as pd
np.random.seed(2017)
N = 300
df = pd.DataFrame({'name':np.random.randint(10, size=N),
'sex':np.random.randint(2, size=N),
'city':np.random.randint(10, size=N),
'age':np.random.randint(10, size=N)})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
mask = (df <= pd.Series(dct)).all(axis=1)
print(df.loc[mask, :])
yields
age city name sex
7 3 2 0 0
10 1 2 4 0
150 1 2 4 0
188 2 2 2 0
198 3 2 3 0
229 1 2 0 0
254 1 2 2 0
275 3 2 1 0
276 0 1 4 0
299 3 1 2 0
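The example above works because the DataFrame columns and the dictionary keys are the same set of labels. A minimal sketch for the case where the DataFrame carries extra columns that have no condition is shown below. The sample data and the 'note' column are made up for illustration. The sketch selects the constrained columns explicitly and uses DataFrame.le, the method form of <=, so that it does not rely on the operator aligning mismatched labels on its own.

```python
import pandas as pd

# Made-up sample data; 'note' is an extra column with no condition attached
df = pd.DataFrame({'name': [1, 5, 4, 0, 4],
                   'sex':  [0, 0, 1, 0, 0],
                   'city': [2, 1, 2, 3, 0],
                   'age':  [3, 2, 1, 0, 0],
                   'note': ['a', 'b', 'c', 'd', 'e']})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

limits = pd.Series(dct)
# Keep only the constrained columns, in the same order as the Series index
subset = df[list(limits.index)]
# le() compares each column against the Series entry with the same label
mask = subset.le(limits, axis='columns').all(axis=1)
selected = df.loc[mask, :]  # the mask is applied to the full DataFrame
print(selected)
```

Because the mask is built from the subset but applied to df, the selected rows still contain the unconstrained 'note' column.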
Answer by Abdou
You can also try:
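import pandas as pd
import numpy as np
np.random.seed(2017)  # same seed as the previous answer, so the random data is the same
N = 300
df = pd.DataFrame({'name':np.random.randint(10, size=N),
'sex':np.random.randint(2, size=N),
'city':np.random.randint(10, size=N),
'age':np.random.randint(10, size=N)})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
# Multiply the boolean masks: the product is nonzero only where every condition holds
df.loc[np.prod([df[k] <= v for k,v in dct.items()],axis=0).astype(bool),:]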
# age city name sex
# 7 3 2 0 0
# 10 1 2 4 0
# 150 1 2 4 0
# 188 2 2 2 0
# 198 3 2 3 0
# 229 1 2 0 0
# 254 1 2 2 0
# 275 3 2 1 0
# 276 0 1 4 0
# 299 3 1 2 0
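A variation on the same idea, sketched here as an assumption rather than as part of the original answer, combines the per-column masks with np.logical_and.reduce instead of multiplying them. This is close to what the attempt in the question was reaching for: build one boolean mask per dictionary key, then AND them together before indexing. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({'name': [1, 5, 4, 0, 4],
                   'sex':  [0, 0, 1, 0, 0],
                   'city': [2, 1, 2, 3, 0],
                   'age':  [3, 2, 1, 0, 0]})
dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

# One boolean Series per condition
conditions = [df[k] <= v for k, v in dct.items()]
# Element-wise AND across all conditions; the result is a 1-D boolean array
mask = np.logical_and.reduce(conditions)
rows = df.loc[mask, :]
print(rows)
```

The mask stays boolean throughout, so no cast with astype(bool) is needed before indexing.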