How to select rows in a DataFrame between two values, in Python Pandas?
Warning: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31617845/
Question by user131983
I am trying to modify a DataFrame df so that it only contains rows for which the values in the column closing_price are between 99 and 101, and I am trying to do this with the code below:
df = df[(99 <= df['closing_price'] <= 101)]
However, I get the error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
and I am wondering if there is a way to do this without using loops.
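The error comes from Python's chained comparison: 99 <= s <= 101 is evaluated as (99 <= s) and (s <= 101), and "and" has to call bool() on the first boolean Series, which pandas refuses to do. A minimal sketch reproducing this on a small Series (the values are made up for illustration):

```python
import pandas as pd

# Illustrative prices, not data from the question
s = pd.Series([98, 99, 100, 101, 102])

# A single comparison is fine: it returns a boolean Series, element-wise
left = 99 <= s
print(left.tolist())  # [False, True, True, True, True]

# The chained form expands to (99 <= s) and (s <= 101);
# "and" calls bool() on a Series, which raises ValueError
try:
    s[99 <= s <= 101]
except ValueError as exc:
    print("ValueError:", exc)
```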
Accepted answer by Jianxun Li
You should use () to group each boolean vector and combine them with & to remove the ambiguity:
df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
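A minimal self-contained sketch of this answer, using a small made-up DataFrame. It also shows Series.between, a built-in pandas method not mentioned in the answer, which is inclusive of both bounds by default and builds the same mask:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
df = pd.DataFrame({'closing_price': [95, 99, 100, 101, 104]})

# Each comparison is wrapped in parentheses because & binds more tightly than >= and <=
mask = (df['closing_price'] >= 99) & (df['closing_price'] <= 101)
filtered = df[mask]
print(filtered)

# Series.between(99, 101) is inclusive on both ends by default
same = df[df['closing_price'].between(99, 101)]
print(same.equals(filtered))  # True
```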
Answer by MaxU
There is a nicer alternative: use the query() method:
In [58]: df = pd.DataFrame({'closing_price': np.random.randint(95, 105, 10)})
In [59]: df
Out[59]:
closing_price
0 104
1 99
2 98
3 95
4 103
5 101
6 101
7 99
8 95
9 96
In [60]: df.query('99 <= closing_price <= 101')
Out[60]:
closing_price
1 99
5 101
6 101
7 99
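The session above uses random data and assumes numpy and pandas were imported as np and pd. A reproducible sketch of the same query() call, hard-coding the values printed above:

```python
import pandas as pd

# Fixed values (copied from the printed session) instead of np.random, so the result is reproducible
df = pd.DataFrame({'closing_price': [104, 99, 98, 95, 103, 101, 101, 99, 95, 96]})

# query() accepts the chained comparison directly, unlike a plain Python expression on a Series
result = df.query('99 <= closing_price <= 101')
print(result)
```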
UPDATE: answering the comment:
I like the syntax here but fell down when trying to combine with an expression;
df.query('(mean + 2 *sd) <= closing_price <=(mean + 2 *sd)')
In [161]: qry = "(closing_price.mean() - 2*closing_price.std())" +\
...: " <= closing_price <= " + \
...: "(closing_price.mean() + 2*closing_price.std())"
...:
In [162]: df.query(qry)
Out[162]:
closing_price
0 97
1 101
2 97
3 95
4 100
5 99
6 100
7 101
8 99
9 95
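If you would rather not embed method calls in the query string, here is a sketch of the same mean ± 2 standard deviations filter computed with ordinary pandas calls. The data is hypothetical, and note that Series.std uses the sample standard deviation (ddof=1) by default:

```python
import pandas as pd

# Hypothetical prices, including one obvious outlier (150)
df = pd.DataFrame({'closing_price': [100, 101, 99, 100, 98, 102, 100, 150]})

col = df['closing_price']
low = col.mean() - 2 * col.std()
high = col.mean() + 2 * col.std()

# Keep rows within two standard deviations of the mean (bounds inclusive)
within = df[col.between(low, high)]
print(within)
```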
Answer by Parfait
Answer by crashMOGWAI
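newdf = df.query('closing_price.mean() <= closing_price <= closing_price.std()')
or
# compute the bounds as ordinary Python variables first
mean = df['closing_price'].mean()
std = df['closing_price'].std()
# inside the query string, @name refers to a local variable
newdf = df.query('@mean <= closing_price <= @std')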
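A runnable sketch of the @ syntax, with the column accessed through df (a bare closing_price is not defined outside the query string) and, as an illustrative choice, a symmetric mean ± one standard deviation window instead of the bounds above. The data is hypothetical:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'closing_price': [95, 98, 99, 100, 101, 102, 105]})

# Ordinary Python variables computed from the column
mean = df['closing_price'].mean()
std = df['closing_price'].std()

# Inside query(), @name refers to a local variable rather than a column,
# and arithmetic on those variables is allowed
newdf = df.query('@mean - @std <= closing_price <= @mean + @std')
print(newdf)
```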
Answer by Rushabh Agarwal
Instead of this
df = df[(99 <= df['closing_price'] <= 101)]
You should use this
df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
We have to use NumPy's bitwise logic operators |, &, ~ and ^ to combine conditions. The parentheses are also important because of operator precedence: & and | bind more tightly than comparisons such as >= and <=.
For more info, see: Comparisons, Masks, and Boolean Logic
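A sketch of why the parentheses matter, plus the | and ~ operators, on hypothetical data. Without parentheses Python groups the expression as price >= (99 & price) <= 101, which is again a chained comparison and fails with the same ValueError as in the question:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'closing_price': [95, 99, 100, 101, 104]})
price = df['closing_price']

# Missing parentheses: & is applied first, producing a chained comparison
try:
    df[price >= 99 & price <= 101]
except ValueError as exc:
    print("ValueError:", exc)

inside = (price >= 99) & (price <= 101)
# | combines with OR and ~ negates; both express "outside the range"
outside_or = (price < 99) | (price > 101)
outside_not = ~inside
print(outside_or.equals(outside_not))  # True
print(df[outside_not])
```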
Answer by Riz.Khan
Answer by sparrow
If you're dealing with multiple values and multiple inputs, you could also set up an apply function like this. In this case, it filters a DataFrame for GPS locations that fall within certain ranges.
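def filter_values(lat, lon):
    # True if both lat and lon are within 0.01 of one of two reference points
    if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
        return True
    elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
        return True
    else:
        return False

# apply row by row (axis=1) to build a boolean mask, then filter with it
df = df[df.apply(lambda x: filter_values(x['lat'], x['lon']), axis=1)]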
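For larger DataFrames, the same rule can usually be written with vectorized comparisons instead of apply(axis=1), which calls the Python function once per row. A sketch with made-up coordinates:

```python
import pandas as pd

# Hypothetical GPS points; only the first two are near a reference location
df = pd.DataFrame({
    'lat': [33.775, 37.785, 40.71, 33.90],
    'lon': [-118.165, -122.395, -74.00, -118.16],
})

# Each term is a boolean Series computed for the whole column at once
near_first = ((df['lat'] - 33.77).abs() < .01) & ((df['lon'] - -118.16).abs() < .01)
near_second = ((df['lat'] - 37.79).abs() < .01) & ((df['lon'] - -122.39).abs() < .01)

# Same rule as filter_values, without a Python call per row
filtered = df[near_first | near_second]
print(filtered)
```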