How to select rows in a DataFrame between two values, in Python Pandas?

Notice: this page is a Chinese/English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/31617845/

Time: 2020-08-19 10:18:29  Source: igfitidea

python pandas

Asked by user131983

I am trying to modify a DataFrame df to only contain rows for which the values in the column closing_price are between 99 and 101, and I am trying to do this with the code below.

However, I get the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

and I am wondering if there is a way to do this without using loops.

df = df[(99 <= df['closing_price'] <= 101)]

Accepted answer by Jianxun Li

You should use () to group each boolean vector and combine them with & to remove the ambiguity.

df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
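
To make the difference concrete, here is a minimal sketch of why the chained comparison fails and how the parenthesized form behaves. The sample values are made up for illustration; only the column name closing_price comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"closing_price": [104, 99, 98, 95, 103, 101, 101, 99, 95, 96]})

# Python expands `99 <= s <= 101` to `(99 <= s) and (s <= 101)`. The `and`
# needs a single True/False, which a whole Series cannot provide, so pandas
# raises ValueError.
try:
    df[99 <= df["closing_price"] <= 101]
    chained_failed = False
except ValueError:
    chained_failed = True

# Element-wise alternative: build two boolean masks and combine them with `&`.
mask = (df["closing_price"] >= 99) & (df["closing_price"] <= 101)
filtered = df[mask]
print(filtered)
```

The mask is an ordinary boolean Series, so it can be stored, inspected, or reused before it is applied to the DataFrame.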

Answer by MaxU

There is a nicer alternative: use the query() method:

In [58]: df = pd.DataFrame({'closing_price': np.random.randint(95, 105, 10)})

In [59]: df
Out[59]:
   closing_price
0            104
1             99
2             98
3             95
4            103
5            101
6            101
7             99
8             95
9             96

In [60]: df.query('99 <= closing_price <= 101')
Out[60]:
   closing_price
1             99
5            101
6            101
7             99

UPDATE: answering the comment:

I like the syntax here but fell down when trying to combine with expression; df.query('(mean + 2 *sd) <= closing_price <=(mean + 2 *sd)')

In [161]: qry = "(closing_price.mean() - 2*closing_price.std())" +\
     ...:       " <= closing_price <= " + \
     ...:       "(closing_price.mean() + 2*closing_price.std())"
     ...:

In [162]: df.query(qry)
Out[162]:
   closing_price
0             97
1            101
2             97
3             95
4            100
5             99
6            100
7            101
8             99
9             95
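
As a possible variation on the query above, the bounds can be computed once in Python and referenced inside the query string with the @ prefix. This is only a sketch: the names lower and upper are illustrative, and the data is copied from the output shown in this answer.

```python
import pandas as pd

# Values copied from the output above so the example is self-contained.
df = pd.DataFrame({"closing_price": [97, 101, 97, 95, 100, 99, 100, 101, 99, 95]})

# Compute the bounds once as plain Python floats (illustrative names).
mean = df["closing_price"].mean()
sd = df["closing_price"].std()
lower = mean - 2 * sd
upper = mean + 2 * sd

# `@name` inside the query string refers to a local Python variable.
within = df.query("@lower <= closing_price <= @upper")

# The same selection written as a boolean mask, for comparison.
same = df[(df["closing_price"] >= lower) & (df["closing_price"] <= upper)]
print(within)
```

Keeping the arithmetic outside the string makes the query shorter and avoids recomputing the mean and standard deviation inside the expression.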

Answer by Parfait

Consider also Series.between:

df = df[df['closing_price'].between(99, 101)]
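
By default between() includes both endpoints. The sketch below shows the default alongside the inclusive argument, assuming pandas 1.3 or later, where inclusive accepts the strings "both", "neither", "left", and "right" (older versions take a boolean instead). The sample values are made up.

```python
import pandas as pd

# Hypothetical sample data chosen to include both endpoints.
df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})

# Default: both endpoints are included (99 <= x <= 101).
both = df[df["closing_price"].between(99, 101)]

# Assumes pandas >= 1.3: exclude both endpoints (99 < x < 101).
neither = df[df["closing_price"].between(99, 101, inclusive="neither")]

# Assumes pandas >= 1.3: include only the left endpoint (99 <= x < 101).
left = df[df["closing_price"].between(99, 101, inclusive="left")]

print(both, neither, left, sep="\n\n")
```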

Answer by crashMOGWAI

newdf = df.query('closing_price.mean() <= closing_price <= closing_price.std()')

or

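# The column has to be accessed through the DataFrame; a bare closing_price is undefined here.
mean = df['closing_price'].mean()
std = df['closing_price'].std()

# Inside query(), @name refers to a local Python variable.
newdf = df.query('@mean <= closing_price <= @std')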

Answer by Rushabh Agarwal

Instead of this

df = df[(99 <= df['closing_price'] <= 101)]

You should use this

df = df[(df['closing_price']>=99 ) & (df['closing_price']<=101)]

We have to use the NumPy-style bitwise logical operators |, &, ~, ^ to combine conditions element-wise; the Python keywords and, or, not do not work on a Series. The parentheses are also required, because these operators take precedence over comparisons such as >= and <=.

For more information, see: Comparisons, Masks, and Boolean Logic
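
The operators and the precedence point from this answer can be illustrated with a short sketch. The data and the variable names (price, in_range, and so on) are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"closing_price": [104, 99, 98, 95, 103, 101, 101, 99, 95, 96]})
price = df["closing_price"]

# `&` is element-wise AND between two boolean masks.
in_range = (price >= 99) & (price <= 101)

# `~` negates a mask and `|` is element-wise OR; both lines below select
# the rows that fall outside the range.
outside_not = df[~in_range]
outside_or = df[(price < 99) | (price > 101)]

# Without parentheses `&` binds first, so the expression is read as
# `price >= (99 & price) <= 101`: a chained comparison that again asks
# for the truth value of a Series and raises ValueError.
try:
    df[price >= 99 & price <= 101]
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

print(outside_not)
```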

Answer by Riz.Khan

You can also use the .between() method:

# Raw string so the backslashes in the Windows path are not treated as escape sequences.
emp = pd.read_csv(r"C:\py\programs\pandas_2\pandas\employees.csv")

emp[emp["Salary"].between(60000, 61000)]

Answer by sparrow

If you're dealing with multiple values and multiple inputs, you could also set up an apply function like this. In this case it filters a DataFrame for GPS locations that fall within certain ranges.

def filter_values(lat,lon):
    if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
        return True
    elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
        return True
    else:
        return False


df = df[df.apply(lambda x: filter_values(x['lat'],x['lon']),axis=1)]
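
For larger DataFrames, the same filter can probably be expressed with boolean masks instead of a row-wise apply. The sketch below is one way to do it under stated assumptions: the helper near() and the sample coordinates are hypothetical, and filter_values is repeated from above only to check that both approaches select the same rows.

```python
import pandas as pd

# Hypothetical coordinates; the column names lat and lon come from the answer.
df = pd.DataFrame({
    "lat": [33.771, 37.795, 40.0, 33.77],
    "lon": [-118.162, -122.385, -74.0, -120.0],
})

def near(frame, lat, lon, tol=0.01):
    """Boolean mask of rows within `tol` degrees of (lat, lon) on both axes."""
    return ((frame["lat"] - lat).abs() < tol) & ((frame["lon"] - lon).abs() < tol)

# Combine one mask per target location with element-wise OR.
mask = near(df, 33.77, -118.16) | near(df, 37.79, -122.39)
vectorized = df[mask]

# The row-wise version from the answer, kept for comparison.
def filter_values(lat, lon):
    if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
        return True
    elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
        return True
    else:
        return False

applied = df[df.apply(lambda x: filter_values(x["lat"], x["lon"]), axis=1)]
print(vectorized)
```

The mask version works on whole columns at once, while apply with axis=1 calls the Python function once per row.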