Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/17396898/

Time: 2020-09-13 20:57:41  Source: igfitidea

Index a Python Pandas DataFrame with multiple conditions, like a SQL WHERE statement

python sql indexing pandas

Asked by unutbu

I am experienced in R and new to Python Pandas. I am trying to index a DataFrame to retrieve rows that meet a set of several logical conditions - much like the "where" statement of SQL.

I know how to do this in R with dataframes (and with R's data.table package, which is more like a Pandas DataFrame than R's native dataframe).

Here's some sample code that constructs a DataFrame and a description of how I would like to index it. Is there an easy way to do this?

import pandas as pd
import numpy as np

# generate some data
mult = 10000
fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult
vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult
animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult
xValues = np.random.normal(loc=80, scale=2, size=6*mult)
yValues = np.random.normal(loc=79, scale=2, size=6*mult)

data = {'Fruit': fruits,
        'Vegetable': vegetables, 
        'Animal': animals, 
        'xValue': xValues,
        'yValue': yValues,}

df = pd.DataFrame(data)

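# shuffle each column independently to break the repeating structure of
# fruits, vegetables, animals (np.random.permutation returns a shuffled copy,
# which avoids shuffling a Series in place)
df['Fruit'] = np.random.permutation(df['Fruit'].to_numpy())
df['Vegetable'] = np.random.permutation(df['Vegetable'].to_numpy())
df['Animal'] = np.random.permutation(df['Animal'].to_numpy())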

df.head(30)

# filter sets
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# subset1:  All rows and columns where:
#   (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)

# subset2:  All rows and columns where:
#   (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]

# subset3:  All rows and specific columns where above logical conditions are true.

All help and input is welcome and highly appreciated!

Thanks, Randall

Answered by unutbu

# subset1:  All rows and columns where:
#   (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
# (.ix was removed in pandas 1.0; .loc accepts the same boolean mask)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude)]

# subset2:  All rows and columns where:
#   (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
df.loc[df['Fruit'].isin(fruitsInclude) & (~df['Vegetable'].isin(vegetablesExclude) | (df['Animal']=='Dog'))]
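The subset2 expression wraps the `df['Animal']=='Dog'` comparison in parentheses. A small sketch of why, on a made-up five-row frame (the name `small` and its rows are hypothetical, chosen only so the result can be checked by eye): `isin` returns a boolean Series, `~` negates it element-wise, and `&` / `|` combine masks. In Python, `&` and `|` bind more tightly than `==`, so a comparison used inside such an expression needs its own parentheses.

```python
import pandas as pd

# Hypothetical toy data, not the question's 60,000-row frame.
small = pd.DataFrame({
    'Fruit':     ['Apple', 'Kiwi', 'Banana', 'Grape', 'Apple'],
    'Vegetable': ['Carrot', 'Carrot', 'Broccoli', 'Asparagus', 'Spinach'],
    'Animal':    ['Cat', 'Dog', 'Dog', 'Bird', 'Dog'],
})

fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

in_fruits = small['Fruit'].isin(fruitsInclude)               # boolean Series
not_excluded = ~small['Vegetable'].isin(vegetablesExclude)   # element-wise NOT
is_dog = (small['Animal'] == 'Dog')                          # comparison kept in parentheses

# subset2 logic: fruit AND (vegetable-not-excluded OR dog)
mask = in_fruits & (not_excluded | is_dog)
subset2 = small.loc[mask]
print(subset2)
```

Naming the intermediate masks is optional; it mainly makes a long condition easier to read and debug than a single nested expression.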

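# subset3:  All rows and specific columns where above logical conditions are true.
# (as written, this ANDs all three conditions and returns every column;
#  pass a column list as the second argument to .loc to keep specific columns)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude) & (df['Animal']=='Dog')]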
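
The subset3 comment asks for specific columns, which the line above does not select. A minimal sketch of one way to do it, assuming a small made-up frame (`small`, `wanted_cols`, and the row values are hypothetical): `.loc` takes a row mask and a column list together. `DataFrame.query` is shown as an alternative that reads closer to a SQL WHERE clause; the `@` prefix refers to local Python variables.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the question.
small = pd.DataFrame({
    'Fruit':     ['Apple', 'Kiwi', 'Banana', 'Grape', 'Apple'],
    'Vegetable': ['Carrot', 'Carrot', 'Broccoli', 'Asparagus', 'Spinach'],
    'Animal':    ['Cat', 'Dog', 'Dog', 'Bird', 'Dog'],
    'xValue':    [80.1, 79.5, 81.2, 78.8, 80.0],
    'yValue':    [79.0, 78.2, 80.3, 77.9, 79.4],
})

fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']
wanted_cols = ['Fruit', 'xValue']  # hypothetical choice of columns to keep

# Row mask first, column list second.
mask = small['Fruit'].isin(fruitsInclude) & ~small['Vegetable'].isin(vegetablesExclude)
subset3 = small.loc[mask, wanted_cols]

# The same filter written as a query string, then restricted to the same columns.
same_rows = small.query(
    'Fruit in @fruitsInclude and Vegetable not in @vegetablesExclude'
)[wanted_cols]

print(subset3)
print(subset3.equals(same_rows))
```

Which form to prefer is largely a matter of taste: the mask version composes well with other pandas code, while the query string stays readable as the number of conditions grows.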