Python: slicing a Pandas DataFrame by index values that are not in a list

Question

Asked by lmart999

I have a pandas DataFrame, df.

I want to select all indices in df that are not in a list, blacklist.

Currently, I use a list comprehension to build the labels to select:

ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

This works fine, but it may be clumsy if I need to do it often.

Is there a better way to do this?

Answer 1

Accepted answer by EdChum

Use isin on the index and invert the resulting boolean array to perform label selection:

In [239]:

df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
          a
0 -0.548275
1 -0.411741
2 -1.187369
3  1.028967
4 -2.755030
In [240]:

t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
          a
0 -0.548275
1 -0.411741
3  1.028967
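
Since the question mentions needing this often, the isin approach above can be wrapped in a small helper. This is only a minimal sketch: the function name select_not_in is made up for illustration and is not part of pandas, and the sample values are arbitrary.

```python
import pandas as pd


def select_not_in(df, labels):
    """Return the rows of df whose index label is not in labels.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Index.isin returns a boolean array; ~ inverts it
    return df.loc[~df.index.isin(labels)]


df = pd.DataFrame({"a": [10, 20, 30, 40, 50]})
df_select = select_not_in(df, [2, 4])
print(df_select.index.tolist())  # [0, 1, 3]
```

Because the mask is built from the index itself, labels in the list that do not occur in df are simply ignored.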

Answer 2

Answered by Dyno Fu

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
# your current way ...
ix = [i for i in df.index if i not in blacklist]
df_select = df.loc[ix]

# use a mask: one boolean per row, True where the label is not blacklisted
mask = [x not in blacklist for x in df.index]
df.loc[mask]

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label

Actually, loc and iloc both accept a boolean array, in this case the mask. You can reuse this mask from then on, which should be more efficient than rebuilding the label list each time.
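
To illustrate the reuse point, here is a small sketch that builds the mask once and applies it with both loc and iloc. It swaps the list comprehension for Index.isin, which returns a NumPy boolean array; the data mirrors the example in this answer.

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
blacklist = [2, 3]

# One boolean per row: True where the label is NOT blacklisted
mask = ~df.index.isin(blacklist)

by_label = df.loc[mask]      # boolean array passed to loc
by_position = df.iloc[mask]  # the same array also works with iloc
```

The mask must have exactly one entry per row of df, so it needs to be rebuilt whenever the index or the blacklist changes.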

Answer 3

Answered by ASGM

You could use set() to create the difference between your original indices and those that you want to remove:

df.loc[set(df.index) - set(blacklist)]

It has the advantage of being parsimonious, as well as being easier to read than a list comprehension.
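
Two caveats, offered tentatively: a Python set has no defined order, so the rows may not come back in their original order, and recent pandas versions may reject a set passed directly to loc. Index.difference is a pandas method that performs the same set difference on the index and returns a sorted Index. A sketch with assumed sample data:

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
blacklist = [2, 3]

# Labels in df.index that are not in blacklist, returned as an Index
keep = df.index.difference(blacklist)
df_select = df.loc[keep]

# Equivalent with plain sets, converted to a sorted list before indexing
df_select_set = df.loc[sorted(set(df.index) - set(blacklist))]
```

Both variants return rows sorted by label rather than in their original order, which matters if df is not already sorted by its index.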

Answer 4

Answered by Hagrid67

Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:

import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)]  # works
# df1.loc[setValid]      # fails (left commented out so the snippet runs)

(sorry can't comment, insufficient rep)
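
As an alternative that avoids sets altogether, isin also accepts tuples on a MultiIndex, and it keeps the original row order. A sketch using the same df1 as above; the variable name excluded is only illustrative.

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa": [1, 2, 3, 4]}, index=mi1)

# MultiIndex.isin takes an iterable of tuples, one tuple per full key
excluded = [("a", 2)]
df1_select = df1.loc[~df1.index.isin(excluded)]
```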

Answer 5

Answered by Hector Garcia L

If you are looking for a way to select all rows that fall outside a condition, you can use np.invert(), provided that the condition returns an array of booleans.

df.loc[np.invert(({condition 1}) & ({condition 2}))]
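
A concrete version of the template above, as a sketch. The column names and the two conditions are invented for illustration. On a boolean Series, np.invert gives the same result as the ~ operator.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]})

# Two example conditions, each a boolean Series aligned with df
cond_1 = df["x"] > 1
cond_2 = df["y"] < 50

# Rows where NOT (cond_1 AND cond_2)
outside = df.loc[np.invert(cond_1 & cond_2)]
```

For the original question, the condition would be df.index.isin(blacklist), which makes this equivalent to the accepted answer.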
