Python: Slice a pandas DataFrame by index values that are not in a list
This page reproduces a popular Stack Overflow question and its answers.
Original question: http://stackoverflow.com/questions/29134635/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by lmart999
I have a pandas DataFrame, df.
I want to select all rows of df whose index values are not in a list, blacklist.
Currently, I use a list comprehension to build the labels to select:
ix=[i for i in df.index if i not in blacklist]
df_select=df.loc[ix]
This works fine, but it may be clumsy if I need to do it often.
Is there a better way to do this?
Accepted answer by EdChum
Use isin on the index and invert the resulting boolean index to perform label selection:
In [239]:
df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
a
0 -0.548275
1 -0.411741
2 -1.187369
3 1.028967
4 -2.755030
In [240]:
t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
a
0 -0.548275
1 -0.411741
3 1.028967
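For reference, the same idea as a self-contained script. This is a minimal sketch, not part of the original answer: the fixed column values and the extra label 99 (absent from the index) are assumptions chosen for illustration. It also shows df.drop with errors="ignore" as an alternative that gives the same rows.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40, 50]})  # default index 0..4
blacklist = [2, 4, 99]  # 99 is deliberately absent from the index

# Boolean mask: True for rows whose index label is NOT blacklisted.
keep = ~df.index.isin(blacklist)
df_select = df.loc[keep]

# Same rows via drop; errors="ignore" skips labels missing from the index.
df_dropped = df.drop(index=blacklist, errors="ignore")

print(df_select.index.tolist())
```

Both approaches keep the remaining rows in their original order, and neither raises if a blacklisted label does not exist in the index.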
Answer by Dyno Fu
import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
#your current way ...
ix=[i for i in df.index if i not in blacklist]
df_select=df.loc[ix]
# use a mask: one boolean per index label, True where the label is not blacklisted
mask = [x not in blacklist for x in df.index]
df.loc[mask]
See http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label
Actually, loc and iloc both accept a boolean array, in this case the mask. From now on you can reuse this mask, which should be more efficient than rebuilding the label list each time.
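A short sketch of what reusing the mask can look like. The second frame, other, and its column E are hypothetical and only serve to show that one mask can be applied to any frame with the same number of rows. Here the mask is built with Index.isin rather than a list comprehension, so it is a NumPy boolean array.

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
# Hypothetical second frame with the same row count.
other = pd.DataFrame(data=[50, 60, 70, 80], index=[1, 2, 3, 4], columns=["E"])
blacklist = [2, 3]

# One boolean per row, computed once.
mask = ~df.index.isin(blacklist)

by_label = df.loc[mask]      # loc accepts a boolean array
by_position = df.iloc[mask]  # iloc accepts it as well
reused = other.loc[mask]     # the same mask applied to another frame
```

The mask must have exactly one entry per row of the frame it is applied to; otherwise pandas raises an error about the indexer length.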
Answer by ASGM
Answer by Hagrid67
Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:
mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails
(Sorry, I can't comment; insufficient reputation.)
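As an alternative sketch for the MultiIndex case (not from the original answer), the isin mask from the accepted answer also works when the excluded labels are given as tuples. The name excluded is introduced here for illustration. Unlike converting a set to a list, where the row order depends on set iteration order, the mask keeps rows in their original order.

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa": [1, 2, 3, 4]}, index=mi1)
excluded = [("a", 2)]  # full index tuples to leave out

# MultiIndex.isin compares whole tuples; ~ inverts the result.
kept = df1.loc[~df1.index.isin(excluded)]
```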
Answer by Hector Garcia L
If you are looking for a way to select all rows that fall outside a condition, you can use np.invert(), provided that the condition returns an array of booleans.
# condition_1 and condition_2 are placeholders for boolean arrays or Series
df.loc[np.invert((condition_1) & (condition_2))]
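A concrete version of that pattern, as a minimal sketch: the frame and the two conditions (cond_1, cond_2) are made up for illustration. For boolean data, np.invert and the ~ operator give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1]})

cond_1 = df["a"] > 2  # hypothetical condition
cond_2 = df["b"] < 3  # hypothetical condition

# Rows where the combined condition does NOT hold.
outside = df.loc[np.invert(cond_1 & cond_2)]

# The ~ operator is the usual shorthand for the same inversion.
same = df.loc[~(cond_1 & cond_2)]
```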