Return dataframe subset based on a list of boolean values
Source: http://stackoverflow.com/questions/45494649/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the source URL, and attribute it to the original authors (not me): StackOverflow
Asked by user7180132
I'm trying to slice a dataframe based on a list of values. How would I go about this?
Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]
How do I return those rows in a dataframe, df, where the corresponding value in the expression/list is 1? In this example, I would include the rows where the index is 1, 4, 5, and 9.
Answered by Willem Van Onsem
You can use masking here:
df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
This constructs a boolean array of True and False values. Every position where the array is True is a row we select.
Mind that this does not filter in place. To keep the result, you have to assign it to a variable (optionally a different one):
df2 = df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
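As a self-contained sketch of the masking approach above (the column name `value` and the sample data are made up for illustration), the mask picks rows by position and leaves the original frame untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with ten rows; the data is illustrative only.
df = pd.DataFrame({"value": range(10, 20)})

flags = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
mask = np.array(flags, dtype=bool)  # 0 becomes False, 1 becomes True

df2 = df[mask]  # returns a new frame; df itself is not modified

print(df2.index.tolist())
print(len(df), len(df2))
```

Assigning back to the same name (`df = df[mask]`) would overwrite the original instead of keeping both.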
Answered by ayhan
Convert the list to a boolean array and then use boolean indexing:
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]  # the list from the question
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
df[np.array(lst).astype(bool)]
Out:
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
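The conversion can be wrapped in a small function. The following is only a sketch: `select_rows` is a hypothetical helper name, not part of pandas, and the deterministic sample data replaces the random setup so the result is reproducible. It checks up front that the list holds one flag per row.

```python
import numpy as np
import pandas as pd

def select_rows(frame, flags):
    """Return the rows of `frame` whose flag is truthy (hypothetical helper)."""
    mask = np.asarray(flags).astype(bool)
    if len(mask) != len(frame):
        raise ValueError(f"expected {len(frame)} flags, got {len(mask)}")
    return frame[mask]

# Deterministic 10x3 frame: row i holds 3*i, 3*i + 1, 3*i + 2.
df = pd.DataFrame(np.arange(30).reshape(10, 3))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

subset = select_rows(df, lst)
print(subset)
```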
Answered by piRSquared
Setup
Borrowed @ayhan's setup
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
Without numpy
Not the fastest, but it holds its own and is definitely the shortest.
df[list(map(bool, lst))]
   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1
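The conversion to booleans matters because a plain list of integers is interpreted as column labels rather than as a row mask. A minimal sketch, assuming a frame whose columns are labelled 0, 1 and 2 as in the setup (the data itself is made up):

```python
import pandas as pd

df = pd.DataFrame({0: range(10), 1: range(10, 20), 2: range(20, 30)})
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

by_label = df[lst]                  # integers are looked up as column labels
by_mask = df[list(map(bool, lst))]  # booleans act as a row mask

print(by_label.shape)
print(by_mask.shape)
```

Here `by_label` repeats columns 0 and 1 and keeps all ten rows, while `by_mask` keeps the four flagged rows and all three columns.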
Timing
results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))
         ayh   wvo   pir   mxu   wen Best
N
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo
fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()
Testing Code
from timeit import timeit
import numpy as np
import pandas as pd

# One candidate per answer; each takes a frame d and a 0/1 list l
ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

# Time each candidate on frames of increasing size (df and lst come from the setup)
for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        results.at[i, j] = timeit(stmt, setp, number=10)
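Before comparing timings it can help to confirm that the candidates return the same rows. The sketch below is an addition, not part of the original benchmark: it uses deterministic sample data, leaves out the `query`-based variant, and the names `candidates`, `reference` and `agreement` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(10, 3))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Same selection logic as the timed functions, keyed by their short names.
candidates = {
    "ayh": lambda d, l: d[np.array(l).astype(bool)],
    "wvo": lambda d, l: d[np.array(l, dtype=bool)],
    "pir": lambda d, l: d[list(map(bool, l))],
    "wen": lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :],
}

# Compare every candidate against one of them taken as the reference.
reference = candidates["wvo"](df, lst)
agreement = {name: func(df, lst).equals(reference)
             for name, func in candidates.items()}
print(agreement)
```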
Answered by MaxU
Yet another "creative" approach:
In [181]: a = np.array(lst)
In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5
Or a much better variant from @ayhan:
In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5
PS: I've also borrowed @ayhan's setup
Answered by YOBEN_S
Or maybe find the positions of 1 in your list and slice from the DataFrame:
df.loc[[i for i,x in enumerate(lst) if x == 1],:]
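Note that `.loc` looks rows up by label, so the line above relies on the default 0..n-1 index. A sketch assuming a frame with string labels (the labels and the `value` column are made up here), where `.iloc` is the positional counterpart:

```python
import pandas as pd

# Hypothetical frame whose index labels are letters instead of 0..9.
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Positions of the 1s in the list.
positions = [i for i, x in enumerate(lst) if x == 1]

# Positional lookup, independent of the index labels.
subset = df.iloc[positions, :]
print(subset.index.tolist())
```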
Answered by pylang
Selecting using a list of Booleans is something itertools.compress does well.
Given
>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
Code
>>> selected_idxs = list(itertools.compress(df.index, selectors)) # [1, 4, 5, 9]
>>> df.iloc[selected_idxs, :]
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
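`itertools.compress(df.index, selectors)` yields index labels, which coincide with positions only for the default index. A sketch for a frame with non-default labels (sample data and names are assumptions for illustration), pairing the labels with `.loc` instead of `.iloc`:

```python
import itertools
import pandas as pd

# Hypothetical frame whose index labels are letters instead of 0..9.
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# compress keeps the labels whose selector is truthy.
selected_labels = list(itertools.compress(df.index, selectors))

# Label-based lookup matches what compress produced.
subset = df.loc[selected_labels, :]
print(subset)
```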