pandas: return a DataFrame subset based on a list of boolean values

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45494649/

Time: 2020-09-14 04:10:52  Source: igfitidea

Return dataframe subset based on a list of boolean values

Tags: python, pandas, dataframe

Asked by user7180132

I'm trying to slice a DataFrame based on a list of values. How would I go about this?

Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]

How do I return the rows of a DataFrame, df, where the corresponding value in the expression/list is 1? In this example, I would include the rows whose index is 1, 4, 5, and 9.

Answered by Willem Van Onsem

You can use masking here:

df[np.array([0,1,0,0,1,1,0,0,0,1], dtype=bool)]

This constructs a boolean array of True and False values. Every position where the array is True corresponds to a row we select.

Note that this does not filter in place. To keep the result, you have to assign it to a variable (optionally a different one):

df2 = df[np.array([0,1,0,0,1,1,0,0,0,1], dtype=bool)]
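
The masking steps above can be sketched as a self-contained script. The DataFrame contents and the column name `value` are assumptions made for illustration; only the 0/1 list comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten rows with the default 0..9 index.
df = pd.DataFrame({"value": list("abcdefghij")})

# The 0/1 list from the question, converted to a boolean mask.
mask = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1], dtype=bool)

# Indexing with the mask returns a new DataFrame; df itself is unchanged.
df2 = df[mask]

print(df2.index.tolist())  # [1, 4, 5, 9]
print(len(df))             # 10
```

Because the result is a new object, `df` still holds all ten rows afterwards.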

Answered by ayhan

Convert the list to a boolean array and then use boolean indexing:

import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]  # the list from the question
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

df[np.array(lst).astype(bool)]
Out: 
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
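
One reason the conversion matters: a plain list of integers passed to `df[...]` is interpreted as a list of column labels, not as a row mask. A minimal sketch, assuming the same default integer column labels 0, 1, 2 as in the setup above (the seed is an addition, only there to make the run repeatable):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for a repeatable run
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Integers are treated as column labels: this picks columns 0 and 1 repeatedly.
by_label = df[lst]

# Booleans are treated as a row mask: this picks rows 1, 4, 5 and 9.
by_mask = df[np.array(lst).astype(bool)]

print(by_label.shape)
print(by_mask.index.tolist())
```

With a DataFrame whose columns are not labelled 0 and 1, the integer version would most likely raise a KeyError rather than silently select columns.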

Answered by piRSquared

Setup
Borrowed @ayhan's setup

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

Without numpy
Not the fastest, but it holds its own and is definitely the shortest.

df[list(map(bool, lst))]

   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1
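
A quick check that the pure-Python mask selects the same rows as the NumPy-based one. This is a sketch using the setup above; the seed is an assumption added so the run is repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for a repeatable run
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))
lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# A plain Python list of bools is accepted as a row mask.
mask = list(map(bool, lst))
without_numpy = df[mask]

# Compare with the array-based mask used in the other answers.
with_numpy = df[np.array(lst, dtype=bool)]

print(without_numpy.equals(with_numpy))  # True
```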


Timing

results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))

         ayh   wvo   pir   mxu   wen Best
N                                        
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo


import matplotlib.pyplot as plt

fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()

Testing Code

from timeit import timeit

ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
        results.at[i, j] = timeit(stmt, setp, number=10)

Answered by MaxU

Yet another "creative" approach:

In [181]: a = np.array(lst)

In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

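Or a much better variant from @ayhan, which does not depend on the index values (the version above can never select a row whose index is 0, because 0 * a is never greater than 0):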

In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

PS: I've also borrowed @ayhan's setup.

Answered by YOBEN_S

Or maybe find the positions of the 1s in your list and slice the DataFrame with them:

df.loc[[i for i,x in enumerate(lst) if x == 1],:]
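
Note that `.loc` looks rows up by label, so the snippet above relies on the default 0..n-1 index, where labels and positions coincide. A minimal sketch, assuming a hypothetical string index and a hypothetical `value` column, of using `.iloc` for the same positional selection:

```python
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Hypothetical frame whose index labels are letters rather than 0..9.
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))

# Positions of the 1s in the list.
positions = [i for i, x in enumerate(lst) if x == 1]

# .iloc selects by position, so it does not depend on the index labels.
selected = df.iloc[positions]

print(selected.index.tolist())  # ['b', 'e', 'f', 'j']
```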

Answered by pylang

Selecting with a list of Booleans is something itertools.compress does well.

Given

>>> import itertools
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

Code

>>> selected_idxs = list(itertools.compress(df.index, selectors))   # [1, 4, 5, 9]
>>> df.iloc[selected_idxs, :]
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
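
`itertools.compress(df.index, selectors)` yields index labels, which coincide with positions only for the default index used above. A sketch, assuming a hypothetical non-default index and a hypothetical `value` column, that pairs those labels with `.loc` instead:

```python
import itertools
import pandas as pd

selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# Hypothetical frame with labels 100, 101, ..., 109.
df = pd.DataFrame({"value": range(10)}, index=range(100, 110))

# compress keeps the labels whose selector is truthy.
selected_labels = list(itertools.compress(df.index, selectors))

# The values are labels, so .loc is the matching accessor here.
selected = df.loc[selected_labels]

print(selected_labels)
print(selected["value"].tolist())
```

Passing these labels to `.iloc` would fail on this frame, since 101 is not a valid position in a ten-row DataFrame.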