pandas: return a subset of a DataFrame based on a list of Booleans

Question

Asked by user7180132

I'm trying to slice a dataframe based on a list of values. How would I go about this?

Say I have an expression or a list l = [0,1,0,0,1,1,0,0,0,1]

How to return those rows in a dataframe, df, when the corresponding value in the expression/list is 1? In this example, I would include rows where index is 1, 4, 5, and 9.

Answer 1

Answered by Willem Van Onsem

You can use masking here:

df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]

So we construct a boolean array of True and False values. Every position where the array is True marks a row we select.

Mind that we do not filter in place. To retrieve the result, you have to assign it to a (possibly different) variable:

df2 = df[np.array([0,1,0,0,1,1,0,0,0,1],dtype=bool)]
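
A minimal, self-contained sketch of the masking approach above. The sample frame and the names `mask` and `subset` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 rows, so the mask length matches the row count.
df = pd.DataFrame({"value": range(10, 20)})

# dtype=bool turns 0/1 into False/True.
mask = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1], dtype=bool)

# Boolean indexing returns a new DataFrame; df itself is unchanged.
subset = df[mask]

print(subset.index.tolist())  # [1, 4, 5, 9]
print(len(df))                # 10
```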

Answer 2

Answered by ayhan

Convert the list to a boolean array and then use boolean indexing:

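import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]  # the list from the question
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))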

df[np.array(lst).astype(bool)]
Out: 
   0  1  2
1  8  6  3
4  2  7  3
5  7  2  3
9  1  3  4
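
The conversion to bool matters because a plain list of integers is interpreted as column labels rather than as a row mask. The sketch below contrasts the two; the frame contents and the names `by_label` and `by_mask` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
df = pd.DataFrame(np.arange(30).reshape(10, 3))  # columns are labeled 0, 1, 2

# A list of ints is read as column labels, so this picks columns 0 and 1
# repeatedly instead of filtering rows.
by_label = df[lst]

# A boolean array with one entry per row filters rows.
by_mask = df[np.array(lst).astype(bool)]

print(by_label.shape)  # (10, 10)
print(by_mask.shape)   # (4, 3)
```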

Answer 3

Answered by piRSquared

Setup
Borrowed @ayhan's setup

df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

Without numpy
Not the fastest, but it holds its own and is definitely the shortest.

df[list(map(bool, lst))]

   0  1  2
1  3  5  6
4  6  3  2
5  5  7  6
9  0  0  1

Timing

results.div(results.min(1), 0).round(2).pipe(lambda d: d.assign(Best=d.idxmin(1)))

         ayh   wvo   pir   mxu   wen Best
N                                        
1       1.53  1.00  1.02  4.95  2.61  wvo
3       1.06  1.00  1.04  5.46  2.84  wvo
10      1.00  1.00  1.00  4.30  2.73  ayh
30      1.00  1.05  1.24  4.06  3.76  ayh
100     1.16  1.00  1.19  3.90  3.53  wvo
300     1.29  1.00  1.32  2.50  2.38  wvo
1000    1.54  1.00  2.19  2.24  3.85  wvo
3000    1.39  1.00  2.17  1.81  4.55  wvo
10000   1.22  1.00  2.21  1.35  4.36  wvo
30000   1.19  1.00  2.26  1.39  5.36  wvo
100000  1.19  1.00  2.19  1.31  4.82  wvo

fig, (a1, a2) = plt.subplots(2, 1, figsize=(6, 6))
results.plot(loglog=True, lw=3, ax=a1)
results.div(results.min(1), 0).round(2).plot.bar(logy=True, ax=a2)
fig.tight_layout()

Testing Code

from timeit import timeit

ayh = lambda d, l: d[np.array(l).astype(bool)]
wvo = lambda d, l: d[np.array(l, dtype=bool)]
pir = lambda d, l: d[list(map(bool, l))]
wen = lambda d, l: d.loc[[i for i, x in enumerate(l) if x == 1], :]

def mxu(d, l):
    a = np.array(l)
    return d.query('@a != 0')

results = pd.DataFrame(
    index=pd.Index([1, 3, 10, 30, 100, 300,
                    1000, 3000, 10000, 30000, 100000], name='N'),
    columns='ayh wvo pir mxu wen'.split(),
    dtype=float
)

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    l = lst * i
    for j in results.columns:
        stmt = '{}(d, l)'.format(j)
        setp = 'from __main__ import d, l, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
        results.at[i, j] = timeit(stmt, setp, number=10)
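
A reduced version of the harness that should run on recent pandas, using `.at` for scalar assignment and passing a callable to `timeit` instead of a setup string. It is a sketch only: the `funcs` dictionary, the small `sizes` list, and the choice of two candidates are assumptions made to keep it short, and the timings it prints will differ from the table above.

```python
from timeit import timeit

import numpy as np
import pandas as pd

lst = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
df = pd.DataFrame(np.random.randint(10, size=(10, 3)))

# Two of the candidates from the benchmark above.
funcs = {
    "wvo": lambda d, l: d[np.array(l, dtype=bool)],
    "pir": lambda d, l: d[list(map(bool, l))],
}

sizes = [1, 10, 100]  # multipliers, kept small so the sketch runs quickly
results = pd.DataFrame(index=pd.Index(sizes, name="N"),
                       columns=list(funcs), dtype=float)

for n in sizes:
    d = pd.concat([df] * n, ignore_index=True)
    l = lst * n
    for name, func in funcs.items():
        # A callable avoids the `from __main__ import ...` setup string.
        results.at[n, name] = timeit(lambda: func(d, l), number=10)

# Each row divided by its fastest entry, as in the table above.
relative = results.div(results.min(axis=1), axis=0).round(2)
print(relative)
```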

Answer 4

Answered by MaxU

Yet another "creative" approach:

In [181]: a = np.array(lst)

In [182]: df.query("index * @a > 0")
Out[182]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5

Or a much better variant from @ayhan:

In [183]: df.query("@a != 0")
Out[183]:
   0  1  2
1  1  5  5
4  0  2  0
5  4  9  9
9  2  2  5
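
One reason to prefer the second form: the first multiplies the flags by the index labels, so a row labeled 0 can never be selected. The sketch below reproduces both expressions outside `query` (an assumption made here to keep the example independent of the query engine); the frame and the names `via_product` and `via_flags` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13]})  # default index 0..3
a = np.array([1, 0, 1, 0])                      # row 0 should be selected

# Same arithmetic as the "index * @a > 0" string: label 0 times 1 is 0,
# so the first row is dropped even though its flag is 1.
via_product = df[df.index * a > 0]

# Same test as the "@a != 0" string: it depends only on the flags.
via_flags = df[a != 0]

print(via_product.index.tolist())  # [2]
print(via_flags.index.tolist())    # [0, 2]
```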

PS: I've also borrowed @ayhan's setup.

Answer 5

Answered by YOBEN_S

Or maybe find the positions of 1 in your list and use them to slice the DataFrame:

df.loc[[i for i,x in enumerate(lst) if x == 1],:]
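
Note that `enumerate` yields positions while `.loc` looks up labels, so the two agree only when the frame has the default 0..n-1 index. A position-based sketch that does not rely on the labels, assuming a hypothetical frame with a non-default index and using `np.flatnonzero` in place of the list comprehension:

```python
import numpy as np
import pandas as pd

lst = [0, 1, 0, 1]
df = pd.DataFrame({"value": [10, 11, 12, 13]}, index=[100, 200, 300, 400])

# Positions of the nonzero flags.
positions = np.flatnonzero(lst)

# iloc selects by position, so it works whatever the index labels are.
subset = df.iloc[positions]

print(subset.index.tolist())  # [200, 400]
```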

Answer 6

Answered by pylang

Selecting with a list of Booleans is something itertools.compress does well.

Given

>>> df = pd.DataFrame(np.random.randint(10, size=(10, 2)))
>>> selectors = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

Code

>>> selected_idxs = list(itertools.compress(df.index, selectors))   # [1, 4, 5, 9]
>>> df.iloc[selected_idxs, :]
   0  1
1  1  9
4  3  4
5  4  1
9  8  9
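
Because `compress` is applied to `df.index`, what it returns are index labels. With the default integer index those coincide with positions, which is why `.iloc` works above. A variant that pairs the labels with `.loc` instead, assuming a hypothetical frame with string labels:

```python
import itertools

import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13]}, index=list("wxyz"))
selectors = [0, 1, 0, 1]

# compress keeps the items of df.index whose selector is truthy,
# so the result is a list of labels.
labels = list(itertools.compress(df.index, selectors))

# loc selects by label, matching what compress produced.
subset = df.loc[labels]

print(labels)                    # ['x', 'z']
print(subset["value"].tolist())  # [11, 13]
```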
