Find the first non-zero value in each row of a Pandas DataFrame

Question

Asked by slaw

I have a Pandas DataFrame:

import pandas as pd

df = pd.DataFrame([[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                   [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
                   [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
                  ]
                  , columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

     A    B     C     D     E     F     G     H
0  0.0  2.0   3.0   4.0   5.0   6.0   7.0   8.0
1  1.0  0.0   1.0   3.0   1.0   1.0   7.0   0.0
2  0.0  0.0  13.0  14.0  15.0  16.0  17.0  18.0

I'd like to return a Series (not a list) of the first non-zero value in each row. The following currently works, but lookup returns a list instead of a Series (I know I can convert the list to a Series), and I'm assuming there's a better way:

first_nonzero_colnames = (df > 0).idxmax(axis=1, skipna=True)
df.lookup(first_nonzero_colnames.index, first_nonzero_colnames.values)

[  2.   1.  13.]

I can use .apply, but I want to avoid it.
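For readers on recent pandas: `DataFrame.lookup` was deprecated in pandas 1.2 and removed in 2.0, so the snippet in the question no longer runs there. The following is a minimal sketch of one possible replacement that returns a Series directly, using NumPy positional indexing. It is not from the original thread; the names `first_pos`, `values`, and `first_nonzero` are illustrative, and a row containing only zeros would silently yield its first element (0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=list("ABCDEFGH"),
)

# Column position of the first non-zero entry in each row
# (argmax returns 0 for a row that is entirely zero).
first_pos = (df.to_numpy() != 0).argmax(axis=1)

# Pick one value per row by pairing row positions with column positions.
values = df.to_numpy()[np.arange(len(df)), first_pos]

# Wrap the result in a Series that shares the DataFrame's index.
first_nonzero = pd.Series(values, index=df.index)
print(first_nonzero)
```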

Answer 1

Accepted answer by acushner

Try this:

res = df[df != 0.0].bfill(axis=1)['A']

All I'm doing is replacing every zero with NaN and then back-filling along each row, so each NaN takes the next valid value to its right. This forces the first column to hold the first non-zero value of each row.

Or, a quicker way, as suggested by @piRSquared:

import numpy as np
df.replace(0, np.nan).bfill(axis=1).iloc[:, 0]
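To make the back-fill step concrete, here is a small sketch that splits the one-liner into its three stages. The three-column frame and the intermediate names `masked`, `filled`, and `result` are made up for illustration; the last row is deliberately all zeros to show that such a row comes back as NaN rather than raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]],   # a row with no non-zero value at all
    columns=["A", "B", "C"],
)

masked = df.replace(0, np.nan)   # zeros become NaN
filled = masked.bfill(axis=1)    # each NaN takes the next valid value to its right
result = filled.iloc[:, 0]       # the first column now holds the first non-zero value
print(result)
```

Selecting the first column by position with `.iloc[:, 0]` avoids hard-coding the column name `'A'`.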

Answer 2

Answer by ayhan

This seems to work:

df[df!=0].cumsum(axis=1).min(axis=1)
Out[74]: 
0     2.0
1     1.0
2    13.0
dtype: float64
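One caveat worth checking before relying on this approach: the minimum of the running sum equals the first non-zero value only when the remaining non-zero values are non-negative. The sketch below uses a made-up frame, `df_neg`, to compare the cumulative-sum approach with the back-fill approach from the accepted answer on a row that contains a negative value.

```python
import pandas as pd

# Hypothetical data: the first row contains a negative value after its first non-zero.
df_neg = pd.DataFrame(
    [[0.0, 2.0, -3.0],
     [0.0, 5.0, 1.0]],
    columns=["A", "B", "C"],
)

masked = df_neg[df_neg != 0]                 # zeros become NaN

by_cumsum = masked.cumsum(axis=1).min(axis=1)
by_bfill = masked.bfill(axis=1).iloc[:, 0]

# In the first row the running sum is [NaN, 2, -1], so its minimum is -1,
# while the first non-zero value is actually 2.
print(by_cumsum.tolist())
print(by_bfill.tolist())
```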

Answer 3

Answer by piRSquared

@acushner's answer is better. Just putting this out there.

Use idxmax and apply:

m = (df != 0).idxmax(1)
df.T.apply(lambda x: x[m[x.name]])

0     2.0
1     1.0
2    13.0
dtype: float64

This also works:

m = (df != 0).idxmax(1)
t = list(zip(m.index, m.values))  # .loc needs a list of (row, column) tuples, not a zip iterator

df.stack().loc[t].reset_index(1, drop=True)

Answer 4

Answer by andrew

I'm not sure that I would call this "better", but it returns a Series in a one-liner.

df.apply(lambda x: x.iloc[np.where(x > 0)[0][0]], axis=1)
>>>
0     2.0
1     1.0
2    13.0
dtype: float64

Answer 5

Answer by Victor Vulovic

Here's a very fast way using .apply and .nonzero() (note that Series.nonzero() was removed in pandas 1.0; on newer versions, x.to_numpy().nonzero() can be used instead):

 df2.apply(lambda x: x.iloc[x.nonzero()[0][0]], axis=1)
 >>>
 0     2.0
 1     1.0
 2    13.0
 dtype: float64

Performance:

%%timeit
df2.apply(lambda x: x.iloc[x.nonzero()[0][0]], axis=1)
>>>
190 μs ± 8.18 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
