Original question: http://stackoverflow.com/questions/38467749/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Find First Non-zero Value in Each Row of Pandas DataFrame
Asked by slaw
I have a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
     A    B     C     D     E     F     G     H
0  0.0  2.0   3.0   4.0   5.0   6.0   7.0   8.0
1  1.0  0.0   1.0   3.0   1.0   1.0   7.0   0.0
2  0.0  0.0  13.0  14.0  15.0  16.0  17.0  18.0
I'd like to return a Series (not a list) of the first non-zero value in each row. The following currently works, but lookup returns a list instead of a Series (I know I can convert the list to a Series), and I'm assuming there's a better way:
first_nonzero_colnames = (df > 0).idxmax(axis=1, skipna=True)
df.lookup(first_nonzero_colnames.index, first_nonzero_colnames.values)
[ 2. 1. 13.]
I can use .apply, but I want to avoid it.
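A note for readers on recent pandas: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the lookup call in the question no longer runs there. Below is a minimal sketch of the same idea using NumPy positional indexing, with the result wrapped back into a Series. The helper name `first_nonzero` is hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=list("ABCDEFGH"),
)

def first_nonzero(frame):
    """Hypothetical helper: first non-zero value of each row, as a Series."""
    values = frame.to_numpy()
    # argmax on a boolean array gives the position of the first True per row.
    # For a row with no True at all (an all-zero row) it gives 0.
    first_pos = (values != 0).argmax(axis=1)
    # Pick one element per row by pairing row numbers with those positions.
    picked = values[np.arange(len(frame)), first_pos]
    return pd.Series(picked, index=frame.index)

result = first_nonzero(df)
print(result)
```

This still avoids .apply, and it keeps the original row index on the returned Series.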
Accepted answer by acushner
Try this:
res = df[df != 0.0].bfill(axis=1)['A']
All I'm doing is replacing all zeros with NaNs and then back-filling along each row (filling from the right), which forces every value in the first column to be the first non-zero value in its row.
Or, a quicker way, as suggested by @piRSquared:
import numpy as np
df.replace(0, np.nan).bfill(axis=1).iloc[:, 0]
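To make the mask-then-back-fill idea concrete, here is a small sketch that splits the one-liner into its intermediate steps. The variable names `masked`, `filled`, and `res` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [1.0, 0.0, 1.0, 3.0, 1.0, 1.0, 7.0, 0.0],
     [0.0, 0.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]],
    columns=list("ABCDEFGH"),
)

# Step 1: turn every zero into NaN so it counts as "missing".
masked = df.replace(0, np.nan)

# Step 2: back-fill along each row, so every NaN takes the next
# valid value to its right.
filled = masked.bfill(axis=1)

# Step 3: the first column now holds the first non-zero value of each row.
res = filled.iloc[:, 0]
print(res)
```

With this approach, a row that contains only zeros would come out as NaN, since there is nothing to back-fill from.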
Answer by ayhan
This seems to work:
df[df!=0].cumsum(axis=1).min(axis=1)
Out[74]:
0 2.0
1 1.0
2 13.0
dtype: float64
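One caveat worth noting: the cumsum trick relies on every non-zero value being positive, so that the running sum never drops below its first term. The sketch below uses made-up data (the frame `neg` is an assumption for illustration, not from the original post) to compare it with the back-fill approach when a negative value is present.

```python
import pandas as pd

# Hypothetical data: the first row has a negative value after its first non-zero entry.
neg = pd.DataFrame([[0.0, 5.0, -3.0],
                    [0.0, 0.0, 4.0]], columns=["A", "B", "C"])

masked = neg[neg != 0]  # zeros become NaN

# Running sums for the first row are NaN, 5, 2, so the minimum is 2 rather than 5.
via_cumsum = masked.cumsum(axis=1).min(axis=1)

# Back-filling picks the first non-zero value regardless of sign.
via_bfill = masked.bfill(axis=1).iloc[:, 0]

print(via_cumsum.tolist())
print(via_bfill.tolist())
```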
Answer by piRSquared
@acushner's answer is better. Just putting this out there.
Use idxmax and apply:
m = (df != 0).idxmax(1)
df.T.apply(lambda x: x[m[x.name]])
0 2.0
1 1.0
2 13.0
dtype: float64
This also works:
m = (df != 0).idxmax(1)
t = list(zip(m.index, m.values))  # materialize the pairs: zip is a lazy iterator in Python 3
df.stack().loc[t].reset_index(1, drop=True)
Answer by andrew
I'm not sure that I would call this "better", but it returns a Series in a one-liner.
import numpy as np
df.apply(lambda x: x.iloc[np.where(x != 0)[0][0]], axis=1)
>>>
0 2.0
1 1.0
2 13.0
dtype: float64
Answer by Victor Vulovic
Here's a very fast way using .apply and .nonzero():
df2.apply(lambda x: x.iloc[x.to_numpy().nonzero()[0][0]], axis=1)
>>>
0 2.0
1 1.0
2 13.0
dtype: float64
Performance:
%%timeit
df2.apply(lambda x: x.iloc[x.to_numpy().nonzero()[0][0]], axis=1)
>>>
190 μs ± 8.18 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
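Note that the `[0][0]` indexing assumes every row has at least one non-zero entry. On an all-zero row the array of non-zero positions is empty, so taking its first element raises an IndexError. Below is a hedged sketch of a guard; the frame `z` and the helper `first_nonzero_or_nan` are hypothetical names introduced here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row contains only zeros.
z = pd.DataFrame([[0.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]], columns=["A", "B", "C"])

def first_nonzero_or_nan(row):
    """Hypothetical helper: first non-zero value in the row, or NaN if there is none."""
    positions = np.flatnonzero(row.to_numpy())
    return row.iloc[positions[0]] if positions.size else np.nan

guarded = z.apply(first_nonzero_or_nan, axis=1)
print(guarded)
```

Returning NaN for an all-zero row matches what the back-fill approach in the accepted answer gives in that case.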