Original URL: http://stackoverflow.com/questions/50586146/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Find first non-zero value in each column of pandas DataFrame
Question by Konstantin
What is a "pandoric" (idiomatic pandas) way to get the value and index of the first non-zero element in each column of a DataFrame, scanning each column from top to bottom?
import pandas as pd
df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])
print(df.head())
#    first  second  third
# 0      0       0      0
# 1      0      10      0
# 2      4       0      0
# 3      1       2      3
What I would like to achieve:
#         value  pos
# first       4    2
# second     10    1
# third       3    3
Accepted answer by piRSquared
You're looking for idxmax, which gives you the index label of the first occurrence of the maximum. However, you need the maximum of "not equal to zero": on the boolean mask df.ne(0), True is the maximum, so idxmax returns the first row in which each column is non-zero.
df.ne(0).idxmax()
first     2
second    1
third     3
dtype: int64
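One caveat: for a column with no non-zero values the mask is all False, so idxmax still returns the first index label (0 here), which looks the same as a genuine hit in row 0. A minimal sketch that blanks out such columns, assuming a hypothetical all-zero column named 'fourth' added to the question's data:

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])
df['fourth'] = 0  # hypothetical all-zero column, for illustration only

nonzero = df.ne(0)              # boolean mask: True where the value is non-zero
pos = nonzero.idxmax()          # label of the first True; first label if there is none
pos = pos.where(nonzero.any())  # NaN for columns without any non-zero value
print(pos)
```

Because of the NaN, the result becomes a float Series; checking nonzero.any() separately is an alternative if integer labels are needed.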
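We can couple this with lookup and assign. Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the lookup-based code in this answer needs an older pandas: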
df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))
        pos  val
first     2    4
second    1   10
third     3    3
Same answer packaged slightly differently.
m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))
        pos  val
first     2    4
second    1   10
third     3    3
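Since DataFrame.lookup is gone in pandas 2.0 and later, one possible replacement (not part of the original answer) is to turn the labels returned by idxmax into integer positions with Index.get_indexer and read the values through NumPy fancy indexing. A sketch under that approach:

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

m = df.ne(0).idxmax()  # row label of the first non-zero value, per column

# Translate the row labels and column labels into integer positions
rows = df.index.get_indexer(m)
cols = df.columns.get_indexer(m.index)

# Pick one element per (row, column) pair from the underlying array
res = pd.DataFrame({'pos': m, 'val': df.to_numpy()[rows, cols]})
print(res)
```

The resulting frame has the same pos/val layout as the lookup version above.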
Answer by jpp
Here's the long-winded way, which should be faster if your non-zero values tend to occur near the start of large arrays, because the generator stops scanning a column at its first match:
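import pandas as pd
df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third'])
# For each column, take the first (value, position) pair whose value is non-zero;
# next() stops at that match and falls back to (0, 0) for an all-zero column
res = [next(((j, i) for i, j in enumerate(df[col]) if j != 0), (0, 0)) for col in df]
df_res = pd.DataFrame(res, columns=['value', 'position'], index=df.columns)
print(df_res)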
        value  position
first       4         2
second     10         1
third       3         3
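For comparison, the same positional logic can be vectorized with NumPy: argmax on a boolean array returns the position of the first True along the chosen axis. Unlike the generator above, it always scans every element, so which version is faster depends on the data. A sketch that keeps the (0, 0) fallback for all-zero columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third'])

arr = df.to_numpy()
nonzero = arr != 0
pos = nonzero.argmax(axis=0)             # first True per column (0 if there is none)
has_nonzero = nonzero.any(axis=0)
val = arr[pos, np.arange(arr.shape[1])]  # value found at that position, per column

df_res = pd.DataFrame({'value': np.where(has_nonzero, val, 0),  # 0 for all-zero columns
                       'position': pos},
                      index=df.columns)
print(df_res)
```

Like enumerate in the generator version, this reports integer positions rather than index labels.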
Answer by YOBEN_S
I would use stack; the resulting index holds the row and column labels:
df[df.eq(df.max(1),0)&df.ne(0)].stack()
1  second    10.0
2  first      4.0
3  third      3.0
dtype: float64
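The mask in this answer keeps cells that are non-zero and equal to their row's maximum. On the question's data that happens to match the expected result, but it is a different criterion from "first non-zero value per column". A small sketch with a hypothetical two-column frame where the two disagree (the helper name row_max_nonzero_mask is illustrative only):

```python
import pandas as pd


def row_max_nonzero_mask(frame):
    """Mask from the stack-based answer: non-zero cells equal to their row maximum."""
    return frame.eq(frame.max(axis=1), axis=0) & frame.ne(0)


# Hypothetical frame on which the two criteria differ
df2 = pd.DataFrame([[1, 5], [2, 0]], columns=['a', 'b'])

mask = row_max_nonzero_mask(df2)
# (row label, column label) pairs selected by the row-maximum mask
selected = [(r, c) for r in mask.index for c in mask.columns if mask.at[r, c]]
# (row label, column label) of the first non-zero value in each column
first_nonzero = [(df2[c].ne(0).idxmax(), c) for c in df2.columns]

print(selected)
print(first_nonzero)
```

Here the mask selects column a at row 1 (its row maximum), while a's first non-zero value is in row 0.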