Find the first non-zero value in each column of a pandas DataFrame

Question

Asked by Konstantin

What is an idiomatic pandas way to get the value and index of the first non-zero element in each column of a DataFrame (top to bottom)?

import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

print(df.head())

#    first  second  third
# 0      0       0      0
# 1      0      10      0
# 2      4       0      0
# 3      1       2      3

What I would like to achieve:

#        value  pos
# first      4    2
# second    10    1
# third      3    3

Answer 1

Accepted answer by piRSquared

You're looking for idxmax, which returns the first position of the maximum. Here, however, you want the maximum of the Boolean mask "not equal to zero", whose first True marks the first non-zero value:

df.ne(0).idxmax()

first     2
second    1
third     3
dtype: int64
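
To see why this works, it may help to look at the intermediate Boolean mask. The sketch below rebuilds the example frame; the names mask, first_pos and has_nonzero are only illustrative. Since True is the maximum of a Boolean column, idxmax returns the label of the first True. One caveat worth checking: for a column with no non-zero value, idxmax still returns the first label, so a separate any() test is needed to tell the two cases apart.

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

mask = df.ne(0)            # True wherever the value is non-zero
first_pos = mask.idxmax()  # label of the first True in each column
has_nonzero = mask.any()   # False for columns that are entirely zero

# Keep only the columns that actually contain a non-zero value
print(first_pos[has_nonzero])
```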

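We can couple this with lookup and assign to fetch the values as well. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this form works only on older versions: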

df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))

        pos  val
first     2    4
second    1   10
third     3    3

The same answer, packaged slightly differently:

m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))

        pos  val
first     2    4
second    1   10
third     3    3
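
On pandas 2.0 and later, where DataFrame.lookup is no longer available, one possible substitute is plain NumPy integer indexing. This is a sketch rather than the answer author's code; the names rows, cols and vals are illustrative, and it assumes the row labels in the index are unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

m = df.ne(0).idxmax()              # row label of the first non-zero per column
rows = df.index.get_indexer(m)     # translate row labels to integer positions
cols = np.arange(df.shape[1])      # m is ordered like df.columns
vals = df.to_numpy()[rows, cols]   # pick one value per (row, column) pair

result = pd.DataFrame({'pos': m, 'val': vals})
print(result)
```

For the default integer index the get_indexer step is a no-op, but it keeps the sketch valid when the frame has other row labels.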

Answer 2

Answer by jpp

Here's the long-winded way, which should be faster if your non-zero values tend to occur near the start of large arrays, because next stops scanning a column at its first non-zero value:

import pandas as pd

df = pd.DataFrame([[0, 0, 0],[0, 10, 0],[4, 0, 0],[1, 2, 3]],
                  columns=['first', 'second', 'third'])

res = [next(((j, i) for i, j in enumerate(df[col]) if j != 0), (0, 0)) for col in df]

df_res = pd.DataFrame(res, columns=['value', 'position'], index=df.columns)

print(df_res)

        value  position
first       4         2
second     10         1
third       3         3
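
Two details of the one-liner above are easy to miss: enumerate reports integer positions rather than index labels, and the (0, 0) default makes an all-zero column look like "value 0 at position 0". A hedged variant that returns labels and an explicit (None, None) could look like this; the helper name first_nonzero and the small sample frame are hypothetical.

```python
import pandas as pd

def first_nonzero(col):
    """Return (value, label) of the first non-zero entry, or (None, None)."""
    # Series.items() yields (label, value) pairs lazily, so next() stops
    # at the first match instead of scanning the whole column.
    return next(((v, k) for k, v in col.items() if v != 0), (None, None))

# Hypothetical sample data with string labels and one all-zero column
df = pd.DataFrame({'a': [0, 5, 7], 'b': [0, 0, 0]}, index=['x', 'y', 'z'])

res = pd.DataFrame([first_nonzero(df[c]) for c in df],
                   columns=['value', 'label'], index=df.columns)
print(res)
```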

Answer 3

Answer by YOBEN_S

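I would use stack; the index of the result holds the row label and the column name. Note that this expression keeps the non-zero entries that equal their row's maximum, which matches the first non-zero value of each column for this sample data but not for arbitrary data.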

df[df.eq(df.max(1),0)&df.ne(0)].stack()
Out[252]: 
1  second    10.0
2  first      4.0
3  third      3.0
dtype: float64
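
A stack-based variant that does not depend on the row maximum might mask the zeros first and then keep the first surviving entry per column. This is a sketch under that assumption, not the answer author's code; the names stacked and first are illustrative. The explicit dropna() is there because whether stack drops missing values by default differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 0],
                   [0, 10, 0],
                   [4, 0, 0],
                   [1, 2, 3]],
                  columns=['first', 'second', 'third'])

# Replace zeros with NaN, then flatten to a (row, column) -> value Series
stacked = df.where(df.ne(0)).stack().dropna()

# Entries are in row order, so the first one per column is the topmost non-zero
first = stacked.groupby(level=1, sort=False).head(1)
print(first)
```

As in the original answer, the values come back as floats because the masking step introduces NaN.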
