Pandas: column name of the second largest value

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/26015489/

Time: 2020-09-13 22:29:57  Source: igfitidea

Pandas second largest value's column name

pandas dataframe

Asked by AtotheSiv

I am trying to find the column names associated with the largest and second largest values in each row of a DataFrame. Here's a simplified example (the real one has over 500 columns):

Date  val1  val2 val3 val4
1990   5     7    1    10
1991   2     1    10   3
1992   10    9    6    1
1993   50    10   2    15
1994   1     15   7    8

Needs to become:

Date  1larg   2larg
1990  val4    val2
1991  val3    val4
1992  val1    val2
1993  val1    val4
1994  val2    val4

I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest?
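
For reference, the idxmax step mentioned above might look like the sketch below. The DataFrame is rebuilt from the example table, and the variable name `first` is just an illustrative choice.

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
).set_index("Date")

# idxmax(axis=1) returns, for each row, the label of the column
# that holds the row's maximum value.
first = df.idxmax(axis=1)
print(first)
```

This only covers the largest value per row; the second largest is what the question asks about.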

Answered by DSM

(You don't have any duplicate maximum values in your rows, so I'll guess that if you have [1,1,2,2] you want val3 and val4 to be selected.)

One way would be to use the result of argsort as an index into a Series with the column names.

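import numpy as np
import pandas as pd

df = df.set_index("Date")
# Positions of each row's values in ascending order
arank = np.argsort(df.to_numpy(), axis=1)
# Reverse to descending, keep the top two, and map positions to column names
ranked_cols = df.columns.to_numpy()[arank[:, ::-1][:, :2]]
new_frame = pd.DataFrame(ranked_cols, index=df.index)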

produces

         0     1
Date            
1990  val4  val2
1991  val3  val4
1992  val1  val2
1993  val1  val4
1994  val2  val4
1995  val4  val3

(where I've added an extra 1995 [1,1,2,2] row.)
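
To try the argsort approach end to end, here is a self-contained sketch that rebuilds the example frame, including the extra 1995 row. The helper name `top_n_columns` and the `1larg`/`2larg` column labels are assumptions made for illustration, and `kind="stable"` is added only so that the order of tied values is deterministic.

```python
import numpy as np
import pandas as pd

# Example data from the question, plus the extra 1995 row with tied values.
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994, 1995],
        "val1": [5, 2, 10, 50, 1, 1],
        "val2": [7, 1, 9, 10, 15, 1],
        "val3": [1, 10, 6, 2, 7, 2],
        "val4": [10, 3, 1, 15, 8, 2],
    }
).set_index("Date")


def top_n_columns(frame, n=2):
    """Hypothetical helper: names of the n largest columns in each row."""
    # Stable ascending argsort of each row, so tie order is deterministic.
    order = np.argsort(frame.to_numpy(), axis=1, kind="stable")
    # Reverse each row to descending order and keep the first n positions.
    top = order[:, ::-1][:, :n]
    # Index the column labels with the 2-D array of positions.
    names = frame.columns.to_numpy()[top]
    labels = [f"{i + 1}larg" for i in range(n)]
    return pd.DataFrame(names, index=frame.index, columns=labels)


result = top_n_columns(df)
print(result)
```

Because the ascending order is reversed, tied values come out with the later column first, which is why the 1995 row gives val4 before val3.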

Alternatively, you could probably melt into a flat (long) format, pick out the two largest values in each Date group, and then pivot it back to a wide format.
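
The melt idea is only outlined above, so here is one possible sketch of it rather than a definitive version. The intermediate names (`long_df`, `top2`, `col`, `value`, `rank`) and the `1larg`/`2larg` labels are assumptions chosen for illustration, and the data is the five-row example from the question.

```python
import pandas as pd

# Example data from the question, with Date kept as a regular column.
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Wide -> long: one row per (Date, column name, value).
long_df = df.melt(id_vars="Date", var_name="col", value_name="value")

# Put the largest values first within each Date, then keep two rows per Date.
top2 = (
    long_df.sort_values(["Date", "value"], ascending=[True, False])
    .groupby("Date")
    .head(2)
    .copy()
)

# Number the picks within each Date: 1 = largest, 2 = second largest.
top2["rank"] = top2.groupby("Date").cumcount() + 1

# Long -> wide again: one row per Date, one column per rank.
result = top2.pivot(index="Date", columns="rank", values="col")
result.columns = [f"{r}larg" for r in result.columns]
print(result)
```

With hundreds of columns this builds a much longer intermediate frame than the argsort approach, so it is mainly useful when the data is needed in long format anyway.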
