Python: find the column name with the maximum value in each row

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29919306/

Time: 2020-08-19 05:10:04  Source: igfitidea

Find the column name which has the maximum value for each row

Tags: python, pandas, dataframe, max

Asked by markov zain

I have a DataFrame like this one:

In [7]:
frame.head()
Out[7]:
Communications and Search   Business    General Lifestyle
0   0.745763    0.050847    0.118644    0.084746
0   0.333333    0.000000    0.583333    0.083333
0   0.617021    0.042553    0.297872    0.042553
0   0.435897    0.000000    0.410256    0.153846
0   0.358974    0.076923    0.410256    0.153846

How can I get the name of the column that holds the maximum value in each row? The desired output looks like this:

In [7]:
frame.head()
Out[7]:
Communications and Search   Business    General Lifestyle   Max
0   0.745763    0.050847    0.118644    0.084746           Communications
0   0.333333    0.000000    0.583333    0.083333           Business
0   0.617021    0.042553    0.297872    0.042553           Communications
0   0.435897    0.000000    0.410256    0.153846           Communications
0   0.358974    0.076923    0.410256    0.153846           Business

Accepted answer by Alex Riley

You can use idxmax with axis=1 to find the column with the greatest value on each row:

>>> df.idxmax(axis=1)
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

To create the new column 'Max', use df['Max'] = df.idxmax(axis=1).
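
As a minimal, self-contained sketch of the approach above: the frame below is made-up illustrative data (the column names and values are assumptions, not the asker's actual frame).

```python
import pandas as pd

# Hypothetical data for illustration; not the asker's actual frame.
df = pd.DataFrame(
    {
        "Communications": [0.75, 0.10, 0.60],
        "Business": [0.05, 0.70, 0.10],
        "General": [0.20, 0.20, 0.30],
    }
)

# Compute the label of the largest value in each row before the new
# column exists, so 'Max' itself is never part of the comparison.
df["Max"] = df.idxmax(axis=1)

print(df)
```

Once 'Max' exists it is a string column, so a second call would need to be restricted to the numeric columns, for example with df.select_dtypes("number").idxmax(axis=1).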

To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
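
A small sketch of the column-wise case, again on made-up data; the string index labels "a", "b", "c" are an assumption chosen only to make the returned row labels easy to see.

```python
import pandas as pd

# Hypothetical data with a labelled index, for illustration only.
df = pd.DataFrame(
    {"Communications": [0.75, 0.10, 0.60], "Business": [0.05, 0.70, 0.10]},
    index=["a", "b", "c"],
)

# Default axis=0: for each column, the index label of the row holding its maximum.
row_of_max = df.idxmax()

print(row_of_max)
```

The result is a Series indexed by column name, whose values are row labels rather than column names.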

Answer by Zero

You could use apply on the DataFrame and take argmax() of each row via axis=1:

In [144]: df.apply(lambda x: x.argmax(), axis=1)
Out[144]:
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object
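
One caveat, as far as I know for pandas 1.0 and later: Series.argmax returns the integer position of the maximum rather than its label, so on a current install the snippet above would likely yield column positions instead of names. A sketch of the same apply pattern using idxmax, on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"Communications": [0.75, 0.10], "Business": [0.05, 0.70]})

# Row-wise apply: each row is passed to the lambda as a Series indexed by column name.
labels = df.apply(lambda row: row.idxmax(), axis=1)     # column names
positions = df.apply(lambda row: row.argmax(), axis=1)  # integer positions (assumes pandas >= 1.0)

print(labels)
print(positions)
```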


Here's a benchmark comparing how much slower the apply method is than idxmax() for len(df) ~ 20K:

In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop

In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop
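
To reproduce a comparison like this outside IPython, something along the following lines should work. The random frame, its column names, and the repeat counts are assumptions for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame of roughly the size mentioned above (20K rows, 4 columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20_000, 4)), columns=["A", "B", "C", "D"])

# Time one row-wise apply call, and average ten vectorised idxmax calls.
t_apply = timeit.timeit(lambda: df.apply(lambda row: row.idxmax(), axis=1), number=1)
t_idxmax = timeit.timeit(lambda: df.idxmax(axis=1), number=10) / 10

print(f"apply:  {t_apply * 1e3:.1f} ms per call")
print(f"idxmax: {t_idxmax * 1e3:.1f} ms per call")
```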

Answer by user1718097

If you want a column containing the name of the column with the maximum value, but considering only a subset of the columns, you can use a variation of @ajcr's answer:

df['Max'] = df[['Communications','Business']].idxmax(axis=1)
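
A runnable sketch of the subset variant on made-up data. The third row is chosen so that the overall maximum sits in a column outside the subset, which shows that only the listed columns are compared.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "Communications": [0.75, 0.10, 0.30],
        "Business": [0.05, 0.70, 0.20],
        "General": [0.20, 0.20, 0.50],
    }
)

subset = ["Communications", "Business"]  # columns to consider

# 'General' is ignored, even where it holds the largest value in the row.
df["Max"] = df[subset].idxmax(axis=1)

print(df)
```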