Original question: http://stackoverflow.com/questions/38955182/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Find names of top-n highest-value columns in each pandas dataframe row
Asked by chessosapiens
I have the following dataframe:
id p1 p2 p3 p4
1 0 9 1 4
2 0 2 3 4
3 1 3 10 7
4 1 5 3 1
5 2 3 7 10
I need to reshape the dataframe so that, for each id, it lists the names of the three columns with the highest values. The result would look like this:
id top1 top2 top3
1 p2 p4 p3
2 p4 p3 p2
3 p3 p4 p2
4 p2 p3 p4/p1
5 p4 p3 p2
It shows the top 3 best sellers for every user_id. I have already done this using the dplyr package in R, but I am looking for the pandas equivalent.
Answered by unutbu
You could use np.argsort to find the indices of the n largest items in each row:
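import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]})
df = df.set_index('id')

nlargest = 3
# Negate the values so that an ascending argsort puts the largest first,
# then keep the first nlargest column positions of every row.
order = np.argsort(-df.values, axis=1)[:, :nlargest]
# Look the names up in a NumPy array of column labels; recent pandas
# versions do not accept a 2-D indexer on the Index itself.
result = pd.DataFrame(df.columns.values[order],
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)],
                      index=df.index)
print(result)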
which yields:
top1 top2 top3
id
1 p2 p4 p3
2 p4 p3 p2
3 p3 p4 p2
4 p2 p3 p1
5 p4 p3 p2
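For id 4, p1 and p4 are tied at 1, so which of them lands in top3 depends on how the sort breaks ties. A minimal sketch, assuming you want ties to resolve in favour of the leftmost column: passing kind='stable' to np.argsort keeps tied entries in their original column order. The helper name top_n_columns is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]}).set_index('id')


def top_n_columns(frame, n):
    """Return the names of the n largest-valued columns for each row.

    Hypothetical helper: ties go to the leftmost column because the
    sort is stable.
    """
    values = frame.to_numpy()
    # Stable ascending sort of the negated values = descending order,
    # with tied values kept in their original column order.
    order = np.argsort(-values, axis=1, kind='stable')[:, :n]
    names = frame.columns.to_numpy()[order]
    return pd.DataFrame(names,
                        index=frame.index,
                        columns=['top{}'.format(i) for i in range(1, n + 1)])


result = top_n_columns(df, 3)
print(result)
```

With this data, the row for id 4 should come out as p2, p3, p1, matching the output above.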
Answered by jezrael
You can sort each row in descending order with apply and keep the labels of the first three entries:
# df is the original frame, with 'id' still a regular column
df = (df.set_index('id')
        .apply(lambda x: pd.Series(x.sort_values(ascending=False).iloc[:3].index,
                                   index=['top1', 'top2', 'top3']),
               axis=1)
        .reset_index())
print(df)
id top1 top2 top3
0 1 p2 p4 p3
1 2 p4 p3 p2
2 3 p3 p4 p2
3 4 p2 p3 p4
4 5 p4 p3 p2
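The expected output in the question lists "p4/p1" for id 4, where the two columns tie, while both answers above pick just one of them. A rough sketch of one way to report every tied column in a slot is shown below. It assumes slots are filled by distinct values in descending order, so a tie in top1 would push the next distinct value to top2; that interpretation is an assumption, and the helper name top_n_with_ties is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]})


def top_n_with_ties(row, n=3):
    """Hypothetical helper: join the names of tied columns with '/'."""
    # The n largest distinct values in this row, largest first.
    top_values = sorted(row.unique(), reverse=True)[:n]
    # For each of those values, collect every column that holds it.
    labels = ['/'.join(row.index[(row == v).to_numpy()]) for v in top_values]
    # Pad with empty strings if the row has fewer than n distinct values.
    labels += [''] * (n - len(labels))
    return pd.Series(labels, index=['top{}'.format(i) for i in range(1, n + 1)])


result = df.set_index('id').apply(top_n_with_ties, axis=1)
print(result)
```

Tied names appear in column order, so id 4 shows p1/p4 rather than p4/p1.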