Sort pandas DataFrame with function over column values
Original question: http://stackoverflow.com/questions/38662826/
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow.
Asked by Ohumeronen
This builds on the question "python, sort descending dataframe with pandas":
Given:
from pandas import DataFrame
import pandas as pd
d = {'x': [2, 3, 1, 4, 5],
     'y': [5, 4, 3, 2, 1],
     'letter': ['a', 'a', 'b', 'b', 'c']}
df = DataFrame(d)
df then looks like this:
df:
  letter  x  y
0      a  2  5
1      a  3  4
2      b  1  3
3      b  4  2
4      c  5  1
I would like to have something like:
f = lambda x,y: x**2 + y**2
test = df.sort(f('x', 'y'))
This should order the complete dataframe by the sum of the squared values of columns 'x' and 'y' and give me:
test:
  letter  x  y
2      b  1  3
3      b  4  2
1      a  3  4
4      c  5  1
0      a  2  5
Ascending or descending order does not matter. Is there a nice and simple way to do that? I could not yet find a solution.
Accepted answer by andrewkittredge
df.iloc[(df.x ** 2 + df.y ** 2).sort_values().index]
Adapted from the question "How to sort pandas dataframe by custom order on string index".
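One caveat on the line above: it passes index labels to `.iloc`, which expects positions, so it only works because `df` has the default 0..n-1 index. Here is a minimal sketch of two variants that should also hold up with another index. The index values 10 to 14 and the names `score`, `by_label` and `by_position` are made up for illustration.

```python
import pandas as pd

# Same data as in the question, but with a hypothetical non-default index
df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']},
                  index=[10, 11, 12, 13, 14])

# Sort key: sum of squares of the two columns
score = df.x ** 2 + df.y ** 2

# Variant 1: sort the key, then select rows by index *label*
by_label = df.loc[score.sort_values().index]

# Variant 2: compute the sorting *positions*, then select by position
by_position = df.iloc[score.to_numpy().argsort()]
```

The label-based variant assumes unique index labels. The positional variant does not care what the index looks like.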
Answer by ayhan
You can create a temporary column to use in sort and then drop it:
df.assign(f = df['x']**2 + df['y']**2).sort_values('f').drop('f', axis=1)
Out:
  letter  x  y
2      b  1  3
3      b  4  2
1      a  3  4
4      c  5  1
0      a  2  5
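To get closer to the `f('x', 'y')` call the question asks for, the temporary-column idea can be wrapped in a small helper. This is only a sketch: `sort_by_function` and the temporary column name `_sort_key` are hypothetical names, not part of pandas, and the helper assumes no existing column is already called `_sort_key`.

```python
import pandas as pd

def sort_by_function(df, func, *cols, ascending=True, tmp='_sort_key'):
    """Sort df by func applied to the named columns (hypothetical helper)."""
    # Evaluate the function on whole columns (vectorized)
    key = func(*(df[c] for c in cols))
    # Add the key as a temporary column, sort on it, then drop it again
    return (df.assign(**{tmp: key})
              .sort_values(tmp, ascending=ascending)
              .drop(columns=tmp))

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

f = lambda x, y: x**2 + y**2
test = sort_by_function(df, f, 'x', 'y')
test_desc = sort_by_function(df, f, 'x', 'y', ascending=False)
```

Because `assign` returns a copy, the original `df` is left untouched.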
Answer by Sandeep
Have you tried creating a new column and then sorting on that? I cannot comment on the original post, so I am just posting my solution.
df['c'] = df.x**2 + df.y**2
df = df.sort_values('c')
Answer by Adam Warner
import pandas as pd
d = {'one': [2, 3, 1, 4, 5],
     'two': [5, 4, 3, 2, 1],
     'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)
# f = lambda x, y: x**2 + y**2
# Compute the sum of squares row by row.
# .ix has been removed from pandas, so use label-based .loc instead.
array = []
for i in range(len(df)):
    array.append(df.loc[i, 'one']**2 + df.loc[i, 'two']**2)
array = pd.DataFrame(array, columns=['Sum of Squares'])
test = pd.concat([df, array], axis=1, join='inner')
# sort_index(by=...) has been removed as well; sort_values replaces it.
test = test.sort_values(by='Sum of Squares', ascending=True).drop('Sum of Squares', axis=1)
Just realized that you wanted this:
  letter  one  two
2      b    1    3
3      b    4    2
1      a    3    4
4      c    5    1
0      a    2    5