Python: How to sort a pandas data frame using values from several columns?
Original URL: http://stackoverflow.com/questions/17618981/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Roman
I have the following data frame:
df = pandas.DataFrame([{'c1':3,'c2':10},{'c1':2, 'c2':30},{'c1':1,'c2':20},{'c1':2,'c2':15},{'c1':2,'c2':100}])
Or, in human readable form:
c1 c2
0 3 10
1 2 30
2 1 20
3 2 15
4 2 100
The following sorting-command works as expected:
df.sort(['c1','c2'], ascending=False)
Output:
c1 c2
0 3 10
4 2 100
1 2 30
3 2 15
2 1 20
But the following command:
df.sort(['c1','c2'], ascending=[False,True])
results in
c1 c2
2 1 20
3 2 15
1 2 30
4 2 100
0 3 10
and this is not what I expect. I expect to have the values in the first column ordered from largest to smallest, and if there are identical values in the first column, order by the ascending values from the second column.
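To make that expected ordering concrete, here is a minimal sketch, assuming a recent pandas where DataFrame.sort_values is available (the names by_c2 and expected are illustrative only). It builds the desired order in two passes: sort by c2 ascending first, then run a stable (mergesort) sort on c1 descending, so rows that tie on c1 keep their ascending c2 order.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 3, 'c2': 10}, {'c1': 2, 'c2': 30}, {'c1': 1, 'c2': 20},
                   {'c1': 2, 'c2': 15}, {'c1': 2, 'c2': 100}])

# Pass 1: order by the secondary key (c2), ascending.
by_c2 = df.sort_values('c2', ascending=True)
# Pass 2: a stable sort on the primary key (c1), descending;
# rows with equal c1 keep the c2 order from pass 1.
expected = by_c2.sort_values('c1', ascending=False, kind='mergesort')

print(expected)
print(list(expected.index))  # [0, 3, 1, 4, 2]
```

A single multi-key sort with one ascending flag per column should produce exactly this order.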
Does anybody know why it does not work as expected?
ADDED
This is copied and pasted directly from my session:
>>> df.sort(['c1','c2'], ascending=[False,True])
c1 c2
2 1 20
3 2 15
1 2 30
4 2 100
0 3 10
Answer by falsetru
DataFrame.sort is deprecated (and, as the AttributeError below shows, no longer exists in newer pandas); use DataFrame.sort_values.
>>> df.sort_values(['c1','c2'], ascending=[False,True])
c1 c2
0 3 10
3 2 15
1 2 30
4 2 100
2 1 20
>>> df.sort(['c1','c2'], ascending=[False,True])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ampawake/anaconda/envs/pseudo/lib/python2.7/site-packages/pandas/core/generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sort'
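For reference, a self-contained version of the working call, assuming a reasonably recent pandas release in which sort has been removed (the name result is illustrative):

```python
import pandas as pd

df = pd.DataFrame([{'c1': 3, 'c2': 10}, {'c1': 2, 'c2': 30}, {'c1': 1, 'c2': 20},
                   {'c1': 2, 'c2': 15}, {'c1': 2, 'c2': 100}])

# One ascending flag per column in `by`: c1 descending, then c2 ascending within ties.
result = df.sort_values(by=['c1', 'c2'], ascending=[False, True])
print(result)

# Row labels come out in the same order as the session output above.
print(result.index.tolist())  # [0, 3, 1, 4, 2]

# In current pandas, DataFrame has no .sort attribute at all.
print(hasattr(df, 'sort'))  # False
```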
Answer by Akash
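sort does not change df in place; it returns a new, sorted DataFrame. The interactive interpreter prints that returned value for you, but a script does not, so if you are writing this code as a script file you have to assign the result back like this: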
df = df.sort(['c1','c2'], ascending=[False,True])
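The same point holds for sort_values in current pandas. A minimal sketch, assuming a recent pandas (the name sorted_df is illustrative), showing that the original frame stays in its original order until the sorted result is assigned back:

```python
import pandas as pd

df = pd.DataFrame([{'c1': 3, 'c2': 10}, {'c1': 2, 'c2': 30}, {'c1': 1, 'c2': 20},
                   {'c1': 2, 'c2': 15}, {'c1': 2, 'c2': 100}])

# sort_values returns a new, sorted DataFrame; df itself is left untouched.
sorted_df = df.sort_values(['c1', 'c2'], ascending=[False, True])
print(df.index.tolist())         # [0, 1, 2, 3, 4]  (original order)
print(sorted_df.index.tolist())  # [0, 3, 1, 4, 2]

# In a script nothing is echoed, so keep the result by assigning it back.
df = df.sort_values(['c1', 'c2'], ascending=[False, True])
print(df.index.tolist())         # [0, 3, 1, 4, 2]
```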
Answer by HonzaB
Answer by miraculixx
I have found this to be really useful:
import numpy as np
import pandas as pd
from utils import skw  # helper defined below

df = pd.DataFrame({'A': list(range(0, 10)) * 2, 'B': np.random.randint(20, 30, 20)})
# A ascending, B descending
df.sort(**skw(columns=['A', '-B']))
# A descending, B ascending
df.sort(**skw(columns=['-A', '+B']))
Note that unlike the standard columns=, ascending= arguments, here column names and their sort order are in the same place. As a result, your code gets a lot easier to read and maintain.
Note that the actual call to .sort is unchanged; skw (sort kwargs) is just a small helper function that parses the columns and returns the usual columns= and ascending= parameters for you. Pass it any other sort kwargs as you usually would. Copy/paste the following code into e.g. your local utils.py, then forget about it and just use it as above.
# utils.py (or anywhere else convenient to import)
def skw(columns=None, **kwargs):
    """ get sort kwargs by parsing sort order given in column name """
    # names without an explicit '+' or '-' prefix default to ascending (+)
    sort_cols = [col if col[0] in '+-' else '+' + col for col in columns]
    # strip only the leading sign, so '+B' and hyphenated names like 'my-col' parse correctly
    columns = [col[1:] for col in sort_cols]
    ascending = [col[0] == '+' for col in sort_cols]
    kwargs.update(dict(columns=columns, ascending=ascending))
    return kwargs
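skw produces columns=, which the old DataFrame.sort accepted; DataFrame.sort_values takes by= instead. As a hedged sketch for newer pandas, here is a variant of the same idea. The name sort_kwargs is hypothetical (it is not part of pandas or of the answer above), and the test frame mirrors the answer's example.

```python
import numpy as np
import pandas as pd


def sort_kwargs(by=None, **kwargs):
    """Hypothetical sort_values-compatible variant of skw.

    '-col' sorts descending; '+col' or plain 'col' sorts ascending.
    """
    by = list(by or [])
    # strip a single leading sign character, if present
    names = [col[1:] if col[:1] in ('+', '-') else col for col in by]
    ascending = [not col.startswith('-') for col in by]
    kwargs.update(by=names, ascending=ascending)
    return kwargs


df = pd.DataFrame({'A': list(range(0, 10)) * 2,
                   'B': np.random.randint(20, 30, 20)})

# A ascending, B descending
print(df.sort_values(**sort_kwargs(by=['A', '-B'])))
# A descending, B ascending; extra keyword arguments pass straight through
print(df.sort_values(**sort_kwargs(by=['-A', '+B'], kind='mergesort')))
```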
Answer by fotis j
As far as I understand, the DataFrame.sort() method has been deprecated since pandas 0.17, which introduced DataFrame.sort_values(). To solve your problem, you should use DataFrame.sort_values() instead:
df.sort_values(by=["c1","c2"], ascending=[False, True])
The output looks like this:
c1 c2
3 10
2 15
2 30
2 100
1 20
Answer by CONvid19
In my case, the accepted answer didn't work:
df.sort_values(by=["c1","c2"], ascending=[False, True])
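Only the following worked as expected, because sort_values returns a new, sorted DataFrame instead of modifying the original, so the result has to be assigned back: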
df = df.sort_values(by=["c1","c2"], ascending=[False, True])
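Alternatively, as a sketch assuming a recent pandas, sort_values accepts inplace=True, which sorts df itself and returns None. In that form the result must not be assigned back, or df would end up as None (the name returned is illustrative):

```python
import pandas as pd

df = pd.DataFrame([{'c1': 3, 'c2': 10}, {'c1': 2, 'c2': 30}, {'c1': 1, 'c2': 20},
                   {'c1': 2, 'c2': 15}, {'c1': 2, 'c2': 100}])

# With inplace=True, df is reordered in place and the call itself returns None,
# so do NOT write df = df.sort_values(..., inplace=True).
returned = df.sort_values(by=['c1', 'c2'], ascending=[False, True], inplace=True)
print(returned)           # None
print(df.index.tolist())  # [0, 3, 1, 4, 2]
```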