Python: Sort a pandas dataframe on both the values of a column and the index?

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20277358/

Date: 2020-08-18 20:05:01  Source: igfitidea

Sort pandas dataframe both on values of a column and index?

python, pandas, sorting, dataframe

Asked by Blaszard

Is it feasible to sort a pandas dataframe by the values of a column, but also by its index?

If you sort a pandas dataframe by the values of a column, the resulting dataframe is sorted by that column, but unfortunately the index order ends up messy among rows that share the same value in the sorted column.

So, can I sort a dataframe by a column, such as a column named count, but also sort it by the value of the index? And is it also feasible to sort the column in descending order while sorting the index in ascending order?

I know how to sort by multiple columns in a dataframe, and I also know I can achieve what I'm asking here by first calling reset_index() on the index, sorting, and then creating the index again. But is there a more intuitive and efficient way to do it?

Answered by cyborg

To sort a column descending, while maintaining the index ascending:

import pandas as pd
df = pd.DataFrame(index=range(5), data={'c': [4,2,2,4,2]})
df.index = df.index[::-1]
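# df.sort() was removed in pandas 0.20. Sort the index first, then use a
# stable sort on the column so that ties keep their ascending index order.
print(df.sort_index().sort_values(by='c', ascending=False, kind='mergesort'))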

Output:

   c
1  4
4  4
0  2
2  2
3  2

Answered by Michael Delgado

You can use a combination of groupby and apply:

In [2]: df = pd.DataFrame({
            'transID':  range(8),
            'Location': ['New York','Chicago','New York','New York','Atlanta','Los Angeles',
                            'Chicago','Atlanta'],
            'Sales':    np.random.randint(0,10000,8)}).set_index('transID')
In [3]: df
Out[3]:
        Location    Sales
transID
0       New York    1082
1       Chicago     1664
2       New York    692
3       New York    5669
4       Atlanta     7715
5       Los Angeles 987
6       Chicago     4085
7       Atlanta     2927

In [4]: df.groupby('Location').apply(lambda d: d.sort_index()).reset_index('Location',drop=True)
Out[4]:
        Location    Sales
transID
4       Atlanta     7715
7       Atlanta     2927
1       Chicago     1664
6       Chicago     4085
5       Los Angeles 987
0       New York    1082
2       New York    692
3       New York    5669

I drop 'Location' in the last line because groupby inserts the grouped levels into the first positions in the index. Sorting and then dropping them preserves the sorted order.

Answered by fantabolous

In pandas 0.23+ you can do it directly; see OmerB's answer. If you don't yet have 0.23+, read on.

I'd venture that the simplest way is to just copy your index over to a column, and then sort by both.

df['colFromIndex'] = df.index
df = df.sort_values(['count', 'colFromIndex'])  # df.sort() was removed in pandas 0.20

I'd also prefer to be able to just do something like df.sort(['count', 'index']), but of course that doesn't work.
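
As a rough illustration of the copy-the-index approach, here is a minimal sketch. The sample data, the index values, and the descending/ascending choice are assumptions made up for this example; only the 'count' and 'colFromIndex' names come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; 'count' is the column name used in the question.
df = pd.DataFrame({'count': [2, 5, 2, 5, 2]}, index=[30, 10, 20, 40, 0])

# Copy the index into a helper column, sort by both keys
# (column descending, index ascending), then drop the helper again.
df['colFromIndex'] = df.index
result = (df.sort_values(['count', 'colFromIndex'], ascending=[False, True])
            .drop(columns='colFromIndex'))
print(result)
```

Dropping the helper column afterwards is optional; it only keeps the frame looking the way it did before the sort.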

Answered by romandos

I solved this problem the following way:

df.to_csv('df.csv', index = False)
df = pd.read_csv('df.csv')

Answered by kekert

I believe that the original order from sort_values is kept even when sort_index is applied, so this should work:

df.sort_values('count', ascending=False).sort_index(level=[index_level1, index_level2])

Answered by Scott Boston

You can use the ascending parameter in sort_index, but you must pass it as a list for it to work correctly as of pandas 0.22.0.

import pandas as pd
import numpy as np
df = pd.DataFrame({'idx_0':[2]*6+[1]*5,
                   'idx_1':[6,4,2,10,18,5,11,1,7,9,3],
                   'value_1':np.arange(11,0,-1),
                   'MyName':list('SORTEDFRAME')})

df = df.set_index(['idx_0','idx_1'])
df

Output:

            MyName  value_1
idx_0 idx_1                
2     6          S       11
      4          O       10
      2          R        9
      10         T        8
      18         E        7
      5          D        6
1     11         F        5
      1          R        4
      7          A        3
      9          M        2
      3          E        1

Sorting by values and index should get "FRAMESORTED" instead of "SORTEDFRAME"

df.sort_values('value_1', ascending=False)\
  .sort_index(level=0, ascending=[True])

Output:

            MyName  value_1
idx_0 idx_1                
1     11         F        5
      1          R        4
      7          A        3
      9          M        2
      3          E        1
2     6          S       11
      4          O       10
      2          R        9
      10         T        8
      18         E        7
      5          D        6

Note that you must pass the ascending parameter in sort_index as a list and not as a scalar; otherwise it will not work.
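
A related use of a list-valued ascending, separate from the trick above, is giving each MultiIndex level its own sort direction. The sketch below assumes a reasonably recent pandas and uses a small made-up frame that reuses the idx_0, idx_1, and value_1 names; it sorts on the index levels only and does not reproduce the value-then-index result shown earlier.

```python
import pandas as pd

# Small hypothetical frame with a two-level index.
df = pd.DataFrame({'idx_0': [2, 2, 1, 1],
                   'idx_1': [6, 4, 11, 1],
                   'value_1': [4, 3, 2, 1]}).set_index(['idx_0', 'idx_1'])

# One direction per level: idx_0 ascending, idx_1 descending.
result = df.sort_index(level=['idx_0', 'idx_1'], ascending=[True, False])
print(result)
```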

Answered by Iain D

As of pandas version 0.22.

You can temporarily set the column as an index, sort the index on that column and then reset. By default it will maintain the order of the existing index:

df = df.set_index('column_name', append=True).sort_index(level=1).reset_index(level=1)

I think the above could be done with 'inplace' options but I think it's easier to read as above.
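
To make the one-liner concrete, here is a small sketch on made-up data. The frame contents and the 'other' column are assumptions for illustration; 'column_name' is the placeholder from the snippet above.

```python
import pandas as pd

# Hypothetical data with an unsorted, unnamed index.
df = pd.DataFrame({'column_name': ['b', 'a', 'b', 'a'],
                   'other': [1, 2, 3, 4]},
                  index=[3, 2, 1, 0])

result = (df.set_index('column_name', append=True)  # index levels: (original index, column_name)
            .sort_index(level=1)                    # sort by column_name, then by the remaining level
            .reset_index(level=1))                  # move column_name back into a regular column
print(result)
```

Rows with the same column_name come out ordered by the original index because sort_index also sorts the remaining levels by default (sort_remaining=True).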

Answered by OmerB

Pandas 0.23 finally gets you there :-D

You can now pass index names (and not only column names) as parameters to sort_values. So, this one-liner works:

df = df.sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])

And if your index is currently unnamed:

df = df.rename_axis('MyIdx').sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])
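
A self-contained sketch of the unnamed-index case, assuming pandas 0.23 or later; the sample values are made up for illustration, while 'MyCol' and 'MyIdx' are the placeholder names from the one-liners above.

```python
import pandas as pd

# Hypothetical data with an unnamed index in reverse order.
df = pd.DataFrame({'MyCol': [1, 3, 1, 3, 1]}, index=[4, 3, 2, 1, 0])

# Name the index so sort_values can refer to it, then sort the
# column descending and the index ascending in one call.
result = (df.rename_axis('MyIdx')
            .sort_values(by=['MyCol', 'MyIdx'], ascending=[False, True]))
print(result)
```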