Python pandas DataFrame sort_values does not work

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/39590055/

Tags: python, pandas

Asked by jeffsia

I have the following pandas data frame which I want to sort by 'test_type'

  test_type         tps          mtt        mem        cpu       90th
0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

My code to load the data frame and sort it is below. The first print statement prints the data frame above.

        import pandas as pd

        df = pd.read_csv(file)  # reads from a CSV file
        print(df)
        df = df.sort_values(by=['test_type'], ascending=True)
        print('\nAfter sort...')
        print(df)

After sorting and printing the data frame again, its contents are still in the same order, as shown below.

Program output:

After sort...
  test_type         tps          mtt        mem        cpu       90th
0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

I expect row 3 (the sso_500 row) to be on top after sorting. Can someone help me figure out why it's not working as it should?

Answered by Ami Tavory

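The sort does work, but test_type holds strings, which are compared character by character, so 'sso_1000' comes before 'sso_500'. Presumably, what you're trying to do is sort by the numerical value after sso_. You can do this as follows: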

import numpy as np

# .ix was removed in pandas 1.0; np.argsort returns positions, so .iloc is the equivalent
df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]

This

  1. splits the strings at _

  2. converts what's after this character to a numerical value

  3. finds the indices that sort these numerical values

  4. reorders the DataFrame according to these indices
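
The four steps above can be sketched one at a time, as below. This is only a minimal sketch: it assumes a recent pandas, where .iloc takes the place of the removed .ix, and the small frame and the intermediate names (suffix, numbers, order, sorted_df) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: only the test_type column from the question.
df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# Strings compare character by character, so '1' < '5' puts
# 'sso_1000' ahead of 'sso_500' in a plain sort.
assert 'sso_1000' < 'sso_500'

# 1. Split the strings at '_' and keep the last piece.
suffix = df['test_type'].str.split('_').str[-1]

# 2. Convert that piece to an integer.
numbers = suffix.astype(int)

# 3. Find the positions that would sort the numbers.
order = np.argsort(numbers.values)

# 4. Reorder the DataFrame by position.
sorted_df = df.iloc[order]
print(sorted_df)
```

The original index labels are kept, so the sso_500 row still carries label 3 after the reorder.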

Example

In [15]: df = pd.DataFrame({'test_type': ['sso_1000', 'sso_500']})

In [16]: df.sort_values(by=['test_type'], ascending=True)
Out[16]: 
  test_type
0  sso_1000
1   sso_500

In [17]: df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]
Out[17]: 
  test_type
1   sso_500
0  sso_1000
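
For comparison, newer pandas versions can express the same numeric ordering through the key argument of sort_values. This is not part of either answer; it is a sketch that assumes pandas 1.1 or later, and numeric_suffix is a hypothetical helper name chosen for illustration.

```python
import pandas as pd

# Illustrative data: only the test_type column from the question.
df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})


def numeric_suffix(col):
    # Hypothetical helper: map 'sso_500' -> 500 for every value in the column.
    return col.str.split('_').str[-1].astype(int)


# key receives the column as a Series and must return a Series of the same shape;
# the rows are then ordered by the returned values instead of the raw strings.
result = df.sort_values(by='test_type', key=numeric_suffix)
print(result)
```

This keeps the sort in a single sort_values call and avoids the separate argsort step, at the cost of requiring a newer pandas.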

Answered by Nickil Maveli

Alternatively, you could extract the numbers from test_type and sort them, then reindex DF according to the resulting index.

df.reindex(df['test_type'].str.extract(r'(\d+)', expand=False)
                          .astype(int).sort_values().index).reset_index(drop=True)
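
The one-liner above can be unpacked into a runnable sketch. The frame below reuses the first two columns of the data from the question, and the intermediate names (numbers, ordered_labels, result) are assumptions introduced only to make each step visible.

```python
import pandas as pd

# First two columns of the data frame from the question.
df = pd.DataFrame({
    'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500'],
    'tps': [205.263559, 201.127133, 203.204082, 189.566836],
})

# Pull out the first run of digits as a Series of strings, then cast to int.
numbers = df['test_type'].str.extract(r'(\d+)', expand=False).astype(int)

# Index labels in ascending numeric order.
ordered_labels = numbers.sort_values().index

# Reorder the rows by label, then renumber them from 0.
result = df.reindex(ordered_labels).reset_index(drop=True)
print(result)
```

Unlike the argsort approach, this works with index labels rather than positions, and the final reset_index(drop=True) discards the original labels.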
