In-place sort_values in pandas: what exactly does it mean?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/41776801/

Time: 2020-09-14 02:50:11  Source: igfitidea

python, sorting, pandas, in-place

Asked by Karel Macek

Maybe a very naive question, but I am stuck on this: pandas.Series has a method sort_values, and there is an option to do it "in place" or not. I have Googled it for a while, but I am still not clear about it. It seems that this is assumed to be perfectly known to everybody but me. Could anyone give me an illustrative explanation, for dummies, of how these two options differ from each other?

Thank you for any assistance.

Accepted answer by Alexey Smirnov

Here is an example. df1 will hold the sorted DataFrame, and df will remain intact:

import pandas as pd
from datetime import datetime as dt
df = pd.DataFrame(data=[22,22,3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

# sort_values returns a new, sorted DataFrame; df keeps its original order
df1 = df.sort_values(by='foo')
print(df, df1)

In the case below, df itself will hold the sorted values:

import pandas as pd
from datetime import datetime as dt

df = pd.DataFrame(data=[22,22,3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

# With inplace=True, df itself is sorted and the call returns None
df.sort_values(by='foo', inplace=True)
print(df)
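
A compact way to see the difference between the two calls above is to look at what each one returns. The sketch below reuses the foo column from this answer; the default integer index and the names original_order and returned are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"foo": [22, 22, 3]})
original_order = df["foo"].tolist()

# Default (inplace=False): a new sorted DataFrame is returned, df is untouched.
df1 = df.sort_values(by="foo")
assert df["foo"].tolist() == original_order
print(df1["foo"].tolist())   # [3, 22, 22]

# inplace=True: df itself is reordered and the call returns None.
returned = df.sort_values(by="foo", inplace=True)
print(returned)              # None
print(df["foo"].tolist())    # [3, 22, 22]
```

Because the in-place call returns None, assigning its result back to df would replace the DataFrame with None, so the two styles should not be mixed.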

Answer by SSC

As you can read in the sort_values documentation, the function returns a Series. However, it is a new Series, not the original one.

For example:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a   -0.872271
b    0.294317
c   -0.017433
d   -1.375316
e    0.993197
dtype: float64

s_sorted = s.sort_values()

print(s_sorted)

d   -1.375316
a   -0.872271
c   -0.017433
b    0.294317
e    0.993197
dtype: float64

print(id(s_sorted))
127952880

print(id(s))
127724792

So s and s_sorted are different Series objects. But if you use inplace=True:

s.sort_values(inplace=True)
print(s)
d   -1.375316
a   -0.872271
c   -0.017433
b    0.294317
e    0.993197
dtype: float64

print(id(s))
127724792

This shows that s is still the same object after sorting, and no new Series is returned (the call returns None).
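
Comparing id() values works, but the is operator expresses the same check more directly. The following sketch uses fixed values instead of random ones and a hypothetical second name, alias, to show a practical consequence: every name bound to the same Series sees an in-place sort, while the default call leaves the original object and all names bound to it unsorted.

```python
import pandas as pd

s = pd.Series([0.3, -0.9, 1.0], index=["a", "b", "c"])
alias = s                      # second name for the same object

s_sorted = s.sort_values()     # new object; s and alias are unchanged
print(s_sorted is s)           # False
print(list(alias.index))       # ['a', 'b', 'c']

result = s.sort_values(inplace=True)   # sorts the object itself
print(result)                  # None
print(alias is s)              # True
print(list(alias.index))       # ['b', 'a', 'c']
```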

Answer by Leon

"inplace=True" is more like a physical sort, while "inplace=False" is more like a logical sort. A physical sort means that the data set stored in the computer is itself reordered based on some keys; a logical sort means that the data set is still stored in its original order (as it was input or imported), and the sort operates only on its index. A data set can have one or more logical indexes, but its physical index is unique.

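
To relate the physical/logical analogy to actual pandas behaviour, the sketch below may help. It is an illustration, not a description of pandas internals: with the default inplace=False, pandas returns a new, fully sorted Series and leaves the original in input order. The closest thing to an index-only ("logical") sort is computing the ordering separately, done here with NumPy's argsort; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([22, 5, 13], index=["x", "y", "z"])

# "Logical" ordering: positions that would sort the data; s is not touched.
order = np.argsort(s.to_numpy())
print(order.tolist())              # [1, 2, 0]
print(s.tolist())                  # [22, 5, 13]

# Applying that ordering gives the same result as sort_values().
print(s.iloc[order].equals(s.sort_values()))   # True

# "Physical" sort: the stored Series itself is reordered.
s.sort_values(inplace=True)
print(s.tolist())                  # [5, 13, 22]
```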