Original source: http://stackoverflow.com/questions/41776801/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
In-place sort_values in pandas: what exactly does it mean?
Asked by Karel Macek
Maybe a very naive question, but I am stuck on this: pandas.Series has a method sort_values, and there is an option to do it "in place" or not. I have Googled it for a while, but I am still not clear about it. It seems that this is assumed to be perfectly known to everybody but me. Could anyone give me an illustrative explanation, for dummies, of how these two options differ from each other?
Thank you for any assistance.
Accepted answer by Alexey Smirnov
Here is an example. df1 will hold the sorted DataFrame and df will remain intact.
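import pandas as pd
from datetime import datetime as dt

# Three rows of data with a datetime index
df = pd.DataFrame(data=[22, 22, 3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])
# inplace defaults to False: a new, sorted DataFrame is returned
df1 = df.sort_values(by='foo')
print(df)   # still in the original order
print(df1)  # sorted by 'foo'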
In the case below, df itself will hold the sorted values.
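import pandas as pd
from datetime import datetime as dt

# Same data as in the first example
df = pd.DataFrame(data=[22, 22, 3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])
# inplace=True: df itself is reordered, so nothing is assigned
df.sort_values(by='foo', inplace=True)
print(df)  # now sorted by 'foo'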
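A small supplementary sketch (not part of the original answer) that puts the two cases side by side. The string index and the name result are just illustrative choices. It also shows a common pitfall: in the pandas versions I am aware of, the inplace=True call returns None, so assigning its result back to df would lose the data.

```python
import pandas as pd

df = pd.DataFrame({'foo': [22, 22, 3]}, index=['x', 'y', 'z'])

# Default (inplace=False): a new sorted DataFrame is returned, df is untouched.
df1 = df.sort_values(by='foo')
print(df['foo'].tolist())   # original order: [22, 22, 3]
print(df1['foo'].tolist())  # sorted order:   [3, 22, 22]

# inplace=True: df itself is reordered and the call returns None,
# so writing `df = df.sort_values(..., inplace=True)` would replace df with None.
result = df.sort_values(by='foo', inplace=True)
print(result is None)       # True
print(df['foo'].tolist())   # [3, 22, 22]
```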
Answer by SSC
As you can read in the sort_values documentation, the function returns a Series. However, it is a new Series, not the original one.
For example:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a -0.872271
b 0.294317
c -0.017433
d -1.375316
e 0.993197
dtype: float64
s_sorted = s.sort_values()
print(s_sorted)
d -1.375316
a -0.872271
c -0.017433
b 0.294317
e 0.993197
dtype: float64
print(id(s_sorted))
127952880
print(id(s))
127724792
So s and s_sorted are different Series. But if you use inplace=True:
s.sort_values(inplace=True)
print(s)
d -1.375316
a -0.872271
c -0.017433
b 0.294317
e 0.993197
dtype: float64
print(id(s))
127724792
This shows that s is still the same object: it was sorted in place, and no new Series is returned.
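The id() comparison above can also be written with Python's is operator, which tends to be a bit more robust because id values may be reused once an object has been garbage-collected. This is only a minimal sketch with made-up values; the name alias is hypothetical and simply gives the same Series a second name.

```python
import pandas as pd

s = pd.Series([0.3, -1.4, 1.0, -0.9], index=['a', 'b', 'c', 'd'])
alias = s  # a second name bound to the same Series object

# inplace=False (the default): the result is a different object.
s_sorted = s.sort_values()
print(s_sorted is s)          # False

# inplace=True: the existing object is reordered, so every name
# that refers to it sees the sorted data.
s.sort_values(inplace=True)
print(alias is s)             # True
print(alias.index.tolist())   # ['b', 'd', 'a', 'c']
```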
Answer by Leon
"inplace=True" is more like a physical sort, while "inplace=False" is more like a logical sort. A physical sort means that the data set stored in the computer is itself reordered based on some keys; a logical sort means that the data set is still stored in its original order (as it was input or imported), and the sort only operates on its index. A data set can have one or more logical indexes, but its physical index is unique.
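One loose way to picture this analogy, offered as an illustration rather than a description of pandas internals: np.argsort plays the role of a "logical" ordering here (my choice, not something the answer mentions), since it only computes positions and leaves the data where it is. Note that sort_values with inplace=False does return a separate, already reordered Series; what it preserves is the original object.

```python
import numpy as np
import pandas as pd

s = pd.Series([22, 3, 15], index=['a', 'b', 'c'])

# "Logical" ordering: the positions that would sort the data; s is not moved.
order = np.argsort(s.to_numpy())
print(order.tolist())      # [1, 2, 0]
print(s.tolist())          # [22, 3, 15]

# inplace=False: a separate, reordered Series; s keeps its input order.
s_sorted = s.sort_values()
print(s_sorted.tolist())   # [3, 15, 22]
print(s.tolist())          # [22, 3, 15]

# inplace=True: s itself now holds the sorted order.
s.sort_values(inplace=True)
print(s.tolist())          # [3, 15, 22]
```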