What exactly does in-place sort_values mean in pandas?

Question

Asked by Karel Macek

Maybe a very naive question, but I am stuck on this: pandas.Series has a method sort_values, and there is an option to do it "in place" or not. I have Googled it for a while, but it is still not clear to me. It seems this is assumed to be perfectly well known to everybody but me. Could anyone give me an illustrative, beginner-level explanation of how these two options differ?

Thank you for any assistance.

Answer 1

Accepted answer by Alexey Smirnov

Here is an example. df1 will hold the sorted DataFrame and df will remain intact:

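import pandas as pd
from datetime import datetime as dt

# Small frame with a datetime index and a single column 'foo'
df = pd.DataFrame(data=[22, 22, 3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

# Default (inplace=False): a new sorted DataFrame is returned, df is unchanged
df1 = df.sort_values(by='foo')
print(df)
print(df1)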

In the case below, df itself will hold the sorted values:

import pandas as pd
from datetime import datetime as dt

df = pd.DataFrame(data=[22,22,3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

# inplace=True: df itself is reordered; the call returns None
df.sort_values(by='foo', inplace=True)
print(df)
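
To put the two variants side by side, here is a minimal sketch. The frame and the names `result` and `returned` are made up for illustration, and it assumes a pandas version in which `sort_values(inplace=True)` returns `None`.

```python
import pandas as pd

# Hypothetical toy frame, not the one from the answer above
df = pd.DataFrame({"foo": [22, 22, 3]}, index=["x", "y", "z"])

# inplace=False (the default): df is untouched, a new sorted frame is returned
result = df.sort_values(by="foo")
print(df["foo"].tolist())      # [22, 22, 3]
print(result["foo"].tolist())  # [3, 22, 22]

# inplace=True: df itself is reordered and the call returns None
returned = df.sort_values(by="foo", inplace=True)
print(returned)                # None
print(df["foo"].tolist())      # [3, 22, 22]
```

A common pitfall follows from this: writing `df = df.sort_values(by="foo", inplace=True)` rebinds `df` to `None`. Use either the reassignment form without `inplace`, or `inplace=True` without reassignment.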

Answer 2

Answered by SSC

As you can read in the sort_values documentation, the function returns a Series. However, it is a new Series rather than the original one.

For example:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a   -0.872271
b    0.294317
c   -0.017433
d   -1.375316
e    0.993197
dtype: float64

s_sorted = s.sort_values()

print(s_sorted)

d   -1.375316
a   -0.872271
c   -0.017433
b    0.294317
e    0.993197
dtype: float64

print(id(s_sorted))
127952880

print(id(s))
127724792

So s and s_sorted are different Series. But if you use inplace=True:

s.sort_values(inplace=True)
print(s)
d   -1.375316
a   -0.872271
c   -0.017433
b    0.294317
e    0.993197
dtype: float64

print(id(s))
127724792

This shows that s is still the same object after sorting, and no new Series is returned (the call returns None).
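
The same point can be checked with `is` rather than by comparing printed `id()` values, which is a little more direct. This is only a sketch: the values and the names `alias` and `ret` are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

s = pd.Series([0.3, -1.4, 1.0, -0.9], index=["a", "b", "c", "d"])
alias = s  # a second name bound to the same Series object

# Default call: a new Series comes back, s is left as it was
s_sorted = s.sort_values()
print(s_sorted is s)         # False
print(s.index.tolist())      # ['a', 'b', 'c', 'd']

# inplace=True: the existing object is modified and None is returned
ret = s.sort_values(inplace=True)
print(ret is None)           # True
print(alias is s)            # True
print(alias.index.tolist())  # ['b', 'd', 'a', 'c']
```

Because `alias` and `s` are the same object, every name that refers to the Series sees the new order after an in-place sort. That is the practical difference from the default call.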

Answer 3

Answered by Leon

"inplace=True" is more like a physical sort, while "inplace=False" is more like a logical sort. A physical sort means that the data set stored in the computer is itself reordered according to some keys. A logical sort means that the data set is still stored in its original order (as it was input or imported), and the sort only operates on its index. A data set can have one or more logical indexes, but its physical index is unique.
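
The analogy can be sketched roughly as follows, with one caveat: as far as pandas itself is concerned, `inplace=False` does not keep a separate sort index but builds a new, already sorted object and leaves the original alone. The "logical" ordering below is illustrated with `numpy.argsort`, which is a choice made for this sketch and not something the answer mentions. The data and names are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([30, 10, 20], index=["a", "b", "c"])

# "Logical" ordering: only a list of positions; the stored data is untouched
order = np.argsort(s.to_numpy())
print(order.tolist())    # [1, 2, 0]
print(s.tolist())        # [30, 10, 20]

# inplace=False: a separate sorted Series; s keeps its original order
s_copy = s.sort_values()
print(s_copy.tolist())   # [10, 20, 30]
print(s.tolist())        # [30, 10, 20]

# inplace=True: s itself is reordered
s.sort_values(inplace=True)
print(s.tolist())        # [10, 20, 30]
```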
