Python: Understanding inplace=True

Note: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/43893457/

Date: 2020-08-19 23:29:57  Source: igfitidea

Understanding inplace=True

python pandas in-place

Asked by Aran Freel

In the pandas library there is often an option to change the object in place, such as with the following statement:

df.dropna(axis='index', how='all', inplace=True)

I am curious what is returned, and how the object is handled, when inplace=True is passed vs. when inplace=False is passed.

Do all operations modify self when inplace=True? And when inplace=False, is a new object created immediately, such as new_df = self, and then new_df returned?

Answered by ECH

When inplace=True is passed, the data is modified in place (the call returns nothing), so you'd use:

df.an_operation(inplace=True)

When inplace=False is passed (this is the default value, so it isn't necessary), the operation is performed and a copy of the object is returned, so you'd use:

df = df.an_operation(inplace=False) 
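
As a minimal sketch of the two call styles, here is dropna on a small made-up DataFrame (the column names and values are assumptions for illustration only). In the pandas versions I am aware of, dropna returns None when inplace=True:

```python
import numpy as np
import pandas as pd

# Illustrative data: the middle row is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, 6.0]})

# Default (inplace=False): a new DataFrame comes back, df is left untouched.
cleaned = df.dropna(axis="index", how="all")
rows_before = len(df)
rows_cleaned = len(cleaned)

# inplace=True: df itself is modified and nothing useful is returned.
returned = df.dropna(axis="index", how="all", inplace=True)
rows_after = len(df)
```

Here rows_before is 3, rows_cleaned and rows_after are 2, and returned is None, which is why assigning the result of an inplace=True call back to df would lose the data.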

Answered by Nabin

The way I use it is

# Have to assign back to dataframe (because it is a new copy)
df = df.some_operation(inplace=False) 

Or

# No need to assign back to dataframe (because it is on the same copy)
df.some_operation(inplace=True)

CONCLUSION:

 if inplace is False
      Assign to a new variable;
 else
      No need to assign

Answered by cs95

Don't use inplace=True!

TLDR;

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
  • inplace does not work with method chaining
  • inplace is a common pitfall for beginners, so removing this option will simplify the API


I don't advise setting this parameter, as it serves little purpose. See this GitHub issue, which proposes that the inplace argument be deprecated API-wide.

It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are (almost) no performance benefits to using inplace=True. In most cases, both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back.

inplace=True also hinders method chaining. Contrast

result = df.some_function1().reset_index().some_function2()

As opposed to

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
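
To make the contrast concrete, here is a small sketch that swaps the placeholder some_function1 and some_function2 for real pandas methods (sort_values and rename are my choice, and the data is made up):

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"x": [3, 1, 2]})

# Chained version: every call returns a new object, so the calls compose.
chained = (
    df.sort_values("x")
    .reset_index(drop=True)
    .rename(columns={"x": "value"})
)

# inplace version: the chain has to be split around the in-place call.
temp = df.sort_values("x")
temp.reset_index(drop=True, inplace=True)
stepwise = temp.rename(columns={"x": "value"})
```

Both routes end with the same frame, but the second needs a temporary variable and three separate statements.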

One final caveat to keep in mind is that calling a method with inplace=True is a pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame

Which can cause unexpected behavior. Use with caution!

Another pitfall is that using inplace=True when calling a function on a DataFrame column may or may not work. This is especially true when chained indexing is involved.

Answered by hyukkyulee

Save it to the same variable

data["column01"].where(data["column01"]< 5, inplace=True)

Save it to a separate variable

data["column02"] = data["column01"].where(data["column01"] < 5)

But, you can always overwrite the variable

data["column01"] = data["column01"].where(data["column01"] < 5)

FYI: by default, inplace=False.
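
For reference, a small sketch of the assign-to-a-column form on made-up data. Series.where keeps the values where the condition holds and puts NaN everywhere else:

```python
import pandas as pd

# Illustrative data; only 7 fails the "< 5" condition.
data = pd.DataFrame({"column01": [1, 7, 3]})

# Save the filtered values to a separate column; column01 is left as it was.
data["column02"] = data["column01"].where(data["column01"] < 5)

masked = data["column02"].isna().tolist()
original = data["column01"].tolist()
```

Here masked is [False, True, False] and original is still [1, 7, 3].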

Answered by Geeocode

The inplace parameter:

df.dropna(axis='index', how='all', inplace=True)

in pandas, and in general, means:

1. pandas creates a copy of the original data

2. ... does some computation on it

3. ... assigns the results to the original data.

4. ... deletes the copy.

As you can read further below in the rest of my answer, we can still have good reason to use this parameter, i.e. in-place operations, but we should avoid it if we can, as it generates more issues, such as:

1. Your code will be harder to debug (in fact, SettingWithCopyWarning exists to warn you about this possible problem)

2. Conflict with method chaining



So is there even a case where we should still use it?

Definitely yes. If we use pandas or any other tool for handling huge datasets, we can easily face a situation where some big data consumes our entire memory. To avoid this unwanted effect we can use techniques like method chaining:

(
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)

which makes our code more compact (though harder to interpret and debug too) and consumes less memory, as each chained method works with the previous method's returned value, thus resulting in only one copy of the input data. We can see clearly that we will have 2 x original data memory consumption after these operations.

Or we can use the inplace parameter (though it is harder to interpret and debug too): our memory consumption will be 2 x original data during the operation, but after the operation it remains 1 x original data, which, as anyone who has worked with huge datasets knows, can be a big benefit.



Final conclusion:

Avoid using the inplace parameter unless you work with huge data, and be aware of its possible issues if you still use it.

Answered by Harsha

When trying to make changes to a pandas dataframe using a function, we use 'inplace=True' if we want to commit the changes to the dataframe. Therefore, the first line in the following code changes the name of the first column in 'df' to 'Grades'. We need to evaluate 'df' afterwards if we want to see the resulting dataframe.

df.rename(columns={0: 'Grades'}, inplace=True)
df

We use 'inplace=False' (this is also the default value) when we don't want to commit the changes but just print the resulting dataframe. So, in effect, a copy of the original dataframe with the changes applied is printed without altering the original dataframe.

Just to be clear, the following two snippets do the same thing:

#Code 1
df.rename(columns={0: 'Grades'}, inplace=True)
#Code 2
df = df.rename(columns={0: 'Grades'}, inplace=False)
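
A quick way to check that the two snippets agree, sketched on a made-up single-column frame whose default column label is the integer 0:

```python
import pandas as pd

# Illustrative data; with no column names given, the only column is labelled 0.
df1 = pd.DataFrame([[90], [85]])
df2 = df1.copy()

# Code 1: rename in place.
df1.rename(columns={0: "Grades"}, inplace=True)

# Code 2: rename out of place and assign the result back.
df2 = df2.rename(columns={0: "Grades"}, inplace=False)

same_result = df1.equals(df2)
```

same_result is True, and both frames now have the single column 'Grades'.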

Answered by Ryan Hunt

If you don't use inplace=True or you use inplace=False you basically get back a copy.

So for instance:

testdf.sort_values(inplace=True, by='volume', ascending=False)

will alter the structure with the data sorted in descending order.

then:

testdf2 = testdf.sort_values( by='volume', ascending=True)

will make testdf2 a copy. The values will all be the same, but the sort will be reversed and you will have an independent object.

Then, given another column, say LongMA, if you do:

testdf2.LongMA = testdf2.LongMA -1

the LongMA column in testdf will have the original values and testdf2 will have the decremented values.

It is important to keep track of the difference as the chain of calculations grows and the copies of dataframes have their own lifecycle.
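
The steps above can be sketched end to end like this; the volume and LongMA values are made up for illustration:

```python
import pandas as pd

# Illustrative data.
testdf = pd.DataFrame({"volume": [10, 30, 20], "LongMA": [1.0, 2.0, 3.0]})

# Sorts testdf itself, in descending order of volume.
testdf.sort_values(inplace=True, by="volume", ascending=False)

# Returns an independent, ascending-sorted copy.
testdf2 = testdf.sort_values(by="volume", ascending=True)

# Only the copy is changed here.
testdf2["LongMA"] = testdf2["LongMA"] - 1

original_values = testdf["LongMA"].tolist()
decremented_values = testdf2["LongMA"].tolist()
```

original_values is [2.0, 3.0, 1.0] (the untouched values in descending-volume order), while decremented_values is [0.0, 2.0, 1.0].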

Answered by Shahir Ansari

inplace=True is used depending on whether you want to make changes to the original df or not.

df.drop_duplicates()

will only return a new DataFrame with the duplicates dropped, without making any changes to df

df.drop_duplicates(inplace  = True)

will drop values and make changes to df.
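
A short sketch of the difference, using a made-up frame with one duplicated row:

```python
import pandas as pd

# Illustrative data: the value 1 appears twice.
df = pd.DataFrame({"k": [1, 1, 2]})

# Without inplace, the result is a separate object and df keeps all rows.
deduped = df.drop_duplicates()
rows_after_plain_call = len(df)

# With inplace=True, df itself loses the duplicate row.
df.drop_duplicates(inplace=True)
rows_after_inplace_call = len(df)
```

rows_after_plain_call is 3, while len(deduped) and rows_after_inplace_call are both 2.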

Hope this helps.:)

Answered by Louis

inplace=True makes the function impure. It changes the original dataframe and returns None. In that case, you break the DSL chain. Because most dataframe functions return a new dataframe, you can use the DSL conveniently, like

df.sort_values().rename().to_csv()

A function call with inplace=True returns None, and the DSL chain is broken. For example

df.sort_values(inplace=True).rename().to_csv()

will raise AttributeError: 'NoneType' object has no attribute 'rename'
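
A runnable sketch of the broken chain, with a made-up column name and the error caught so the message can be inspected. It assumes sort_values returns None under inplace=True, which holds in the pandas versions I am aware of:

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"x": [2, 1]})

try:
    # sort_values(inplace=True) returns None, so .rename is looked up on None.
    df.sort_values("x", inplace=True).rename(columns={"x": "y"})
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

Note that df has still been sorted by the time the error is raised; only the rest of the chain is lost.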

This is similar to Python's built-in sort and sorted: lst.sort() returns None and sorted(lst) returns a new list.
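
The same split in plain Python, as a small sketch:

```python
lst = [3, 1, 2]

# sorted() leaves lst alone and hands back a new list.
new_list = sorted(lst)
snapshot = list(lst)  # copy of lst before the in-place sort

# list.sort() reorders lst itself and returns None.
result = lst.sort()
```

After this, new_list is [1, 2, 3], snapshot is still [3, 1, 2], result is None, and lst is [1, 2, 3].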

Generally, do not use inplace=True unless you have a specific reason for doing so. When you have to write reassignment code like df = df.sort_values(), try attaching the function call to the DSL chain, e.g.

df = pd.read_csv().sort_values()...

Answered by Chetan

As far as my experience in pandas goes, I would like to answer.

The 'inplace=True' argument means that the data frame has to make the changes permanent, e.g.

    df.dropna(axis='index', how='all', inplace=True)

changes the same dataframe (here pandas finds the rows whose entries are all NaN and drops them). If we try

    df.dropna(axis='index', how='all')

pandas shows the dataframe with the changes we made, but does not modify the original dataframe 'df'.