Python: Understanding inplace=True
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/43893457/
Understanding inplace=True
Asked by Aran Freel
In the pandas library there is often an option to change the object in place, such as with the following statement:
df.dropna(axis='index', how='all', inplace=True)
I am curious what is being returned, as well as how the object is handled, when inplace=True is passed vs. when inplace=False is passed.
Are all operations modifying self when inplace=True? And when inplace=False, is a new object created immediately, such as new_df = self, and then new_df is returned?
Answer by ECH
When inplace=True is passed, the data is modified in place (the call returns None), so you'd use:
df.an_operation(inplace=True)
When inplace=False is passed (this is the default value, so passing it isn't necessary), the method performs the operation and returns a copy of the object, so you'd use:
df = df.an_operation(inplace=False)
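As a quick check of the two calling styles above, here is a minimal sketch using dropna on a small made-up DataFrame. The column names and values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: the middle row is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, 6.0]})

# inplace=False (the default): df is left alone and a new DataFrame comes back.
cleaned = df.dropna(axis="index", how="all")
rows_before = len(df)        # df still has all of its rows
rows_cleaned = len(cleaned)  # the all-NaN row is gone from the result

# inplace=True: df itself is changed and the call returns None.
returned = df.dropna(axis="index", how="all", inplace=True)
rows_after = len(df)
```

After both calls, df and cleaned hold the same rows, which is why df = df.an_operation() and df.an_operation(inplace=True) end up in the same place.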
Answer by Nabin
The way I use it is
# Have to assign back to dataframe (because it is a new copy)
df = df.some_operation(inplace=False)
Or
# No need to assign back to dataframe (because it is on the same copy)
df.some_operation(inplace=True)
CONCLUSION:
if inplace is False:
    assign the result to a variable
else:
    no need to assign
Answer by cs95
Don't use inplace=True!
TLDR;
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace is a common pitfall for beginners, so removing this option will simplify the API
I don't advise setting this parameter as it serves little purpose. See this GitHub issue, which proposes that the inplace argument be deprecated API-wide.
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are absolutely no performance benefits to using inplace=True. Both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back.
inplace=True also hinders method chaining. Contrast the following chain
result = df.some_function1().reset_index().some_function2()
with the in-place equivalent, which needs an intermediate variable:
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
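The some_function1 and some_function2 names above are placeholders. As a runnable sketch of the same contrast, the snippet below swaps in groupby(...).sum() and rename(...) on made-up data; those choices are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"key": ["b", "a", "b"], "value": [1, 2, 3]})

# Chained version: every call returns a new object, so calls can be stacked.
chained = (
    df.groupby("key").sum()
    .reset_index()
    .rename(columns={"value": "total"})
)

# In-place version: reset_index(inplace=True) returns None,
# so the chain has to be cut and a temporary variable is needed.
temp = df.groupby("key").sum()
temp.reset_index(inplace=True)
stepwise = temp.rename(columns={"value": "total"})
```

Both versions produce the same table; only the chained one reads as a single expression.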
One final caveat to keep in mind is that using inplace=True is a pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:
import pandas as pd
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
This can cause unexpected behavior. Use with caution!
Another pitfall is that using inplace=True when calling a function on a DataFrame column may or may not work. This is especially true when chained indexing is involved.
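One way to sidestep the warning from the example above is sketched below. It assumes that an independent copy of the filtered rows is what is wanted, and it assigns the result back instead of calling replace with inplace=True on a column. This is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Take an explicit, independent copy of the filtered rows.
df2 = df[df["a"] > 1].copy()

# Assign the replaced column back instead of using inplace=True.
df2["b"] = df2["b"].replace({"x": "abc"})
```

Here df2 is changed and df keeps its original values, so there is no ambiguity about which object was modified.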
Answer by hyukkyulee
Save it to the same variable
data["column01"].where(data["column01"] < 5, inplace=True)
Save it to a separate variable
data["column02"] = data["column01"].where(data["column01"] < 5)
But you can always overwrite the variable
data["column01"] = data["column01"].where(data["column01"] < 5)
FYI: the default is inplace=False
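To make the assignment forms above concrete, here is a small sketch on made-up data (the values are an assumption for illustration). Series.where keeps the values where the condition is True and puts NaN everywhere else.

```python
import pandas as pd

# Made-up data for illustration.
data = pd.DataFrame({"column01": [1, 4, 7, 10]})

# Save the result to a separate column; column01 is untouched at this point.
data["column02"] = data["column01"].where(data["column01"] < 5)
original_kept = data["column01"].tolist()

# Overwrite the original column by assigning the result back.
data["column01"] = data["column01"].where(data["column01"] < 5)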
Answer by Geeocode
The inplace parameter:
df.dropna(axis='index', how='all', inplace=True)
in pandas, and in general, means:
1. Pandas creates a copy of the original data
2. ... does some computation on it
3. ... assigns the results to the original data.
4. ... deletes the copy.
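The four steps above can be sketched with a toy class. To be clear, ToyFrame and drop_none are hypothetical names invented for this illustration; this is not pandas code and not how pandas is implemented internally, only a picture of the pattern the steps describe.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame (illustration only, not pandas)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def drop_none(self, inplace=False):
        # Steps 1 and 2: build a new, filtered copy of the data.
        result = [row for row in self.rows if row is not None]
        if inplace:
            # Step 3: point this object at the new data.
            self.rows = result
            # Step 4: the old list is no longer referenced; nothing is returned.
            return None
        # Out-of-place: hand the new data back in a new object.
        return ToyFrame(result)


frame = ToyFrame([1, None, 3])
copy_result = frame.drop_none()            # frame is unchanged
rows_after_copy_call = list(frame.rows)
inplace_result = frame.drop_none(inplace=True)  # frame is changed, returns None
```

In both branches a new list is built first, which is the point of the steps: the in-place path does not avoid the copy, it only reassigns it.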
As you can read further below in the rest of my answer, we can still have good reason to use this parameter, i.e. in-place operations, but we should avoid it if we can, as it generates more issues, such as:
1. Your code will be harder to debug (in fact, SettingWithCopyWarning exists to warn you about this possible problem)
2. Conflict with method chaining
So is there even a case where we should still use it?
Definitely yes. If we use pandas or any other tool for handling huge datasets, we can easily face a situation where some big data consumes our entire memory. To avoid this unwanted effect we can use techniques like method chaining:
(
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)
which makes our code more compact (though also harder to interpret and debug) and consumes less memory, as each chained method works with the previous method's returned value, thus resulting in only one copy of the input data. We can see clearly that we will have 2 x original data memory consumption after these operations.
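The chain above depends on a wine dataset that is not shown in this answer. To try it out, here is a runnable sketch with a tiny made-up stand-in; the four rows and their values are assumptions for illustration, and only the column names are taken from the chain itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `wine` dataset used above.
wine = pd.DataFrame({
    "alcohol": [14.2, 13.1, 14.8, 14.5],
    "color_intensity": [7.5, 8.0, 9.1, 5.0],
    "hue": [1.1, 1.2, 1.05, 1.3],
})

result = (
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)
```

The original wine frame is left untouched (it still has its color_intensity column), and only the final result is bound to a name.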
Or we can use the inplace parameter (though it is also harder to interpret and debug): our memory consumption during the operation will be 2 x original data, but our memory consumption after the operation remains 1 x original data, which, as anybody who has worked with huge datasets knows, can be a big benefit.
Final conclusion:
Avoid using the inplace parameter unless you work with huge data, and be aware of its possible issues if you do still use it.
Answer by Harsha
When trying to make changes to a pandas dataframe using a function, we use 'inplace=True' if we want to commit the changes to the dataframe. Therefore, the first line in the following code changes the name of the first column in 'df' to 'Grades'. We need to evaluate df again if we want to see the resulting dataframe.
df.rename(columns={0: 'Grades'}, inplace=True)
df
We use 'inplace=False' (this is also the default value) when we don't want to commit the changes but just print the resulting dataframe. So, in effect, a copy of the original dataframe with the changes applied is printed without altering the original dataframe.
Just to be more clear, the following two snippets do the same thing:
# Code 1
df.rename(columns={0: 'Grades'}, inplace=True)
# Code 2
df = df.rename(columns={0: 'Grades'}, inplace=False)
Answer by Ryan Hunt
If you don't use inplace=True, or you use inplace=False, you basically get back a copy.
So for instance:
testdf.sort_values(inplace=True, by='volume', ascending=False)
will alter testdf itself, leaving its data sorted in descending order.
Then:
testdf2 = testdf.sort_values( by='volume', ascending=True)
will make testdf2 a copy. The values will all be the same, but the sort order will be reversed, and you will have an independent object.
Then, given another column, say LongMA, if you do:
testdf2.LongMA = testdf2.LongMA -1
the LongMA column in testdf will have the original values and testdf2 will have the decremented values.
It is important to keep track of the difference as the chain of calculations grows and the copies of dataframes have their own lifecycle.
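The whole sequence from this answer can be run end to end on a small made-up frame. The values below are assumptions for illustration; the volume and LongMA column names follow the answer.

```python
import pandas as pd

# Made-up data for illustration.
testdf = pd.DataFrame({"volume": [10, 30, 20], "LongMA": [5.0, 6.0, 7.0]})

# Sorts testdf itself, in descending order of volume.
testdf.sort_values(by="volume", ascending=False, inplace=True)

# Returns an independent copy, sorted the other way round.
testdf2 = testdf.sort_values(by="volume", ascending=True)

# Bracket form of `testdf2.LongMA = testdf2.LongMA - 1`; only testdf2 changes.
testdf2["LongMA"] = testdf2["LongMA"] - 1
```

After this, testdf still holds the original LongMA values while testdf2 holds the decremented ones.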
Answer by Shahir Ansari
Whether to use inplace=True depends on whether you want to make changes to the original df or not.
df.drop_duplicates()
will only return a new DataFrame with the duplicates dropped, and will not make any changes to df
df.drop_duplicates(inplace = True)
will drop the duplicate values and make the changes to df itself.
Hope this helps. :)
Answer by Louis
inplace=True makes the function impure. It changes the original dataframe and returns None. In that case, you break the DSL chain.
Because most dataframe functions return a new dataframe, you can use the DSL conveniently. Like
df.sort_values().rename().to_csv()
A function call with inplace=True returns None, and the DSL chain is broken. For example
df.sort_values(inplace=True).rename().to_csv()
will raise AttributeError: 'NoneType' object has no attribute 'rename'
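A small sketch that reproduces the broken chain on made-up data (the column name and values are assumptions for illustration) and captures the error instead of crashing:

```python
import pandas as pd

df = pd.DataFrame({"b": [2, 1]})  # made-up data

try:
    # sort_values(inplace=True) returns None, so .rename is looked up on None.
    df.sort_values(by="b", inplace=True).rename(columns={"b": "B"})
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

Note that the sort itself has already happened by the time the error is raised, so df is left modified even though the chained expression failed.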
Something similar happens with Python's built-in sort and sorted: lst.sort() returns None and sorted(lst) returns a new list.
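The built-in pair can be checked in a few lines of plain Python:

```python
lst = [3, 1, 2]

new_list = sorted(lst)   # returns a new sorted list; lst is not touched
snapshot = list(lst)     # what lst looks like after calling sorted()

returned = lst.sort()    # sorts lst itself and returns None
```

This is the same split as inplace=False versus inplace=True: one call hands back a new object, the other mutates and returns nothing.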
Generally, do not use inplace=True unless you have a specific reason for doing so. When you have to write reassignment code like df = df.sort_values(), try attaching the function call to the DSL chain, e.g.
df = pd.read_csv().sort_values()...
Answer by Chetan
Based on my experience with pandas, I would like to answer.
The 'inplace=True' argument means that the changes are made permanently to the data frame itself, e.g.
df.dropna(axis='index', how='all', inplace=True)
changes the same dataframe (pandas finds the rows whose entries are all NaN and drops them). If we try
df.dropna(axis='index', how='all')
pandas shows the dataframe with the changes we made but does not modify the original dataframe 'df'.