Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/43626883/

Time: 2020-09-14 03:28:36  Source: igfitidea

what's the difference between set_value and = in pandas

python pandas

Asked by user2723494

When writing to a DataFrame in pandas, we see there are a couple of ways to do it, as shown in this answer and this answer.

We have two methods:

  • df[r][c].set_value(r,c,some_value)
  • df.iloc[r][c] = some_value

What is the difference? Which is faster? Is either a copy?

Answer by NirIzr

The difference is that set_value returns an object, while the assignment operator assigns the value into the existing DataFrame object.
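
Note that set_value was deprecated in pandas 0.21.0 and removed in pandas 1.0.0, so it cannot be run on a current install; its deprecation message points to the .at[] and .iat[] accessors instead. A minimal sketch, assuming a recent pandas, showing that scalar assignment through those accessors writes into the existing DataFrame object rather than creating a new one (the names df and before_id are illustrative):

```python
import numpy as np
import pandas as pd

# A small frame with default integer row/column labels, like the one in the answer.
df = pd.DataFrame(np.zeros((3, 3)))
before_id = id(df)

# Label-based scalar assignment (the usual replacement for df.set_value(r, c, v)).
df.at[1, 2] = 7.0

# Position-based scalar assignment (the replacement for set_value(..., takeable=True)).
df.iat[0, 0] = 11.0

# Both assignments modified the same object in place; no new DataFrame was created.
assert id(df) == before_id
print(df.at[1, 2], df.iat[0, 0])
```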

After calling set_value you will potentially have two DataFrame objects (this does not necessarily mean you'll have two copies of the data, as DataFrame objects can "reference" one another), while the assignment operator will change data in the single DataFrame object.
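
The difference between two names for one DataFrame and two separate DataFrame objects can be seen with plain Python aliasing versus DataFrame.copy. This is only a sketch, assuming a recent pandas; it does not reproduce set_value itself, and whether two distinct DataFrame objects share underlying data depends on the pandas version (Copy-on-Write changes this), so only the version-independent cases are shown:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Aliasing: both names refer to the very same DataFrame object.
alias = df
alias.at[0, "a"] = 99

# A deep copy is a separate object with its own data.
independent = df.copy(deep=True)
independent.at[1, "a"] = -1

print(alias is df, independent is df)
print(df["a"].tolist())
print(independent["a"].tolist())
```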

Using set_value appears to be faster, probably because it is optimized for this use case, while the assignment approach generates intermediate slices of the data:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df=pd.DataFrame(np.random.rand(100,100))

In [4]: %timeit df[10][10]=7
The slowest run took 6.43 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 89.5 μs per loop

In [5]: %timeit df.set_value(10,10,11)
The slowest run took 10.89 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.94 μs per loop
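
The session above relies on IPython's %timeit and on the now-removed set_value. A rough equivalent for current pandas, sketched with the standard-library timeit module, compares .at, .iat and .loc for single-cell writes; the helper name time_scalar_writes and the write count are illustrative choices, and timings vary by machine and pandas version, so no numbers are claimed here. The chained form df[10][10] = 7 is left out on purpose: it writes through an intermediate column object, and under Copy-on-Write (enabled by default in pandas 3.0) it no longer modifies df at all.

```python
import timeit

import numpy as np
import pandas as pd

NUMBER = 2_000  # writes per accessor; an illustrative choice


def time_scalar_writes(number=NUMBER):
    """Time `number` single-cell writes with .at, .iat and .loc on a 100x100 frame."""
    df = pd.DataFrame(np.random.rand(100, 100))

    def write_at():
        df.at[10, 10] = 7.0  # label-based scalar setter

    def write_iat():
        df.iat[10, 10] = 7.0  # position-based scalar setter

    def write_loc():
        df.loc[10, 10] = 7.0  # general label-based indexer

    timings = {
        name: timeit.timeit(fn, number=number)
        for name, fn in (("at", write_at), ("iat", write_iat), ("loc", write_loc))
    }
    return df, timings


df, timings = time_scalar_writes()
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    # Convert total seconds to microseconds per individual write.
    print(f"{name:>3}: {seconds / NUMBER * 1e6:.2f} us per write")
```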

The result of set_value may be a copy, but the documentation is not really clear (to me) on this:

Returns:

frame : DataFrame

If label pair is contained, will be reference to calling DataFrame, otherwise a new object
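
Read literally, the documented contract says the return value is the calling DataFrame when both labels already exist, and a new object otherwise. The helper below, set_value_documented, is hypothetical: it is not a pandas function and is not how pandas implemented set_value internally. It only encodes that documented rule with current accessors (.at for existing labels, .loc with row enlargement on a copy for a new label) so the two cases can be compared:

```python
import pandas as pd


def set_value_documented(df, index, col, value):
    """Hypothetical helper following the documented set_value return contract."""
    if index in df.index and col in df.columns:
        # Label pair is contained: write in place and return the calling DataFrame.
        df.at[index, col] = value
        return df
    # Label pair is missing: enlarge a copy, leaving the caller untouched.
    new = df.copy()
    new.loc[index, col] = value
    return new


df = pd.DataFrame({"a": [1.0, 2.0]}, index=["x", "y"])

same = set_value_documented(df, "x", "a", 10.0)   # existing labels
other = set_value_documented(df, "z", "a", 5.0)   # new row label "z"

print(same is df, other is df)
print(other["a"].tolist())
```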