Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/47972633/

Time: 2020-09-14 04:57:45  Source: igfitidea

In Pandas, does .iloc method give a copy or view?

python, pandas, dataframe

Asked by Qiyu

I find the result a little random: sometimes it's a copy, sometimes it's a view. For example:

import pandas as pd

df = pd.DataFrame([{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}], index=['student1', 'student2'])

df
              age   name
   student1   21  Marry
   student2   24   John

Now, let me try to modify it a little.

df2 = df.loc['student1']
df2[0] = 23
df
              age   name
   student1   21  Marry
   student2   24   John

As you can see, nothing changed. df2 is a copy. However, if I add another student into the dataframe...

df.loc['student3'] = ['old','Tom']
df
               age   name
    student1   21  Marry
    student2   24   John
    student3  old    Tom

Now try to change the age again:

df3 = df.loc['student1']
df3[0] = 33
df
               age   name
    student1   33  Marry
    student2   24   John
    student3  old    Tom

Now df3 suddenly became a view. What is going on? I guess the value 'old' is the key?

Accepted answer by juanpa.arrivillaga

In general, you can get a view if the data-frame has a single dtype, which is not the case with your original data-frame:

In [4]: df
Out[4]:
          age   name
student1   21  Marry
student2   24   John

In [5]: df.dtypes
Out[5]:
age      int64
name    object
dtype: object

However, when you do:

In [6]: df.loc['student3'] = ['old','Tom']
   ...:

The first column gets coerced to object, since a column cannot hold mixed dtypes:

In [7]: df.dtypes
Out[7]:
age     object
name    object
dtype: object
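
The coercion described above can be reproduced with a short sketch. It assumes a reasonably recent pandas, and it rebuilds the frame with an explicit column order (an assumption made here so that `age` comes first, matching the output shown):

```python
import pandas as pd

# Rebuild the example frame; the explicit column order is an assumption
# made so that 'age' is the first column, as in the printed output.
df = pd.DataFrame(
    {'age': [21, 24], 'name': ['Marry', 'John']},
    index=['student1', 'student2'],
)
age_dtype_before = df['age'].dtype  # an integer dtype

# Enlarging the frame with a string in the 'age' position means the
# column can no longer stay an integer column.
df.loc['student3'] = ['old', 'Tom']
age_dtype_after = df['age'].dtype

print(age_dtype_before, '->', age_dtype_after)
```

The exact dtype names printed may vary between pandas versions and platforms; the point is only that `age` stops being an integer column once it has to hold 'old'.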

In this case, the underlying .values will always return an array with the same underlying buffer, and changes to that array will be reflected in the data-frame:

In [11]: vals = df.values

In [12]: vals
Out[12]:
array([[21, 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [13]: vals[0,0] = 'foo'

In [14]: vals
Out[14]:
array([['foo', 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [15]: df
Out[15]:
          age   name
student1  foo  Marry
student2   24   John
student3  old    Tom

On the other hand, with mixed types, as in your original data-frame, .values has to build a new array, so changes to that array are not reflected in the data-frame:

In [26]: df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}]
    ...: ,index=['student1','student2'])
    ...:

In [27]: vals = df.values

In [28]: vals
Out[28]:
array([[21, 'Marry'],
       [24, 'John']], dtype=object)

In [29]: vals[0,0] = 'foo'

In [30]: vals
Out[30]:
array([['foo', 'Marry'],
       [24, 'John']], dtype=object)

In [31]: df
Out[31]:
          age   name
student1   21  Marry
student2   24   John
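
The mixed-dtype case above can be checked in a script as well. This is a minimal sketch, assuming a recent pandas; it uses `to_numpy()` in place of `.values`, and the explicit column order is an assumption made for readability:

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [21, 24], 'name': ['Marry', 'John']},
    index=['student1', 'student2'],
)

# One array has to hold both integers and strings, so pandas builds a
# new object array rather than exposing either column's own buffer.
vals = df.to_numpy()
vals[0, 0] = 'foo'  # modifies only the newly built array

print(vals[0, 0])
print(df.loc['student1', 'age'])
```

Because the array is freshly built from two differently typed columns, writing to it leaves the data-frame untouched.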

Note, however, that a view is returned only when a view is possible, i.e. when the selection is a proper slice. Otherwise, a copy is made regardless of the dtypes:

In [39]: df.loc['student3'] = ['old','Tom']


In [40]: df2
Out[40]:
          name
student3   Tom
student2  John

In [41]: df2.loc[:] = 'foo'

In [42]: df2
Out[42]:
         name
student3  foo
student2  foo

In [43]: df
Out[43]:
          age   name
student1   21  Marry
student2   24   John
student3  old    Tom
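
The same slice-versus-selection distinction exists in plain NumPy, which is what pandas builds on. The sketch below is only an illustration with a stand-in array (the name `block` is hypothetical), not a reproduction of pandas internals:

```python
import numpy as np

# A stand-in for a single object column; not pandas' real internal layout.
block = np.array([['Marry'], ['John'], ['Tom']], dtype=object)

sliced = block[0:2]     # a proper slice: a view onto the same buffer
picked = block[[2, 1]]  # a list of positions: always a new array

sliced[:] = 'foo'       # writes through to block
picked[:] = 'bar'       # touches only the new array

print(np.shares_memory(block, sliced))
print(np.shares_memory(block, picked))
print(block.ravel().tolist())
```

Selecting rows by a list, in an arbitrary order, cannot be described as a window onto the original buffer, so a new array has to be allocated.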

Answer by ayhan

You are starting with a DataFrame that has two columns with two different dtypes:

df.dtypes
Out: 
age      int64
name    object
dtype: object

Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:

df.blocks

Out: 
{'int64':           age
 student1   21
 student2   24, 'object':            name
 student1  Marry
 student2   John}

If you attempt to slice the first row of this DataFrame, it has to take one value from each block, which makes it necessary to create a copy.

df2.is_copy
Out[40]: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>

In the second attempt, you are changing the dtypes. Since 'old' cannot be stored in an integer array, the age column is cast to an object Series.

df.loc['student3'] = ['old','Tom']

df.dtypes
Out: 
age     object
name    object
dtype: object

Now all data for this DataFrame is stored in a single block (and in a single numpy array):

df.blocks

Out: 
{'object':           age   name
 student1   21  Marry
 student2   24   John
 student3  old    Tom}

At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.

df3._is_view
Out: True
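
The block argument can be illustrated with plain NumPy arrays standing in for the blocks. This is only a rough sketch: the names are hypothetical and pandas' real internal layout differs, but the copy-versus-view reasoning is the same:

```python
import numpy as np

# Mixed dtypes: one stand-in "block" per dtype.
ages = np.array([21, 24])
names = np.array(['Marry', 'John'], dtype=object)

# A row needs one value from each block, so it must be gathered into a new array.
row_mixed = np.array([ages[0], names[0]], dtype=object)
row_mixed[0] = 23       # does not reach back into ages

# Single dtype: one object "block" holds every column.
block = np.array([[21, 'Marry'], [24, 'John'], ['old', 'Tom']], dtype=object)
row_single = block[0]   # a view onto the block
row_single[0] = 33      # writes through to block

print(ages[0], block[0, 0])
```

With two arrays there is no single buffer a row could point into, whereas with one array a row is just a window onto existing memory.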