In Pandas, does the .iloc method give a copy or a view?

Question

Asked by Qiyu

I find the result a little bit random. Sometimes it's a copy, sometimes it's a view. For example:

import pandas as pd
df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}],index=['student1','student2'])

df
              age   name
   student1   21  Marry
   student2   24   John

Now, let me try to modify it a little bit.

df2 = df.loc['student1']
df2[0] = 23
df
              age   name
   student1   21  Marry
   student2   24   John

As you can see, nothing changed. df2 is a copy. However, if I add another student into the dataframe...

df.loc['student3'] = ['old','Tom']
df
               age   name
    student1   21  Marry
    student2   24   John
    student3  old    Tom

Try to change the age again:

df3 = df.loc['student1']
df3[0] = 33
df
               age   name
    student1   33  Marry
    student2   24   John
    student3  old    Tom

Now df3 suddenly became a view. What is going on? I guess the value 'old' is the key?

Answer 1

Accepted answer by juanpa.arrivillaga

In general, you can get a view if the data-frame has a single dtype, which is not the case with your original data-frame:

In [4]: df
Out[4]:
          age   name
student1   21  Marry
student2   24   John

In [5]: df.dtypes
Out[5]:
age      int64
name    object
dtype: object

However, when you do:

In [6]: df.loc['student3'] = ['old','Tom']
   ...:

The first column gets coerced to object, since columns cannot have mixed dtypes:

In [7]: df.dtypes
Out[7]:
age     object
name    object
dtype: object
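
The coercion can be checked directly. Below is a minimal sketch, assuming a reasonably recent pandas; the columns are passed as a dict so that the order (age, name) matches the output shown above regardless of version, and the variable names `age_dtype_before` and `age_dtype_after` are only illustrative:

```python
import pandas as pd

# Build the frame with an explicit column order: age first, then name.
df = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)
age_dtype_before = df["age"].dtype

# Adding a row whose "age" entry is a string means the column can no
# longer be stored as integers, so it is upcast to object.
df.loc["student3"] = ["old", "Tom"]
age_dtype_after = df["age"].dtype

print(age_dtype_before, "->", age_dtype_after)
```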

In this case, the underlying .values will always return an array with the same underlying buffer, and changes to that array will be reflected in the data-frame:

In [11]: vals = df.values

In [12]: vals
Out[12]:
array([[21, 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [13]: vals[0,0] = 'foo'

In [14]: vals
Out[14]:
array([['foo', 'Marry'],
       [24, 'John'],
       ['old', 'Tom']], dtype=object)

In [15]: df
Out[15]:
          age   name
student1  foo  Marry
student2   24   John
student3  old    Tom

On the other hand, with mixed dtypes, as in your original data-frame, changes to the array are not reflected in the data-frame:

In [26]: df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}]
    ...: ,index=['student1','student2'])
    ...:

In [27]: vals = df.values

In [28]: vals
Out[28]:
array([[21, 'Marry'],
       [24, 'John']], dtype=object)

In [29]: vals[0,0] = 'foo'

In [30]: vals
Out[30]:
array([['foo', 'Marry'],
       [24, 'John']], dtype=object)

In [31]: df
Out[31]:
          age   name
student1   21  Marry
student2   24   John
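
Instead of modifying values and looking for side effects, one way to compare the two cases is to ask NumPy whether two arrays overlap in memory. This is only a sketch: `values_share_buffer` is a hypothetical helper, not a pandas function, and the single-dtype frame here is a made-up numeric one. The internals may differ between pandas versions, so treat the result as a diagnostic rather than a guarantee.

```python
import numpy as np
import pandas as pd

def values_share_buffer(frame):
    """Hypothetical helper: do two to_numpy() calls return overlapping memory?"""
    return np.shares_memory(frame.to_numpy(), frame.to_numpy())

# Single dtype: built from one 2-D float array.
single = pd.DataFrame(np.zeros((3, 2)), columns=["a", "b"])

# Mixed dtypes: an integer column next to a string column.
mixed = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)

single_shares = values_share_buffer(single)
mixed_shares = values_share_buffer(mixed)
print(single_shares, mixed_shares)
```

For the mixed frame each call has to assemble a fresh object array from the separate columns, which is why nothing written to that array can reach the data-frame.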

Note, however, that a view will only be returned if a view is possible, i.e. if the selection is a proper slice; otherwise, a copy will be made regardless of the dtypes:

In [39]: df.loc['student3'] = ['old','Tom']


In [40]: df2
Out[40]:
          name
student3   Tom
student2  John

In [41]: df2.loc[:] = 'foo'

In [42]: df2
Out[42]:
         name
student3  foo
student2  foo

In [43]: df
Out[43]:
          age   name
student1   21  Marry
student2   24   John
student3  old    Tom
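
The "proper slice" condition comes from the underlying NumPy arrays, so it can be illustrated without pandas at all. A small sketch (the array and variable names are made up for illustration): a basic slice can be described as a window onto the same buffer, while selecting rows by a list of positions, as in the reordered student3/student2 selection above, has to build a new array.

```python
import numpy as np

arr = np.arange(6).reshape(3, 2)   # rows: [0, 1], [2, 3], [4, 5]

basic = arr[0:2]      # basic slice: a window onto arr's buffer
fancy = arr[[2, 1]]   # list of row positions: NumPy builds a new array

basic_is_view = np.shares_memory(arr, basic)
fancy_is_view = np.shares_memory(arr, fancy)

basic[0, 0] = 99      # written through to arr
fancy[0, 0] = -1      # stays local to the copy

print(basic_is_view, fancy_is_view)
print(arr)
```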

Answer 2

Answer by ayhan

You are starting with a DataFrame that has two columns with two different dtypes:

df.dtypes
Out: 
age      int64
name    object
dtype: object

Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:

df.blocks

Out: 
{'int64':           age
 student1   21
 student2   24, 'object':            name
 student1  Marry
 student2   John}

If you attempt to slice the first row of this DataFrame, it has to get one value from each of the different blocks, which makes it necessary to create a copy.

df2.is_copy
Out[40]: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>

In the second attempt, you are changing the dtypes. Since 'old' cannot be stored in an integer array, it casts the Series as an object Series.

df.loc['student3'] = ['old','Tom']

df.dtypes
Out: 
age     object
name    object
dtype: object

Now all data for this DataFrame is stored in a single block (and in a single numpy array):

df.blocks

Out: 
{'object':           age   name
 student1   21  Marry
 student2   24   John
 student3  old    Tom}

At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.

df3._is_view
Out: True
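
Since whether you get a view depends on how the data happens to be stored, code that must not touch the parent frame probably should not rely on it. A minimal sketch, assuming the same three-student frame as above (rebuilt here with an explicit column order), is to take an explicit copy of the row before modifying it:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)
df.loc["student3"] = ["old", "Tom"]   # both columns now hold Python objects

# .copy() gives the row its own data, whatever the block layout is.
row = df.loc["student1"].copy()
row["age"] = 33

print(row["age"], df.loc["student1", "age"])
```

Here the change stays in `row`, and `df` keeps the original age for student1.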
