Python Pandas: Append a row to a DataFrame and specify its index label

Question

Asked by Amelio Vazquez-Reina

Is there any way to specify the index that I want for a new row, when appending the row to a dataframe?

The original documentation provides the following example:

In [1301]: df = DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])

In [1302]: df
Out[1302]: 
          A         B         C         D
0 -1.137707 -0.891060 -0.693921  1.613616
1  0.464000  0.227371 -0.496922  0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3  0.281957  1.523962 -0.902937  0.068159
4 -0.057873 -0.368204 -1.144073  0.861209
5  0.800193  0.782098 -1.069094 -1.099248
6  0.255269  0.009750  0.661084  0.379319
7 -0.008434  1.952541 -1.056652  0.533946

In [1303]: s = df.xs(3)

In [1304]: df.append(s, ignore_index=True)
Out[1304]: 
          A         B         C         D
0 -1.137707 -0.891060 -0.693921  1.613616
1  0.464000  0.227371 -0.496922  0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3  0.281957  1.523962 -0.902937  0.068159
4 -0.057873 -0.368204 -1.144073  0.861209
5  0.800193  0.782098 -1.069094 -1.099248
6  0.255269  0.009750  0.661084  0.379319
7 -0.008434  1.952541 -1.056652  0.533946
8  0.281957  1.523962 -0.902937  0.068159

where the new row gets the index label automatically. Is there any way to control the new label?

Answer 1

Accepted answer by unutbu

The name of the Series becomes the index label of the row in the DataFrame:

In [99]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])

In [100]: s = df.xs(3)

In [101]: s.name = 10

In [102]: df.append(s)
Out[102]: 
           A         B         C         D
0  -2.083321 -0.153749  0.174436  1.081056
1  -1.026692  1.495850 -0.025245 -0.171046
2   0.072272  1.218376  1.433281  0.747815
3  -0.940552  0.853073 -0.134842 -0.277135
4   0.478302 -0.599752 -0.080577  0.468618
5   2.609004 -1.679299 -1.593016  1.172298
6  -0.201605  0.406925  1.983177  0.012030
7   1.158530 -2.240124  0.851323 -0.240378
10 -0.940552  0.853073 -0.134842 -0.277135
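
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The following is a minimal sketch of the same idea using pd.concat, assuming a recent pandas version. The seeded generator and the variable name result are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an illustrative choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=['A', 'B', 'C', 'D'])

# Copy row 3 and give the Series the label the new row should carry.
s = df.loc[3].copy()
s.name = 10

# to_frame().T turns the named Series into a one-row DataFrame whose
# index label is the Series name; pd.concat then stacks it under df.
result = pd.concat([df, s.to_frame().T])
print(result)
```

As with append, the original df is left untouched, and the new row appears under the label 10 in result.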

Answer 2

Answered by Alon

df.loc will do the job:

>>> df = pd.DataFrame(np.random.randn(3, 2), columns=['A','B'])
>>> df
          A         B
0 -0.269036  0.534991
1  0.069915 -1.173594
2 -1.177792  0.018381
>>> df.loc[13] = df.loc[1]
>>> df
           A         B
0  -0.269036  0.534991
1   0.069915 -1.173594
2  -1.177792  0.018381
13  0.069915 -1.173594
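
Assigning through df.loc modifies the frame in place, and if the label already exists it replaces that row rather than adding one. The sketch below wraps the assignment in a hypothetical helper, add_row (not a pandas function), that refuses to overwrite an existing label. The sample values are made up for illustration.

```python
import pandas as pd

def add_row(df, label, values):
    """Add a row in place via df.loc, refusing to overwrite an existing label."""
    if label in df.index:
        raise KeyError(f"label {label!r} already exists")
    df.loc[label] = values

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# Label 13 is not in the index yet, so this enlarges the frame.
add_row(df, 13, [7.0, 8.0])
print(df)
```

Calling add_row(df, 13, ...) a second time raises KeyError instead of silently replacing the row.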

Answer 3

Answered by Harshit

I shall refer to the same sample of data as posted in the question:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
print('The original data frame is: \n{}'.format(df))

Running this code prints something like:

The original data frame is:

          A         B         C         D
0  0.494824 -0.328480  0.818117  0.100290
1  0.239037  0.954912 -0.186825 -0.651935
2 -1.818285 -0.158856  0.359811 -0.345560
3 -0.070814 -0.394711  0.081697 -1.178845
4 -1.638063  1.498027 -0.609325  0.882594
5 -0.510217  0.500475  1.039466  0.187076
6  1.116529  0.912380  0.869323  0.119459
7 -1.046507  0.507299 -0.373432 -1.024795

Now suppose you want to append a new row that is not a copy of any existing row in the data frame. @Alon suggested an interesting approach: use df.loc to add a new row with a chosen index label. The issue with this approach is that if a row with that label already exists, it is overwritten with the new values. This matters for datasets whose row index is not unique, such as store IDs in transaction datasets. A more general solution is to create the row as a pandas Series, set its name to the index label you want, and then append it to the data frame. Don't forget to assign the result back to the original variable: df.append returns a new DataFrame and does not modify the original in place. The code follows:

row = pd.Series({'A':10,'B':20,'C':30,'D':40},name=3)
df = df.append(row)
print('The new data frame is: \n{}'.format(df))

The new output is:

The new data frame is:

           A          B          C          D
0   0.494824  -0.328480   0.818117   0.100290
1   0.239037   0.954912  -0.186825  -0.651935
2  -1.818285  -0.158856   0.359811  -0.345560
3  -0.070814  -0.394711   0.081697  -1.178845
4  -1.638063   1.498027  -0.609325   0.882594
5  -0.510217   0.500475   1.039466   0.187076
6   1.116529   0.912380   0.869323   0.119459
7  -1.046507   0.507299  -0.373432  -1.024795
3  10.000000  20.000000  30.000000  40.000000
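
The output above contains the label 3 twice. The sketch below shows what that means for lookups and how to reject duplicates when they are not wanted. It uses pd.concat rather than df.append, because append is not available in pandas 2.0 and later. The seeded generator and the name new_df are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=['A', 'B', 'C', 'D'])

# A brand-new row that reuses the existing label 3.
row = pd.Series({'A': 10, 'B': 20, 'C': 30, 'D': 40}, name=3)
new_df = pd.concat([df, row.to_frame().T])

# The index now holds label 3 twice, so it is no longer unique
# and a lookup by that label returns both rows.
print(new_df.index.is_unique)
print(new_df.loc[3])

# verify_integrity=True makes pd.concat raise ValueError on duplicate labels.
try:
    pd.concat([df, row.to_frame().T], verify_integrity=True)
except ValueError as exc:
    print('duplicate label rejected:', exc)
```

Whether duplicate labels are acceptable depends on the dataset; verify_integrity is a simple way to fail early when they are not.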
