Python Pandas:向数据帧追加一行并指定其索引标签
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16824607/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas: Appending a row to a dataframe and specify its index label
提问by Amelio Vazquez-Reina
Is there any way to specify the index that I want for a new row, when appending the row to a dataframe?
将行附加到数据帧时,有什么方法可以指定新行所需的索引?
The original documentation provides the following example:
原始文档提供了以下示例:
In [1301]: df = DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
In [1302]: df
Out[1302]:
A B C D
0 -1.137707 -0.891060 -0.693921 1.613616
1 0.464000 0.227371 -0.496922 0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3 0.281957 1.523962 -0.902937 0.068159
4 -0.057873 -0.368204 -1.144073 0.861209
5 0.800193 0.782098 -1.069094 -1.099248
6 0.255269 0.009750 0.661084 0.379319
7 -0.008434 1.952541 -1.056652 0.533946
In [1303]: s = df.xs(3)
In [1304]: df.append(s, ignore_index=True)
Out[1304]:
A B C D
0 -1.137707 -0.891060 -0.693921 1.613616
1 0.464000 0.227371 -0.496922 0.306389
2 -2.290613 -1.134623 -1.561819 -0.260838
3 0.281957 1.523962 -0.902937 0.068159
4 -0.057873 -0.368204 -1.144073 0.861209
5 0.800193 0.782098 -1.069094 -1.099248
6 0.255269 0.009750 0.661084 0.379319
7 -0.008434 1.952541 -1.056652 0.533946
8 0.281957 1.523962 -0.902937 0.068159
where the new row gets the index label automatically. Is there any way to control the new label?
其中新行自动获取索引标签。有没有办法控制新标签?
采纳答案by unutbu
The nameof the Series becomes the indexof the row in the DataFrame:
该name系列的成为index在数据帧的行:
In [99]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
In [100]: s = df.xs(3)
In [101]: s.name = 10
In [102]: df.append(s)
Out[102]:
A B C D
0 -2.083321 -0.153749 0.174436 1.081056
1 -1.026692 1.495850 -0.025245 -0.171046
2 0.072272 1.218376 1.433281 0.747815
3 -0.940552 0.853073 -0.134842 -0.277135
4 0.478302 -0.599752 -0.080577 0.468618
5 2.609004 -1.679299 -1.593016 1.172298
6 -0.201605 0.406925 1.983177 0.012030
7 1.158530 -2.240124 0.851323 -0.240378
10 -0.940552 0.853073 -0.134842 -0.277135
回答by Alon
df.loc will do the job :
df.loc 将完成这项工作:
>>> df = pd.DataFrame(np.random.randn(3, 2), columns=['A','B'])
>>> df
A B
0 -0.269036 0.534991
1 0.069915 -1.173594
2 -1.177792 0.018381
>>> df.loc[13] = df.loc[1]
>>> df
A B
0 -0.269036 0.534991
1 0.069915 -1.173594
2 -1.177792 0.018381
13 0.069915 -1.173594
回答by Harshit
I shall refer to the same sample of data as posted in the question:
我将参考问题中发布的相同数据样本:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
print('The original data frame is: \n{}'.format(df))
Running this code will give you
运行此代码会给你
The original data frame is:
A B C D
0 0.494824 -0.328480 0.818117 0.100290
1 0.239037 0.954912 -0.186825 -0.651935
2 -1.818285 -0.158856 0.359811 -0.345560
3 -0.070814 -0.394711 0.081697 -1.178845
4 -1.638063 1.498027 -0.609325 0.882594
5 -0.510217 0.500475 1.039466 0.187076
6 1.116529 0.912380 0.869323 0.119459
7 -1.046507 0.507299 -0.373432 -1.024795
Now you wish to append a new row to this data frame, which doesn't need to be copy of any other row in the data frame. @Alon suggested an interesting approach to use df.locto append a new row with different index. The issue, however, with this approach is if there is already a row present at that index, it will be overwritten by new values. This is typically the case for datasets when row index is not unique, like store ID in transaction datasets. So a more general solution to your question is to create the row, transform the new row data into a pandas series, name it to the index you want to have and then append it to the data frame. Don't forget to overwrite the original data frame with the one with appended row. The reason is df.appendreturns a view of the dataframe and does not modify its contents. Following is the code:
现在您希望将新行附加到此数据框中,该行不需要复制数据框中的任何其他行。@Alon 提出了一种有趣的方法,用于df.loc附加具有不同索引的新行。然而,这种方法的问题是,如果该索引处已经存在一行,它将被新值覆盖。当行索引不唯一时,数据集通常就是这种情况,例如事务数据集中的商店 ID。因此,对您的问题更通用的解决方案是创建行,将新行数据转换为熊猫系列,将其命名为您想要的索引,然后将其附加到数据框中。不要忘记用附加行的数据框覆盖原始数据框。原因是df.append返回数据框的视图并且不修改其内容。以下是代码:
row = pd.Series({'A':10,'B':20,'C':30,'D':40},name=3)
df = df.append(row)
print('The new data frame is: \n{}'.format(df))
Following would be the new output:
以下将是新的输出:
The new data frame is:
A B C D
0 0.494824 -0.328480 0.818117 0.100290
1 0.239037 0.954912 -0.186825 -0.651935
2 -1.818285 -0.158856 0.359811 -0.345560
3 -0.070814 -0.394711 0.081697 -1.178845
4 -1.638063 1.498027 -0.609325 0.882594
5 -0.510217 0.500475 1.039466 0.187076
6 1.116529 0.912380 0.869323 0.119459
7 -1.046507 0.507299 -0.373432 -1.024795
3 10.000000 20.000000 30.000000 40.000000

