Inserting new rows in pandas data frame at specific indices
Source: http://stackoverflow.com/questions/44599589/ (Stack Overflow content, licensed under CC BY-SA 4.0; reuse requires attribution to the original authors)
Asked by Liza
I have the following data frame df with three columns, "identifier", "values" and "subid":
identifier values subid
0 1 101 1
1 1 102 1
2 1 103 2 #index in list x
3 1 104 2
4 1 105 2
5 2 106 3
6 2 107 3
7 2 108 3
8 2 109 4 #index in list x
9 2 110 4
10 3 111 5
11 3 112 5
12 3 113 6 #index in list x
I have a list of indices, say
x = [2, 8, 12]
I want to insert a row just before each index in the list x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2 (i.e. 1) and the same values as the row at index 2 (i.e. 103), but its subid should be (subid at index 2) - 1, or simply the subid of the previous row (i.e. 1).
Below is the final df I expect:
identifier values subid
0 1 101 1
1 1 102 1
2 1 103 1 #new row inserted
3 1 103 2 #index in list x
4 1 104 2
5 1 105 2
6 2 106 3
7 2 107 3
8 2 108 3
9 2 109 3 #new row inserted
10 2 109 4 #index in list x
11 2 110 4
12 3 111 5
13 3 112 5
14 3 113 5 #new row inserted
15 3 113 6 #index in list x
The code I have been trying:
m = df.index  # storing the indices of the df
#m
for i in m:
    if i in x:  # x is the given list of indices
        df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
        df.iloc[i-1]["values"] = df.iloc[i]["values"]
        df.iloc[i-1]["subid"] = (df.iloc[i]["subid"]-1)
df
The above code simply replaces the rows at the (i-1) indices instead of inserting additional rows with the above values. Please help.
Please let me know if anything is unclear.
Accepted answer by bdiamante
Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.
x = [2, 8, 12]
rows = []
cur = {}
for i in df.index:
    if i in x:
        cur['index'] = i
        cur['identifier'] = df.iloc[i].identifier
        cur['values'] = df.iloc[i]['values']
        cur['subid'] = df.iloc[i].subid - 1
        rows.append(cur)
        cur = {}
Then, iterate through the new rows list, and perform an incremental concat, inserting each new row into the correct spot.
offset = 0  # tracks the number of rows already inserted to ensure rows are inserted in the correct position
for d in rows:
    df = pd.concat([df.head(d['index'] + offset), pd.DataFrame([d]), df.tail(len(df) - (d['index'] + offset))])
    offset += 1
df.reset_index(inplace=True)
df.drop('index', axis=1, inplace=True)
df
level_0 identifier subid values
0 0 1 1 101
1 1 1 1 102
2 0 1 1 103
3 2 1 2 103
4 3 1 2 104
5 4 1 2 105
6 5 2 3 106
7 6 2 3 107
8 7 2 3 108
9 0 2 3 109
10 8 2 4 109
11 9 2 4 110
12 10 3 5 111
13 11 3 5 112
14 0 3 5 113
15 12 3 6 113
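For comparison, here is a minimal sketch of a more compact way to get the frame the question asks for. It is not part of the answer above. It assumes the frame has a default integer index (so labels and positions coincide), rebuilds df from the table printed in the question, and uses the illustrative names new_rows and result. Each copied row gets a fractional label just below its source row, so sorting by index places it directly before that row.

```python
import pandas as pd

# Frame rebuilt from the table printed in the question (an assumption).
df = pd.DataFrame({
    "identifier": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    "values": list(range(101, 114)),
    "subid": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy the rows listed in x and lower their subid by one.
new_rows = df.loc[x].copy()
new_rows["subid"] -= 1

# Relabel each copy with a value just below its source row's label,
# so that sorting by index puts the copy immediately before that row.
new_rows.index = new_rows.index - 0.5

result = (
    pd.concat([df, new_rows])
    .sort_index()
    .reset_index(drop=True)  # discard the fractional labels
)
print(result)
```

Because the old labels are dropped with reset_index(drop=True), no extra index or level_0 column is left behind.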
Answer by piRSquared
Subtract where the prior row is different from the current row:
# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df
identifier values
0 1 101
1 1 102
2 1 103
3 1 104
4 1 105
5 2 105
6 2 107
7 2 108
8 2 109
9 2 110
10 3 110
11 3 112
12 3 113
Or
# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))
identifier values
0 1 101
1 1 102
2 1 103
3 1 104
4 1 105
5 2 105
6 2 107
7 2 108
8 2 109
9 2 110
10 3 110
11 3 112
12 3 113
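To make the mask in this answer easier to follow, here is a small self-contained sketch that splits the one-liner into steps. The two-column frame is an assumption reconstructed from the printed output, and the names prev, changed and adjusted are illustrative only.

```python
import pandas as pd

# Two-column frame assumed from the output printed in this answer.
df = pd.DataFrame({
    "identifier": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    "values": list(range(101, 114)),
})

# Identifier of the previous row; bfill() fills the first row's NaN
# with its own identifier so row 0 never counts as a change.
prev = df["identifier"].shift().bfill()

# True exactly where the identifier differs from the previous row.
changed = df["identifier"].ne(prev)

# Subtract 1 on those rows (True is cast to 1, False to 0).
adjusted = df.assign(values=df["values"] - changed.astype(int))
print(adjusted)
```

Here changed is True only at indices 5 and 10, the first rows of identifiers 2 and 3, which is why only those two values drop by one.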