Inserting new rows in a pandas data frame at specific indices

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/44599589/

Date: 2020-09-14 03:48:30  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Liza

I have the following data frame df with three columns, "identifier", "values" and "subid":

     identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       2 #index in list x        
3      1          104       2
4      1          105       2
5      2          106       3   
6      2          107       3
7      2          108       3
8      2          109       4 #index in list x
9      2          110       4
10     3          111       5
11     3          112       5 
12     3          113       6 #index in list x

I have a list of indices, say

x = [2, 8, 12] 

I want to insert rows just before the indices listed in x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2 (i.e. 1) and the same values as the row at index 2 (i.e. 103), but its subid should be ((subid at index 2) - 1), or simply the subid of the previous row, i.e. 1.

Below is the final df I expect:

   identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       1 #new row inserted     
3      1          103       2 #index in list x        
4      1          104       2
5      1          105       2
6      2          106       3   
7      2          107       3
8      2          108       3
9      2          109       3 #new row inserted
10     2          109       4 #index in list x
11     2          110       4
12     3          111       5
13     3          112       5 
14     3          113       5 #new row inserted
15     3          113       6 #index in list x

The code I have been trying:

 m = df.index       #storing the indices of the df
 #m

 for i in m:
     if i in x:     #x is the given list of indices
         df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
         df.iloc[i-1]["values"] = df.iloc[i]["values"]
         df.iloc[i-1]["subid"] = (df.iloc[i]["subid"]-1)
 df

The above code simply replaces the rows at indices (i-1) instead of inserting additional rows with the above values. Please help.

Please let me know if anything is unclear.
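
To illustrate the difference described above, here is a minimal sketch on a toy frame (the names toy, overwritten and inserted are made up for this example and are not the asker's data). Assigning through .iloc overwrites an existing row and leaves the length unchanged, whereas concatenating slices produces a frame with one more row.

```python
import pandas as pd

# Toy frame, assumed only for illustration.
toy = pd.DataFrame({"identifier": [1, 1, 2], "values": [101, 102, 103]})

# Assignment through .iloc replaces row 0 with a copy of row 1.
overwritten = toy.copy()
overwritten.iloc[0] = overwritten.iloc[1]
n_overwritten = len(overwritten)  # the row count does not change

# Concatenating slices puts a copy of row 1 in front of the original row 1.
inserted = pd.concat(
    [toy.iloc[:1], toy.iloc[[1]], toy.iloc[1:]],
    ignore_index=True,
)
n_inserted = len(inserted)  # one row more than the original
```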

Accepted answer by bdiamante

Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.

import pandas as pd

x = [2, 8, 12]
rows = []   # one dict per row to insert
cur = {}

for i in df.index:
    if i in x:
        cur['index'] = i
        cur['identifier'] = df.iloc[i].identifier
        cur['values'] = df.iloc[i]['values']
        cur['subid'] = df.iloc[i].subid - 1
        rows.append(cur)
        cur = {}

Then, iterate through the new rows list, and perform an incremental concat, inserting each new row into the correct spot.

offset = 0  # number of rows already inserted, so each later row still lands in the correct position

for d in rows:
    pos = d['index'] + offset  # position in the frame as it has grown so far
    df = pd.concat([df.head(pos), pd.DataFrame([d]), df.tail(len(df) - pos)])
    offset += 1


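# an 'index' column already exists (it came from the inserted dicts),
# so reset_index stores the old index as 'level_0'
df.reset_index(inplace=True)
df.drop('index', axis=1, inplace=True)
df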

    level_0 identifier  subid   values
0         0          1      1      101
1         1          1      1      102
2         0          1      1      103
3         2          1      2      103
4         3          1      2      104
5         4          1      2      105
6         5          2      3      106
7         6          2      3      107
8         7          2      3      108
9         0          2      3      109
10        8          2      4      109
11        9          2      4      110
12       10          3      5      111
13       11          3      5      112
14        0          3      5      113
15       12          3      6      113
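
As an alternative to the loop above, here is a hedged sketch of a different technique (not the answerer's method): copy the rows at the positions in x, decrement their subid, give them fractional index labels so they sort just before the originals, then concatenate and sort. The frame is rebuilt from the question's data, the names new_rows and result are made up for this example, and it assumes df has the default 0..n-1 integer index.

```python
import pandas as pd

# Rebuild the question's data frame.
df = pd.DataFrame({
    "identifier": [1] * 5 + [2] * 5 + [3] * 3,
    "values": list(range(101, 114)),
    "subid": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy the target rows and lower their subid by one.
new_rows = df.loc[x].copy()
new_rows["subid"] -= 1

# Labels 1.5, 7.5 and 11.5 sort immediately before 2, 8 and 12.
new_rows.index = new_rows.index - 0.5

# Combine, order by label, then renumber the rows 0..15.
result = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)
```

This avoids the leftover level_0 column and the repeated concat inside a loop, at the cost of temporarily turning the index into floats.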

Answer by piRSquared

Subtract where the identifier in the prior row is different from the one in the current row.

# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113
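
To show what the one-liner above computes, here is a small sketch that builds the intermediate mask step by step. The data is rebuilt from the output shown, the names prev, mask, changed_at and adjusted are made up for this example, and the explicit astype(int) is an addition for clarity rather than part of the answer.

```python
import pandas as pd

identifier = pd.Series([1] * 5 + [2] * 5 + [3] * 3)
values = pd.Series(range(101, 114))

# Previous row's identifier; bfill fills the NaN in the first row
# so that row 0 compares equal to itself.
prev = identifier.shift().bfill()

# True exactly where the identifier changes from one row to the next.
mask = identifier.ne(prev)
changed_at = mask[mask].index.tolist()

# Subtract 1 where the mask is True, 0 elsewhere.
adjusted = values - mask.astype(int)
```

Only the first row of each new identifier group is affected, which is why rows 5 and 10 are the only ones that change in the output.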

Or

# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113