Inserting new rows at specific indices in a Pandas DataFrame

Question

Asked by Liza

I have the following data frame df with three columns, "identifier", "values" and "subid":

     identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       2 #index in list x        
3      1          104       2
4      1          105       2
5      2          106       3   
6      2          107       3
7      2          108       3
8      2          109       4 #index in list x
9      2          110       4
10     3          111       5
11     3          112       5 
12     3          113       6 #index in list x

I have a list of indices, say

x = [2, 8, 12]

I want to insert a row just before each index listed in x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2, i.e. 1; the same values as the row at index 2, i.e. 103; but its subid should be ((subid at index 2) - 1), or simply the subid from the previous row, i.e. 1.

Below is the final df I expect:

   identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       1 #new row inserted     
3      1          103       2 #index in list x        
4      1          104       2
5      1          105       2
6      2          106       3   
7      2          107       3
8      2          108       3
9      2          109       3 #new row inserted
10     2          109       4 #index in list x
11     2          110       4
12     3          111       5
13     3          112       5 
14     3          113       5 #new row inserted
15     3          113       6 #index in list x

The code I have been trying:

 m = df.index       #storing the indices of the df
 #m

 for i in m:
     if i in x:     #x is the given list of indices
         df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
         df.iloc[i-1]["values"] = df.iloc[i]["values"]
         df.iloc[i-1]["subid"] = (df.iloc[i]["subid"]-1)
 df

The above code simply replaces the rows at the (i-1) indices instead of inserting additional rows with the above values. Please help.

Please let me know if anything is unclear.

Answer 1

Accepted answer by bdiamante

Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.

x = [2,8,12]
rows = []
cur = {}

for i in df.index:
    if i in x:
        cur['index'] = i
        cur['identifier'] = df.iloc[i].identifier
        cur['values'] = df.iloc[i]['values']
        cur['subid'] = df.iloc[i].subid - 1
        rows.append(cur)
        cur = {}

Then iterate through the list of new rows and perform an incremental concat, inserting each new row into the correct spot.

import pandas as pd

offset = 0  # number of rows already inserted, so that later rows land in the correct position

for d in rows:
    pos = d['index'] + offset  # position of the target row in the current, already-grown df
    df = pd.concat([df.head(pos), pd.DataFrame([d]), df.tail(len(df) - pos)])
    offset += 1

# drop=True discards the old index instead of keeping it as an extra level_0 column
df.reset_index(drop=True, inplace=True)
# remove the helper 'index' column that came from the row dicts
df.drop('index', axis=1, inplace=True)
df

    identifier  values  subid
0            1     101      1
1            1     102      1
2            1     103      1
3            1     103      2
4            1     104      2
5            1     105      2
6            2     106      3
7            2     107      3
8            2     108      3
9            2     109      3
10           2     109      4
11           2     110      4
12           3     111      5
13           3     112      5
14           3     113      5
15           3     113      6
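
An alternative that avoids the loop, offered as a minimal sketch rather than as part of the original answer: copy the rows listed in x, decrement their subid, give the copies fractional index labels (i - 0.5) so that sorting places each copy just before its original, and then renumber. The names new_rows and result are illustrative, and the frame is rebuilt from the question's data so the example runs on its own.

```python
import pandas as pd

# Rebuild the data frame from the question.
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': list(range(101, 114)),
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy the target rows and lower their subid by one.
new_rows = df.loc[x].copy()
new_rows['subid'] -= 1

# Labels 1.5, 7.5 and 11.5 sort just before 2, 8 and 12.
new_rows.index = new_rows.index - 0.5

# Combine, sort by label, then renumber the rows 0..n-1.
result = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)
print(result)
```

This relies on the original index being the default 0..n-1 integer range, as in the question; with a different index the fractional-label trick would need adjusting.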

Answer 2

Answer by piRSquared

Subtract 1 from values wherever the identifier differs from that of the prior row.

# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113

Or

# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113
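
To make the one-liner above easier to follow, the sketch below splits it into steps on a two-column frame like the one in this answer's output. The intermediate names prev, changed and adjusted are illustrative, and the explicit astype(int) is added only to make the True-to-1 conversion visible.

```python
import pandas as pd

# Two-column frame matching the shape shown in this answer.
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': list(range(101, 114)),
})

# Identifier of the previous row; bfill() gives the first row its own
# identifier so it is never flagged as a change.
prev = df['identifier'].shift().bfill()

# True exactly where a new identifier group starts (rows 5 and 10 here).
changed = df['identifier'].ne(prev)

# Booleans count as 0/1, so this subtracts 1 only on the flagged rows.
adjusted = df['values'] - changed.astype(int)
print(df.assign(values=adjusted))
```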
