Self-join in Pandas

Question

Asked by Nucular

I would like to perform a self-join on a Pandas dataframe so that some rows get appended to the original rows. Each row has a marker 'i' indicating which row should get appended to it on the right.

import pandas as pd
d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]: 
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 gets appended to row 0, row 1 to row 1, row 1 to row 2 (as indicated by i).

My idea was to use a merge:

pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')

But it produces something else altogether. How can I do this correctly?

Answer 1

Accepted answer by piRSquared

Use join with on='i':

d.join(d.drop(columns='i'), on='i', rsuffix='_y')

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B
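
For reference, here is a self-contained sketch of the same approach. It assumes d has the default integer index (0, 1, 2), so the values in i can be looked up as index labels; the names right and result are only for illustration.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Right-hand side: the same frame without the marker column,
# so only some_col is carried over.
right = d.drop(columns='i')

# on='i' matches the values of d['i'] against the index of `right`;
# rsuffix renames the overlapping column coming from the right-hand side.
result = d.join(right, on='i', rsuffix='_y')
print(result)
```

Because join defaults to a left join, every row of d is kept in its original order.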

Answer 2

Answer by MSeifert

Instead of using merge, you can also use indexing and assignment:

>>> d['new_col'] = d['some_col'][d['i']].values
>>> d
  some_col  i new_col
0        A  2       C
1        B  1       B
2        C  1       B
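
The lookup keeps the looked-up labels as its index, which is why the snippet above takes .values before assigning. A minimal sketch of that step follows, assuming the default integer index. It also shows Series.map as an alternative that is not part of the original answer; the names looked_up and mapped are hypothetical.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Label-based lookup: the result is indexed by the looked-up labels [2, 1, 1].
looked_up = d['some_col'].loc[d['i']]

# Dropping the index makes the assignment positional instead of label-aligned.
d['new_col'] = looked_up.to_numpy()

# Alternative (an assumption, not from the answer): map each value of i
# through the some_col Series, which keeps d's own index.
d['mapped'] = d['i'].map(d['some_col'])
print(d)
```

Assigning looked_up directly would try to align on its labels rather than by position, so the index has to be dropped or avoided.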

Answer 3

Answer by MaxU

Try this:

In [69]: d.join(d.set_index('i'), rsuffix='_y')
Out[69]:
  some_col  i some_col_y
0        A  2        NaN
1        B  1          B
1        B  1          C
2        C  1          A

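The join above matches d's index against the values of i in the right-hand frame, which is the reverse of the lookup asked for, so its output differs from the desired one. This merge reproduces the desired columns: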

In [64]: pd.merge(d[['some_col']], d, left_index=True, right_on='i', suffixes=['_y','']).sort_index()
Out[64]:
  some_col_y some_col  i
0          C        A  2
1          B        B  1
2          B        C  1
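
The attempt in the question uses left_index=True with right_on='i', which looks up each row's own index among the i values. A sketch of the merge in the other direction, matching the left frame's i against the right frame's index, is below. It assumes the default integer index, and the name result is only illustrative.

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

result = pd.merge(
    d,                    # left: the original rows, including the marker i
    d[['some_col']],      # right: only the column to append
    left_on='i',          # use the marker values as lookup keys ...
    right_index=True,     # ... against the right frame's index
    how='left',           # keep every original row, in order
    suffixes=('', '_y'),  # leave the left column name as is, suffix the right
)
print(result)
```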
