Self-join with Pandas

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/41434723/

Tags: python, pandas, data-structures, dataframe

Asked by Nucular

I would like to perform a self-join on a pandas DataFrame so that some rows get appended to the original rows. Each row has a marker 'i' indicating which row should be appended to it on the right.

import pandas as pd

d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]: 
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 gets appended to row 0, row 1 to row 1 (itself), and row 1 to row 2, as indicated by i.

My idea of how to go about it was:

pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')

But it produces something else altogether. How do I do this correctly?

Accepted answer by piRSquared

Use join with on='i':

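# drop(columns='i') replaces the positional axis argument drop('i', 1), which pandas 2.0 no longer accepts
d.join(d.drop(columns='i'), on='i', rsuffix='_y')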

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B
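
For comparison with the merge attempt in the question, the same result can be sketched with pd.merge by matching the left frame's 'i' column against the right frame's index (the question's attempt matched in the opposite direction, left index against right 'i'). This is only an illustrative equivalent, assuming the default 0..n-1 index; the variable name result is made up for the example.

```python
import pandas as pd

d = pd.DataFrame({"some_col": ["A", "B", "C"], "i": [2, 1, 1]})

# Left side: match on the values in column 'i'.
# Right side: match on the row index, bringing along only 'some_col'.
result = pd.merge(
    d,
    d[["some_col"]],
    left_on="i",
    right_index=True,
    how="left",
    suffixes=("", "_y"),
)
print(result)
```

Passing only d[["some_col"]] on the right plays the same role as dropping 'i' in the join call: it avoids a duplicated 'i' column in the output.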

Answer by MSeifert

Instead of using merge, you can also use indexing and assignment:

>>> d['new_col'] = d['some_col'][d['i']].values
>>> d
  some_col  i new_col
0        A  2       C
1        B  1       B
2        C  1       B
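
The indexing above looks rows up by index label, which coincides with row position only because d has the default 0..n-1 index. If 'i' is meant as a row position (an assumption; the question does not say), a sketch that does not depend on the index labels could go through NumPy arrays. The string index below is invented purely to show the difference.

```python
import pandas as pd

# Same data as in the question, but with a non-default (hypothetical) index.
d = pd.DataFrame(
    {"some_col": ["A", "B", "C"], "i": [2, 1, 1]},
    index=["x", "y", "z"],
)

values = d["some_col"].to_numpy()   # column values as a plain array
positions = d["i"].to_numpy()       # row positions to pick from

# Positional (integer array) indexing, independent of the index labels.
d["new_col"] = values[positions]
print(d)
```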

Answer by MaxU

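Try this (note that it matches each row's index against the other frame's 'i' values, so rows can be duplicated or left as NaN and the result differs from the desired output):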

In [69]: d.join(d.set_index('i'), rsuffix='_y')
Out[69]:
  some_col  i some_col_y
0        A  2        NaN
1        B  1          B
1        B  1          C
2        C  1          A

or:

In [64]: pd.merge(d[['some_col']], d, left_index=True, right_on='i', suffixes=['_y','']).sort_index()
Out[64]:
  some_col_y some_col  i
0          C        A  2
1          B        B  1
2          B        C  1