Pandas merge columns, but not the 'key' column
Source: http://stackoverflow.com/questions/22208218/
Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow.
Asked by ChrisArmstrong
This may seem like a stupid question, but this has been bugging me for some time.
df1:
imp_type    value
1           abc
2           def
3           ghi
df2:
id          value2
1           123
2           345
3           567
Merging the two DataFrames:
df1.merge(df2, left_on='imp_type',right_on='id')
yields:
imp_type    value    id    value2
1           abc      1     123
2           def      2     345
3           ghi      3     567
Then I need to drop the id column, since it's essentially a duplicate of the imp_type column. Why does merge pull in the join keys from both DataFrames by default? I would think there should at least be a parameter to set to False if you don't want to pull in the join key. Does something like this already exist, or am I doing something wrong?
Accepted answer by unutbu
I agree it would be nice if one of the columns were dropped. Of course, then there is the question of what to name the remaining column.
Anyway, here is a workaround. Simply rename one of the columns so that the joined column(s) have the same name:
In [23]: df1 = pd.DataFrame({'imp_type':[1,2,3], 'value':['abc','def','ghi']})
In [27]: df2 = pd.DataFrame({'id':[1,2,3], 'value2':[123,345,567]})
In [28]: df2.columns = ['imp_type','value2']
In [29]: df1.merge(df2, on='imp_type')
Out[29]: 
   imp_type value  value2
0         1   abc     123
1         2   def     345
2         3   ghi     567
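As a variation on the session above, here is a minimal sketch that renames the key with DataFrame.rename at merge time instead of overwriting df2.columns. The variable name merged is only illustrative and is not part of the original answer. The point of the variation is that df2 itself keeps its id column, and the rename does not depend on the column order.

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# rename() returns a new DataFrame, so df2 is left unchanged;
# only the copy passed to merge() has its key called 'imp_type'.
merged = df1.merge(df2.rename(columns={'id': 'imp_type'}), on='imp_type')

print(merged)
```

Because both frames now share the key name, merge keeps a single imp_type column in the result.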
Renaming the columns is a bit of a pain, especially (as DSM points out) compared to .drop('id', 1). However, if you can arrange for the joined columns to have the same name from the very beginning, then df1.merge(df2, on='imp_type') would be easiest.
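For comparison, the drop-based alternative mentioned above might look like the sketch below. It uses the columns= keyword rather than the positional axis argument, on the assumption that a recent pandas version is in use (as far as I know, newer releases no longer accept the positional form .drop('id', 1)). The name merged_dropped is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'imp_type': [1, 2, 3], 'value': ['abc', 'def', 'ghi']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'value2': [123, 345, 567]})

# Merge on differently named keys, then drop the redundant right-hand key.
merged_dropped = (
    df1.merge(df2, left_on='imp_type', right_on='id')
       .drop(columns='id')
)

print(merged_dropped)
```

Either approach ends with the same three columns. Which one reads better mostly depends on whether the key names can be aligned before the merge.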

