Python Pandas merge only certain columns

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/17978133/

Time: 2020-08-19 09:39:54  Source: igfitidea

python merge pandas

Asked by BubbleGuppies

Is it possible to merge only some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b - not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge then delete the unwanted columns, but it seems like there is a better method.

Accepted answer by Andy Hayden

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])
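
As a minimal sketch of the approach above, the following uses small made-up frames (the sample values are invented for illustration) shaped like the df1 and df2 in the question. It selects the key and the wanted columns with an explicit list, which also works when column names are longer than one character:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1", "c2", "c3"],
})

# Select only the join key and the wanted columns before merging.
# With no `on` argument, merge joins on the columns both frames share (here: x).
result = df1.merge(df2[["x", "a", "b"]])

print(list(result.columns))
```

Because column c is never passed to merge, there is nothing to delete afterwards.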

Answer by Terrance DeJesus

You can use .iloc (or .loc) to select specific columns with all rows, and then merge just that selection. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2 with a left outer join on 'key'. For dataframe2, .iloc lets you specify the rows and columns you want by integer position: the : selects all rows, and 0:5 selects the first 5 columns. The 'key' column must be among the selected columns, otherwise the merge will fail. You could use .loc to select by name instead, but if you're dealing with long column names, .iloc may be more convenient.
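
To make the positional versus label-based selection concrete, here is a small sketch. The frame contents and the column names left_val, c1, c2, c3 are made up for illustration. It takes the first three columns of dataframe2 both ways and checks that the two merges agree:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
dataframe1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["p", "q", "r"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2, 4],
    "c1": [0.1, 0.2, 0.4],
    "c2": [1.1, 1.2, 1.4],
    "c3": [2.1, 2.2, 2.4],
})

# Positional selection: all rows, first three columns (key, c1, c2).
by_position = pd.merge(dataframe1, dataframe2.iloc[:, 0:3], how="left", on="key")

# Label-based selection of the same columns.
by_label = pd.merge(
    dataframe1, dataframe2.loc[:, ["key", "c1", "c2"]], how="left", on="key"
)

# Both selections pick the same columns, so the merged frames match.
print(by_position.equals(by_label))
```

Selecting by position depends on the column order of dataframe2, so selecting by name is usually the safer choice when that order might change.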

Answer by Arthur D. Howland

Use double brackets to select a list of columns from df2. For a VLOOKUP-style lookup:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

This keeps everything in the original df and adds the one matching column from df2 that you want to bring in.
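
A small sketch of this VLOOKUP-style lookup follows. The sample values and the Amount and Other_Column names are invented for illustration. As an optional extra that is not part of the original answer, it passes validate="many_to_one" so that pandas raises an error if the lookup table has duplicate keys, which would otherwise silently multiply rows:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({"Key_Column": ["k1", "k2", "k2", "k9"], "Amount": [5, 7, 9, 11]})
df2 = pd.DataFrame({
    "Key_Column": ["k1", "k2", "k3"],
    "Target_Column": ["red", "green", "blue"],
    "Other_Column": [1, 2, 3],
})

# Double brackets pass a list of column names, which returns a DataFrame
# containing only the key and the single column to look up.
lookup = pd.merge(
    df,
    df2[["Key_Column", "Target_Column"]],
    on="Key_Column",
    how="left",
    validate="many_to_one",  # optional check: keys in df2 must be unique
)

print(lookup)
```

Rows of df whose key has no match in df2 (here k9) are kept, with a missing value in Target_Column.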

Answer by Marco167

This is to merge selected columns from two tables.

If table_1 contains columns t1_a, t1_b, t1_c, ..., id, ..., t1_z, and table_2 contains columns t2_a, t2_b, t2_c, ..., id, ..., t2_z, and only t1_a, id, and t2_a are required in the final table, then

mergedCSV = table_1[['t1_a', 'id']].merge(table_2[['t2_a', 'id']], on='id', how='left')
# save resulting output file
mergedCSV.to_csv('output.csv', index=False)
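
For a runnable illustration of selecting columns on both sides, here is a sketch with tiny made-up tables (the values are invented). It calls to_csv without a path, which returns the CSV text as a string, so nothing is written to disk:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
table_1 = pd.DataFrame({"t1_a": [1, 2], "t1_b": [3, 4], "id": [10, 20]})
table_2 = pd.DataFrame({"t2_a": ["u", "v"], "t2_b": ["w", "x"], "id": [20, 30]})

# Trim both tables to the needed columns plus the join key, then left-join on id.
merged = table_1[["t1_a", "id"]].merge(
    table_2[["t2_a", "id"]], on="id", how="left"
)

# With no path argument, to_csv returns the CSV content as a string.
csv_text = merged.to_csv(index=False)
print(csv_text)
```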

Answer by tonneofash

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

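df1 = df1.merge(df2[['a', 'b', 'key1']], how='left',
                left_on='key2', right_on='key1').drop(columns='key1')

The .drop(columns='key1') part removes 'key1' from the resulting data frame, even though it was needed for the join in the first place. Note that a bare .drop('key1') would look for a row label named 'key1' rather than a column, so the columns= keyword (or axis=1) is needed.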
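
The following sketch shows the same pattern end to end on made-up data (the sample values and the column v are invented for illustration). The join key has a different name on each side, and the right-hand key is dropped after the merge:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df1 = pd.DataFrame({"key2": [1, 2, 3], "v": ["p", "q", "r"]})
df2 = pd.DataFrame({"key1": [1, 2], "a": [10, 20], "b": [30, 40], "c": [50, 60]})

# key1 must be selected for the join, but it duplicates key2 in the result,
# so it is dropped by column name afterwards.
merged = df1.merge(
    df2[["a", "b", "key1"]], how="left", left_on="key2", right_on="key1"
).drop(columns="key1")

print(list(merged.columns))
```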
