Python Pandas merge only certain columns
Original URL: http://stackoverflow.com/questions/17978133/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by BubbleGuppies
Is it possible to merge only some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.
I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b - not the entire DataFrame.
The result would be a DataFrame with x, y, z, a, b.
I could merge and then delete the unwanted columns, but it seems like there should be a better way.
Accepted answer by Andy Hayden
You could merge the sub-DataFrame (with just those columns):
df2[list('xab')] # df2 but only with columns x, a, and b
df1.merge(df2[list('xab')])
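A minimal runnable sketch of the accepted answer, assuming small made-up frames that reuse the question's column names (all values are hypothetical). It also checks that the result matches the merge-then-drop approach the question wanted to avoid:

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1", "c2", "c3"],
    "d": ["d1", "d2", "d3"],
})

# Keep only the join key and the wanted columns before merging.
# With no on= argument, merge joins on the columns both frames share (here just "x").
result = df1.merge(df2[list("xab")])
print(result.columns.tolist())  # ['x', 'y', 'z', 'a', 'b']

# The approach from the question: merge everything, then drop the extras
merged_then_dropped = df1.merge(df2).drop(columns=["c", "d"])
print(result.equals(merged_then_dropped))  # True
```

Selecting first avoids carrying the unwanted columns through the merge at all, which matters more as df2 gets wider.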
Answer by Terrance DeJesus
You can use .loc (or .iloc) to select specific columns while keeping all rows, and then pass only that subset to merge. The example below uses .iloc:
pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')
In this example, you are merging dataframe1 and dataframe2 with a left (outer) join on 'key'. For dataframe2, however, you have used .iloc, which lets you specify the rows and columns you want by integer position. The : selects all rows, and the 0:5 slice selects the first 5 columns (positions 0 through 4); 'key' must be among them, otherwise merge cannot find the join column. You could use .loc to select columns by name instead, but if you're dealing with long column names, .iloc may be more convenient.
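A sketch of the positional and label-based variants side by side, assuming hypothetical frames in which 'key' is the first column of dataframe2 (so the 0:5 slice includes it). Note that a .loc label slice includes its end label, whereas an .iloc slice excludes its end position:

```python
import pandas as pd

# Hypothetical frames; "key" is column 0 of dataframe2
dataframe1 = pd.DataFrame({"key": [1, 2, 3], "val": ["p", "q", "r"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2],
    "c1": [11, 12], "c2": [21, 22], "c3": [31, 32],
    "c4": [41, 42], "c5": [51, 52], "c6": [61, 62],
})

# Positional: all rows, columns at positions 0..4 (key, c1, c2, c3, c4)
by_position = pd.merge(dataframe1, dataframe2.iloc[:, 0:5], how="left", on="key")

# Label-based equivalent: "key" through "c4", both ends included
by_label = pd.merge(dataframe1, dataframe2.loc[:, "key":"c4"], how="left", on="key")

print(by_position.columns.tolist())  # ['key', 'val', 'c1', 'c2', 'c3', 'c4']
print(by_position.equals(by_label))  # True
```

Key 3 has no match in dataframe2, so with how='left' its c1..c4 values come out as NaN.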
Answer by Arthur D. Howland
You want to use two brackets (a list of column names inside the indexer), so if you are doing a VLOOKUP sort of action:
df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')
This keeps everything in the original df and adds the one corresponding column from df2 that you want to join.
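A minimal VLOOKUP-style sketch with hypothetical data (the Qty and Other_Column columns are made up for illustration). Keys in df with no match in df2 get NaN in Target_Column because of how='left':

```python
import pandas as pd

# Hypothetical lookup scenario
df = pd.DataFrame({"Key_Column": ["A", "B", "C"], "Qty": [5, 3, 8]})
df2 = pd.DataFrame({
    "Key_Column": ["A", "B", "D"],
    "Target_Column": [1.5, 2.0, 9.9],
    "Other_Column": ["x", "y", "z"],
})

# Double brackets give a DataFrame holding only the key and the lookup column,
# so Other_Column never reaches the result
df = pd.merge(df, df2[["Key_Column", "Target_Column"]], on="Key_Column", how="left")
print(df)
```

Unlike VLOOKUP, which returns only the first match, a key that appears more than once in df2 repeats the matching row of df. Passing validate='many_to_one' to pd.merge makes it raise an error in that case instead.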
Answer by Marco167
This is to merge selected columns from two tables.
If table_1 contains columns t1_a, t1_b, t1_c, ..., id, ..., t1_z, and table_2 contains columns t2_a, t2_b, t2_c, ..., id, ..., t2_z, and only t1_a, id, and t2_a are required in the final table, then:
mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file
mergedCSV.to_csv('output.csv',index = False)
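A self-contained version of the snippet above, assuming two small hypothetical tables. To keep the example free of file I/O, it calls to_csv() without a path, which returns the CSV text as a string instead of writing output.csv:

```python
import pandas as pd

# Hypothetical tables; only t1_a, id and t2_a should survive the merge
table_1 = pd.DataFrame({"t1_a": [1, 2, 3], "t1_b": [4, 5, 6], "id": [10, 20, 30]})
table_2 = pd.DataFrame({"t2_a": ["u", "v"], "t2_b": [7, 8], "id": [10, 30]})

# Trim both sides to the wanted columns, then left-join on id
mergedCSV = table_1[["t1_a", "id"]].merge(table_2[["t2_a", "id"]], on="id", how="left")

# With no path argument, to_csv returns the CSV text
csv_text = mergedCSV.to_csv(index=False)
print(csv_text)
```

id 20 has no partner in table_2, so its t2_a cell is written as an empty field.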
Answer by tonneofash
If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:
df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop(columns = 'key1')
The .drop(...) call keeps 'key1' out of the resulting data frame, even though it is required for the join. Pass the name as columns='key1' (or add axis=1), because .drop removes row labels by default.
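A sketch with hypothetical frames whose key columns have different names. It shows that right_on leaves key1 in the merged result, and that .drop needs columns= to remove it (a bare .drop("key1") looks for a row label and raises KeyError):

```python
import pandas as pd

# Hypothetical frames: df1 joins on key2, df2 on key1
df1 = pd.DataFrame({"key2": [1, 2, 3], "v": ["p", "q", "r"]})
df2 = pd.DataFrame({"key1": [1, 3], "a": [10, 30], "b": [100, 300], "c": [0, 0]})

merged = df1.merge(df2[["a", "b", "key1"]], how="left",
                   left_on="key2", right_on="key1")
print(merged.columns.tolist())  # ['key2', 'v', 'a', 'b', 'key1']

# Drop the right-hand key once the join is done
result = merged.drop(columns="key1")
print(result.columns.tolist())  # ['key2', 'v', 'a', 'b']

# Without columns= (or axis=1), drop searches the row index for "key1"
row_drop_raised = False
try:
    merged.drop("key1")
except KeyError:
    row_drop_raised = True
print(row_drop_raised)  # True
```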