Original URL: http://stackoverflow.com/questions/24001360/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Merging two Excel files by ID and combining columns with same name (python, pandas)
Question by ferrios25
I am new to Stack Overflow and to pandas for Python. I found part of my answer in the post "Looking to merge two Excel files by ID into one Excel file using Python 2.7".
However, I also want to combine the columns that have the same name in the two Excel files. I thought the following post would have my answer, but I guess its title doesn't match what it covers: "Merging Pandas DataFrames with the same column name".
Right now I have the code:
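import pandas as pd
# Read both workbooks (the first sheet of each, by default)
file1 = pd.read_excel("file1.xlsx")
file2 = pd.read_excel("file2.xlsx")
# Outer merge on ID keeps every row whose ID appears in either file
file3 = file1.merge(file2, on="ID", how="outer")
file3.to_excel("merged.xlsx")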
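To reproduce this without the .xlsx files, here is a minimal sketch that types the sample rows listed below straight into DataFrames. It assumes empty cells read as missing values (None here); the dtypes read_excel produces from the real files may differ. It shows where test_x and test_y come from: merge appends its default suffixes ('_x', '_y') to every non-key column that exists in both frames.

```python
import pandas as pd

# Sample rows typed in from file1.xlsx / file2.xlsx (None = empty cell)
file1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "JanSales": [100, 200, 300],
    "FebSales": [200, 500, 400],
    "test": ["cars", None, "boats"],
})
file2 = pd.DataFrame({
    "ID": [2, 3, 4],
    "CreditScore": ["good", "okay", "not-so-good"],
    "EMMAScore": ["Watson", "Thompson", None],
    "test": ["planes", None, None],
})

# 'test' is in both frames and is not the key, so merge renames the
# left copy to test_x and the right copy to test_y.
file3 = file1.merge(file2, on="ID", how="outer")
print(list(file3.columns))
# ['ID', 'JanSales', 'FebSales', 'test_x', 'CreditScore', 'EMMAScore', 'test_y']
```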
file1.xlsx
ID,JanSales,FebSales,test
1,100,200,cars
2,200,500,
3,300,400,boats
file2.xlsx
ID,CreditScore,EMMAScore,test
2,good,Watson,planes
3,okay,Thompson,
4,not-so-good,NA,
What I get in merged.xlsx is:
ID,JanSales,FebSales,test_x,CreditScore,EMMAScore,test_y
1,100,200,cars,NaN,NaN,
2,200,500,,good,Watson,planes
3,300,400,boats,okay,Thompson,
4,NaN,NaN,,not-so-good,NaN,
What I want in merged.xlsx is:
ID,JanSales,FebSales,CreditScore,EMMAScore,test
1,100,200,NaN,NaN,cars
2,200,500,good,Watson,planes
3,300,400,okay,Thompson,boats
4,NaN,NaN,not-so-good,NaN,NaN
In my real data, there are 200+ columns that correspond to the "test" column in my example. I want the program to find these columns with the same names in both file1.xlsx and file2.xlsx and combine them in the merged file.
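One alternative worth knowing is DataFrame.combine_first. This is a different technique from the merge-then-where answer below, not the asker's code: with ID as the index on both frames, it takes the union of rows and columns and fills each missing value in the first frame from the second, so every shared column is coalesced at once, however many there are, and the file1 value wins when both files have one. A minimal sketch, assuming ID is unique within each file and using the sample rows above:

```python
import pandas as pd

# Sample rows typed in from file1.xlsx / file2.xlsx (None = empty cell)
file1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "JanSales": [100, 200, 300],
    "FebSales": [200, 500, 400],
    "test": ["cars", None, "boats"],
})
file2 = pd.DataFrame({
    "ID": [2, 3, 4],
    "CreditScore": ["good", "okay", "not-so-good"],
    "EMMAScore": ["Watson", "Thompson", None],
    "test": ["planes", None, None],
})

# combine_first aligns both frames on the ID index and fills each missing
# value in file1 from file2; no _x/_y suffixes are ever created.
merged = (
    file1.set_index("ID")
    .combine_first(file2.set_index("ID"))
    .reset_index()
)

# combine_first may reorder columns, so put file1's first, then file2's new ones
order = list(file1.columns) + [c for c in file2.columns if c not in file1.columns]
merged = merged[order]
print(merged)
```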
Answer by EdChum
OK, here is a more dynamic way. After merging, we assume that name clashes will occur and produce columns named 'column_name_x' and 'column_name_y'.
So first, find the common column names and remove 'ID' from this list:
In [51]:
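# df1, df2 = file1, file2 from the question; df3 = file1.merge(file2, on='ID', how='outer')
common_columns = list(set(df1.columns) & set(df2.columns))
common_columns.remove('ID')  # 'ID' is the merge key, so it never gets a suffix
common_columns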
Out[51]:
['test']
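A set intersection returns the names in arbitrary order, which only becomes visible once there are several shared columns. If a stable order matters, a list comprehension over df1.columns keeps df1's order. A small sketch using empty frames that carry only the sample column names:

```python
import pandas as pd

# Empty frames with just the sample column names from the question
df1 = pd.DataFrame(columns=["ID", "JanSales", "FebSales", "test"])
df2 = pd.DataFrame(columns=["ID", "CreditScore", "EMMAScore", "test"])

# Walk df1's columns in order and keep the ones df2 also has, minus the key
df2_cols = set(df2.columns)
common_columns = [c for c in df1.columns if c in df2_cols and c != "ID"]
print(common_columns)  # ['test']
```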
Now we can iterate over this list to create each new column, using where to conditionally assign the value depending on which one is not null.
In [59]:
for col in common_columns:
    # keep the _x value where it is present, otherwise take the _y value
    df3[col] = df3[col+'_x'].where(df3[col+'_x'].notnull(), df3[col+'_y'])
df3
Out[59]:
ID JanSales FebSales test_x CreditScore EMMAScore test_y test
0 1 100 200 cars NaN NaN NaN cars
1 2 200 500 NaN good Watson planes planes
2 3 300 400 boats okay Thompson NaN boats
3 4 NaN NaN NaN not-so-good NaN NaN NaN
[4 rows x 8 columns]
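The where call keeps the _x value wherever it is not null and takes the _y value otherwise. For this purpose Series.fillna with another Series gives the same values, since it fills only the missing entries, matched by index label. A tiny sketch with hand-typed Series mirroring test_x and test_y above:

```python
import pandas as pd

# Hand-typed copies of the test_x / test_y columns (None = missing)
test_x = pd.Series(["cars", None, "boats", None])
test_y = pd.Series([None, "planes", None, None])

# where(cond, other): keep test_x where cond is True, else take test_y
via_where = test_x.where(test_x.notnull(), test_y)
# fillna(Series): fill only the missing entries of test_x from test_y
via_fillna = test_x.fillna(test_y)

print(via_where.tolist()[:3])   # ['cars', 'planes', 'boats']
print(via_fillna.tolist()[:3])  # ['cars', 'planes', 'boats']
```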
Then, to finish off, drop all the extra columns:
In [68]:
# every clashing name with each suffix, e.g. ['test_x', 'test_y']
clash_names = [elt + suffix for elt in common_columns for suffix in ('_x', '_y')]
clash_names
df3.drop(labels=clash_names, axis=1, inplace=True)
df3
Out[68]:
ID JanSales FebSales CreditScore EMMAScore test
0 1 100 200 NaN NaN cars
1 2 200 500 good Watson planes
2 3 300 400 okay Thompson boats
3 4 NaN NaN not-so-good NaN NaN
[4 rows x 6 columns]
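Putting the three steps together, here is a minimal end-to-end sketch. The function name merge_and_coalesce is hypothetical, and it assumes, like the answer, that merge's default '_x'/'_y' suffixes are in use and that the left frame's value should win when both files have one. Writing to Excel is left commented out because to_excel needs an engine such as openpyxl installed.

```python
import pandas as pd


def merge_and_coalesce(left, right, key="ID"):
    """Hypothetical helper: outer-merge on `key`, then fold each shared column back into one.

    Where both frames have a value, the left one wins.
    """
    merged = left.merge(right, on=key, how="outer")
    right_cols = set(right.columns)
    common = [c for c in left.columns if c in right_cols and c != key]
    for col in common:
        x, y = merged[col + "_x"], merged[col + "_y"]
        merged[col] = x.where(x.notnull(), y)  # left value, else right value
    # drop every suffixed copy now that the combined column exists
    return merged.drop(columns=[c + s for c in common for s in ("_x", "_y")])


# Sample rows typed in from file1.xlsx / file2.xlsx (None = empty cell)
file1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "JanSales": [100, 200, 300],
    "FebSales": [200, 500, 400],
    "test": ["cars", None, "boats"],
})
file2 = pd.DataFrame({
    "ID": [2, 3, 4],
    "CreditScore": ["good", "okay", "not-so-good"],
    "EMMAScore": ["Watson", "Thompson", None],
    "test": ["planes", None, None],
})

result = merge_and_coalesce(file1, file2)
print(result)
# result.to_excel("merged.xlsx", index=False)  # requires openpyxl
```

Because each combined column is appended after the merge, the resulting column order matches the "what I want" listing in the question: ID, JanSales, FebSales, CreditScore, EMMAScore, test.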
The list comprehension above is from this post: "Prepend prefix to list elements with list comprehension".
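The comprehension is a nested loop: the outer loop walks the column names and the inner loop walks the suffixes, so both suffixed names for one column come out before the next column's. A quick check, using a hypothetical second shared column called region:

```python
# "region" is a hypothetical second shared column, for illustration only
common_columns = ["test", "region"]
clash_names = [elt + suffix for elt in common_columns for suffix in ("_x", "_y")]

# The same thing written as explicit nested loops
looped = []
for elt in common_columns:
    for suffix in ("_x", "_y"):
        looped.append(elt + suffix)

print(clash_names)  # ['test_x', 'test_y', 'region_x', 'region_y']
```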

