Pandas Python: Concatenate dataframes having same columns
Source: http://stackoverflow.com/questions/52204115/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not to this page): Stack Overflow.
Asked by GeoSal
I have 3 dataframes that all share the same column names. For example:
df1
column1 column2 column3
a b c
d e f
df2
column1 column2 column3
g h i
j k l
df3
column1 column2 column3
m n o
p q r
Each dataframe has different values but the same columns. I tried append and concat, as well as an outer merge, but got errors. Here's what I tried:
df_final = df1.append(df2, sort=True,ignore_index=True).append2(df3, sort=True,ignore_index=True)
I also tried:
df_final = pd.concat([df1, df2, df3], axis=1)
But I get this error:
AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62
I've googled the error but I can't seem to understand why it's happening in my case. Any guidance is much appreciated!
Answered by jezrael
I think the problem is duplicated column names in some or all of the DataFrames.
#simulate error
df1.columns = ['column3','column1','column1']
df2.columns = ['column5','column1','column1']
df3.columns = ['column2','column1','column1']
df_final = pd.concat([df1, df2, df3])
AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5
You can find the duplicated column names:
print (df3.columns[df3.columns.duplicated(keep=False)])
Index(['column1', 'column1'], dtype='object')
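To run the same check over several frames at once, the `columns.duplicated` test can be wrapped in a small helper. This is only a minimal sketch: `report_duplicate_columns` and the frame labels are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

def report_duplicate_columns(frames):
    """Map each frame's label to the column names that occur more than once.

    `frames` is a dict of label -> DataFrame; frames without duplicated
    names are left out. (Hypothetical helper, not part of pandas.)
    """
    report = {}
    for label, frame in frames.items():
        # keep=False marks every occurrence of a repeated name
        mask = frame.columns.duplicated(keep=False)
        if mask.any():
            # unique() collapses repeats so each offending name is listed once
            report[label] = list(frame.columns[mask].unique())
    return report

df_ok = pd.DataFrame([["a", "b", "c"]], columns=["column1", "column2", "column3"])
df_bad = pd.DataFrame([["m", "n", "o"]], columns=["column2", "column1", "column1"])

dupes = report_duplicate_columns({"df_ok": df_ok, "df_bad": df_bad})
print(dupes)  # {'df_bad': ['column1']}
```

An empty result means every frame has unique column names.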
One possible solution is to set the column names from a list:
df3.columns = ['column1','column2','column3']
print (df3)
column1 column2 column3
0 m n o
1 p q r
Or drop the columns whose names are duplicates, keeping the first occurrence of each name:
df31 = df3.loc[:, ~df3.columns.duplicated()]
print (df31)
column2 column1
0 m n
1 p q
Then concat or append should work fine.
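Putting the pieces together, the sketch below renames a frame with a repeated column name and then stacks all three frames. It assumes the columns of `df3` are in the intended order and only mislabeled. It uses `pd.concat` rather than `append` because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
# df3 arrives with a repeated column name (illustrative data)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]],
                   columns=["column1", "column2", "column2"])

# Assumption: the columns are in the right order and only mislabeled,
# so assigning the correct names is safe.
df3.columns = cols

frames = [df1, df2, df3]
# Guard: fail early with a clear message if any frame still has duplicates
for i, frame in enumerate(frames):
    if frame.columns.has_duplicates:
        raise ValueError(f"frame {i} has duplicated column names")

# Stack the rows and renumber the index 0..5
df_final = pd.concat(frames, ignore_index=True)
print(df_final)
```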
Answered by mad_
Try it without providing axis, for example:
import pandas as pd
mydict1 = {'column1': ['a', 'd'],
           'column2': ['b', 'e'],
           'column3': ['c', 'f']}
mydict2 = {'column1': ['g', 'j'],
           'column2': ['h', 'k'],
           'column3': ['i', 'l']}
mydict3 = {'column1': ['m', 'p'],
           'column2': ['n', 'q'],
           'column3': ['o', 'r']}
df1 = pd.DataFrame(mydict1)
df2 = pd.DataFrame(mydict2)
df3 = pd.DataFrame(mydict3)
# stack the rows; ignore_index=True renumbers them 0..5
print(pd.concat([df1, df2, df3], ignore_index=True))
Output
column1 column2 column3
0 a b c
1 d e f
2 g h i
3 j k l
4 m n o
5 p q r
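The effect of `ignore_index=True` is easiest to see by comparing the resulting row labels. A minimal sketch with two small illustrative frames: without the flag each frame keeps its own labels, with it the rows are renumbered from 0.

```python
import pandas as pd

df1 = pd.DataFrame({"column1": ["a", "d"]})
df2 = pd.DataFrame({"column1": ["g", "j"]})

kept = pd.concat([df1, df2])                      # original labels are kept
fresh = pd.concat([df1, df2], ignore_index=True)  # labels are renumbered

print(list(kept.index))   # [0, 1, 0, 1]
print(list(fresh.index))  # [0, 1, 2, 3]

# Resetting the index afterwards gives the same frame as ignore_index=True
same = kept.reset_index(drop=True).equals(fresh)
print(same)  # True
```

Repeated labels are legal in pandas, but label-based lookups such as `kept.loc[0]` then return more than one row, so renumbering is usually the safer choice after stacking.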
Answered by CSMaverick
You can remove axis=1 from your code:
import pandas as pd
a = {"column1":['a','d'],
"column2":['b','e'],
"column3":['c','f']}
b = {"column1":['g','j'],
"column2":['h','k'],
"column3":['i','l']}
c = {"column1":['m','p'],
"column2":['n','q'],
"column3":['o','r']}
df1 = pd.DataFrame(a)
df2 = pd.DataFrame(b)
df3 = pd.DataFrame(c)
df_final = pd.concat([df1, df2, df3]) #.reset_index()
print(df_final)
#output
column1 column2 column3
0 a b c
1 d e f
0 g h i
1 j k l
0 m n o
1 p q r
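Why dropping `axis=1` matters can be checked from the shapes of the two results. This sketch reuses the question's data: the default `axis=0` stacks the rows, while `axis=1` places the frames side by side and repeats the column names.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

stacked = pd.concat([df1, df2, df3])               # axis=0 is the default
side_by_side = pd.concat([df1, df2, df3], axis=1)  # aligns on the row index

print(stacked.shape)                   # (6, 3)
print(side_by_side.shape)              # (2, 9)
print(side_by_side.columns.is_unique)  # False
```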