Python Pandas - Concat dataframes with different columns ignoring column names
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow.
Original URL: http://stackoverflow.com/questions/45590866/
Asked by Axel
I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different languages. How can I efficiently combine these dataframes?
df_ger

index  Datum   Zahl1  Zahl2
0      1-1-17  1      2
1      2-1-17  3      4

df_uk

index  Date    No1  No2
0      1-1-17  5    6
1      2-1-17  7    8

desired output

index  Datum   Zahl1  Zahl2
0      1-1-17  1      2
1      2-1-17  3      4
2      1-1-17  5      6
3      2-1-17  7      8

The only approach I have come up with so far is to rename the column headings and then use pd.concat([df_ger, df_uk], axis=0, ignore_index=True). However, I hope to find a more general approach.
Accepted answer by Stephen Rauch
If the columns are always in the same order, you can mechanically rename the columns and then append one frame to the other, like:
Code:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
new_cols = dict(zip(df_uk.columns, df_ger.columns))
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
Test Code:
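import pandas as pd
from io import StringIO

# The header is the first line of each string, so the default header
# inference applies. Columns are aligned so read_fwf can detect them.
df_ger = pd.read_fwf(StringIO(
    u"""index  Datum   Zahl1  Zahl2
0      1-1-17  1      2
1      2-1-17  3      4""")).set_index('index')

df_uk = pd.read_fwf(StringIO(
    u"""index  Date    No1  No2
0      1-1-17  5    6
1      2-1-17  7    8""")).set_index('index')

print(df_uk)
print(df_ger)

# Map each English header to the German header in the same position
new_cols = dict(zip(df_uk.columns, df_ger.columns))
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
print(df_out)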
Results:
         Date  No1  No2
index
0      1-1-17    5    6
1      2-1-17    7    8

        Datum  Zahl1  Zahl2
index
0      1-1-17      1      2
1      2-1-17      3      4

        Datum  Zahl1  Zahl2
index
0      1-1-17      1      2
1      2-1-17      3      4
0      1-1-17      5      6
1      2-1-17      7      8
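
The result above keeps each frame's own index labels (0, 1, 0, 1), whereas the desired output in the question is numbered 0 to 3. Below is a minimal sketch of the same rename-then-concatenate idea that also renumbers the rows. It assumes a recent pandas, and it builds the sample frames by hand from the question's data rather than through read_fwf, so the index here is the default one and not a column named 'index'.

```python
import pandas as pd

# Sample frames from the question, built directly for a self-contained example.
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Map each English header to the German header in the same position.
new_cols = dict(zip(df_uk.columns, df_ger.columns))

# ignore_index=True discards the old row labels and numbers the rows 0..n-1.
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)], ignore_index=True)
print(df_out)
```

Note that rename returns a new frame, so df_uk itself keeps its English headers.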
Answer by C. Nitschke
Provided you can be sure that the structures of the two dataframes remain the same, I see two options:
1. Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:

    df_ger.columns = df_uk.columns
    df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)

   This works whatever the column names are. However, technically it is still renaming.

2. Pull the data out of the dataframes as numpy.ndarrays, concatenate them in numpy, and build a dataframe from the result again:

    import numpy
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    np_ger_data = df_ger.to_numpy()
    np_uk_data = df_uk.to_numpy()
    np_combined_data = numpy.concatenate([np_ger_data, np_uk_data], axis=0)
    df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])

   This solution requires more resources, so I would opt for the first one.
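
One concrete cost of the numpy route, beyond the extra copies, is that a frame mixing strings and integers is pulled out as a single object-dtype array, so the numeric columns lose their integer dtype in the rebuilt dataframe. The sketch below illustrates this on the question's data (frames built by hand, which is an assumption of this example) and uses infer_objects() as one possible way to recover the dtypes.

```python
import numpy as np
import pandas as pd

# Sample frames from the question, built directly for a self-contained example.
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Mixed string/integer columns collapse into one object-dtype 2-D array.
combined = np.concatenate([df_ger.to_numpy(), df_uk.to_numpy()], axis=0)
df_combined = pd.DataFrame(combined, columns=df_uk.columns)
print(df_combined.dtypes)  # the numeric columns are reported as object here

# infer_objects() tries to find a better dtype for each object column.
df_restored = df_combined.infer_objects()
print(df_restored.dtypes)
```

The first option avoids this step entirely, because pd.concat works column by column and keeps each column's dtype.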
Answer by osbon123
I am not sure if this will be simpler than what you had in mind, but if the main goal is something general, then this should be fine under one assumption: the columns in the two files match by position. For example, if the date is the first column in one file, the translated version will also be the first column in the other.
# number of columns
n_columns = len(df_ger.columns)
# save final columns names
columns = df_uk.columns
# rename both columns to numbers
df_ger.columns = range(n_columns)
df_uk.columns = range(n_columns)
# concat columns
df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
# rename columns in new dataframe
df_out.columns = columns
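
The snippet above overwrites the headers of df_ger and df_uk in place. If that side effect is unwanted, the same positional idea can be wrapped in a small helper that works for any number of frames. This is only a sketch: concat_by_position is a hypothetical name, not a pandas function, and it assumes pandas 1.0 or later, where set_axis returns a new frame by default.

```python
import pandas as pd


def concat_by_position(frames, columns=None):
    """Stack frames row-wise, matching columns by position, not by name.

    Hypothetical helper for illustration. `columns` defaults to the
    headers of the first frame.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame")
    target = list(frames[0].columns if columns is None else columns)
    for frame in frames:
        if frame.shape[1] != len(target):
            raise ValueError("all frames must have the same number of columns")
    # set_axis returns a relabelled frame, so the inputs keep their own headers.
    relabelled = [frame.set_axis(target, axis=1) for frame in frames]
    return pd.concat(relabelled, axis=0, ignore_index=True)


# Sample frames from the question, built directly for a self-contained example.
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

df_out = concat_by_position([df_ger, df_uk])
print(df_out)
```

Passing columns=df_uk.columns would give the English headers instead. The column-count check is there because positional matching silently mislabels data if the frames differ in width.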