Python Pandas - Concat DataFrames with different columns, ignoring column names

Question

Asked by Axel

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different languages. How can I efficiently combine these dataframes?

df_ger
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4

df_uk
index  Date    No1     No2
0      1-1-17  5       6
1      2-1-17  7       8

desired output
index  Datum   Zahl1   Zahl2
0      1-1-17  1       2
1      2-1-17  3       4
2      1-1-17  5       6
3      2-1-17  7       8

The only approach I came up with so far is to rename the column headings and then use pd.concat([df_ger, df_uk], axis=0, ignore_index=True). However, I hope to find a more general approach.

Answer 1

Accepted answer by Stephen Rauch

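If the columns are always in the same order, you can mechanically rename the columns and then concatenate the frames with pd.concat (DataFrame.append was removed in pandas 2.0):

Code:

# map each df_uk column name to the df_ger name at the same position
new_cols = dict(zip(df_uk.columns, df_ger.columns))
# stack the renamed frame under df_ger; the original row labels are kept
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])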

Test Code:

import pandas as pd
from io import StringIO

df_ger = pd.read_fwf(StringIO(
    u"""
        index  Datum   Zahl1   Zahl2
        0      1-1-17  1       2
        1      2-1-17  3       4"""),
    header=1).set_index('index')

df_uk = pd.read_fwf(StringIO(
    u"""
        index  Date    No1     No2
        0      1-1-17  5       6
        1      2-1-17  7       8"""),
    header=1).set_index('index')

print(df_uk)
print(df_ger)

new_cols = dict(zip(df_uk.columns, df_ger.columns))
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])

print(df_out)

Results:

         Date  No1  No2
index                  
0      1-1-17    5    6
1      2-1-17    7    8

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4

        Datum  Zahl1  Zahl2
index                      
0      1-1-17      1      2
1      2-1-17      3      4
0      1-1-17      5      6
1      2-1-17      7      8
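
Note that the combined frame above keeps the row labels 0, 1, 0, 1, whereas the question asked for 0 to 3. A minimal sketch of one way to get a fresh index, assuming the frames are built directly with pd.DataFrame instead of read_fwf (the construction below is illustrative and not part of the original answer):

```python
import pandas as pd

# Illustrative frames matching the data in the question
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Map df_uk's column names onto df_ger's names by position
new_cols = dict(zip(df_uk.columns, df_ger.columns))

# ignore_index=True discards the old row labels and numbers the rows 0..n-1
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)], ignore_index=True)
print(df_out)
```

df_uk.rename returns a new frame, so neither input is modified.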

Answer 2

Answered by C. Nitschke

Provided you can be sure that the structures of the two dataframes remain the same, I see two options:

Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:
```
df_ger.columns = df_uk.columns
df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
```
This works whatever the column names are. Technically, though, it is still a rename, and it changes df_ger in place.

Pull the data out of the dataframes as numpy.ndarrays, concatenate them in numpy, and build a dataframe from the result again:

import numpy as np

# to_numpy() replaces as_matrix(), which was removed in pandas 1.0
np_ger_data = df_ger.to_numpy()
np_uk_data = df_uk.to_numpy()
np_combined_data = np.concatenate([np_ger_data, np_uk_data], axis=0)
df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])

This solution requires more resources, so I would opt for the first one.
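
One side effect of the NumPy route is worth knowing: when a frame mixes strings and integers, the extracted array generally has dtype object, so the numeric columns of the rebuilt frame lose their integer dtype. A minimal sketch, assuming the sample data from the question and using infer_objects() to recover the numeric dtypes (that step is an addition, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Illustrative frames matching the data in the question
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Stack the raw values; column names play no role at this point
combined = np.concatenate([df_ger.to_numpy(), df_uk.to_numpy()], axis=0)
df_combined = pd.DataFrame(combined, columns=df_uk.columns)

# infer_objects() tries to convert object columns back to a more specific dtype
df_combined = df_combined.infer_objects()
print(df_combined.dtypes)
```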

Answer 3

Answered by osbon123

I am not sure whether this is simpler than what you had in mind, but if the main goal is a general approach, this should work under one assumption: the columns in the two files match by position. For example, if the date is the first column in one file, its translated counterpart is also the first column in the other.

# number of columns
n_columns = len(df_ger.columns)

# save final columns names
columns = df_uk.columns

# rename both columns to numbers
df_ger.columns = range(n_columns)
df_uk.columns = range(n_columns)

# concat columns
df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)

# rename columns in new dataframe
df_out.columns = columns
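
The snippet above overwrites the column labels of df_ger and df_uk. If the inputs should stay untouched, the same positional trick can be wrapped in a function that also accepts more than two frames. A minimal sketch; concat_by_position is a hypothetical helper name, not a pandas function:

```python
import pandas as pd


def concat_by_position(frames, columns=None):
    """Stack frames row-wise, matching columns by position (hypothetical helper).

    `columns` sets the labels of the result; by default the first frame's
    labels are used.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame")
    n_columns = len(frames[0].columns)
    final_columns = frames[0].columns if columns is None else columns
    # set_axis returns a relabelled copy, so the inputs keep their own names;
    # it raises ValueError if a frame has a different number of columns
    renumbered = [df.set_axis(list(range(n_columns)), axis=1) for df in frames]
    out = pd.concat(renumbered, axis=0, ignore_index=True)
    out.columns = final_columns
    return out


# Illustrative frames matching the data in the question
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

df_out = concat_by_position([df_ger, df_uk], columns=df_uk.columns)
print(df_out)
```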
