Python Pandas: merge two dataframes with different columns

Question

Asked by economy

I'm surely missing something simple here. I'm trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa.

>df_may

  id  quantity  attr_1  attr_2
0  1        20       0       1
1  2        23       1       1
2  3        19       1       1
3  4        19       0       0

>df_jun

  id  quantity  attr_1  attr_3
0  5         8       1       0
1  6        13       0       1
2  7        20       1       1
3  8        25       1       1
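To reproduce the problem, the two sample frames can be built directly. The data below is copied from the printed output and the variable names match the question; the `shared` list is only an illustrative helper.

```python
import pandas as pd

# Sample data copied from the printed frames above
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Columns present in both frames
shared = [c for c in df_may.columns if c in df_jun.columns]
print(shared)  # ['id', 'quantity', 'attr_1']
```

When `on=` is not given, `merge` joins on exactly this intersection of column names.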

I've tried joining with an outer join:

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer")

But that yields:

Left data columns not unique: Index([....

I've also specified a single column to join on (e.g. on="id"), but that duplicates every other shared column with suffixes, like attr_1_x and attr_1_y, which is not ideal. I've also passed the entire list of columns (there are many) to "on":

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer", on=list(df_may.columns.values))

Which yields:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
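Both messages are consistent with the real frames containing duplicate column labels: the first says so directly, and selecting a duplicated label returns a 2-D DataFrame instead of a 1-D Series, which fits "expected 1, got 2". Below is a sketch for finding such labels and, assuming the copies are redundant, keeping only the first of each before merging. The duplicated attr_1 column is hypothetical, added only to mimic the error.

```python
import pandas as pd

# Hypothetical frame with a duplicated "attr_1" label, to mimic the error
df_may = pd.DataFrame([[1, 20, 0, 1, 0], [2, 23, 1, 1, 1]],
                      columns=["id", "quantity", "attr_1", "attr_2", "attr_1"])
df_jun = pd.DataFrame({"id": [5, 6], "quantity": [8, 13],
                       "attr_1": [1, 0], "attr_3": [0, 1]})

# List the labels that appear more than once
dupes = df_may.columns[df_may.columns.duplicated()].unique()
print(list(dupes))  # ['attr_1']

# Keep only the first occurrence of each label (assumes the copies are redundant)
df_may = df_may.loc[:, ~df_may.columns.duplicated()]

# With unique labels, an outer merge on all shared columns keeps every row
shared = [c for c in df_may.columns if c in df_jun.columns]
merged = df_may.merge(df_jun, on=shared, how="outer")
print(merged.shape)  # (4, 5)
```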

What am I missing? I'd like to get a df with all rows appended, and attr_1, attr_2, attr_3 populated where possible, NaN where they don't show up. This seems like a pretty typical workflow for data munging, but I'm stuck.

Thanks in advance.

Answer 1

Accepted answer by EdChum

I think in this case concat is what you want:

In [12]:

pd.concat([df_may, df_jun], axis=0, ignore_index=True)
Out[12]:
   attr_1  attr_2  attr_3  id  quantity
0       0       1     NaN   1        20
1       1       1     NaN   2        23
2       1       1     NaN   3        19
3       0       0     NaN   4        19
4       1     NaN       0   5         8
5       0     NaN       1   6        13
6       1     NaN       1   7        20
7       1     NaN       1   8        25

By passing axis=0 here you stack the dfs on top of each other, which I believe is what you want, and NaN is filled in wherever a column is absent from one of the dfs.
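A self-contained version of the concat call, using the question's sample data. One caveat: since pandas 1.0, concat no longer sorts the non-concatenation axis by default, so the columns come out in first-seen order rather than the alphabetical order shown in the output above; passing sort=True restores that order. Columns that receive NaN are also stored as float64.

```python
import pandas as pd

df_may = pd.DataFrame({"id": [1, 2, 3, 4], "quantity": [20, 23, 19, 19],
                       "attr_1": [0, 1, 1, 0], "attr_2": [1, 1, 1, 0]})
df_jun = pd.DataFrame({"id": [5, 6, 7, 8], "quantity": [8, 13, 20, 25],
                       "attr_1": [1, 0, 1, 1], "attr_3": [0, 1, 1, 1]})

# Stack the rows; columns missing from a frame are filled with NaN
stacked = pd.concat([df_may, df_jun], axis=0, ignore_index=True)
print(list(stacked.columns))  # ['id', 'quantity', 'attr_1', 'attr_2', 'attr_3']

# sort=True orders the columns alphabetically, as in the output above
stacked_sorted = pd.concat([df_may, df_jun], axis=0, ignore_index=True, sort=True)
print(list(stacked_sorted.columns))  # ['attr_1', 'attr_2', 'attr_3', 'id', 'quantity']

# attr_2 now contains NaN, so it is stored as a float column
print(stacked["attr_2"].dtype)  # float64
```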

Answer 2

Answer by Will H

I had this problem today with each of concat, append and merge, and I got around it by adding a sequentially numbered helper column to both frames and then doing an outer join:

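import pandas as pd

# Number the rows of both frames consecutively, so no helper value appears in both
df1['helper'] = range(1, len(df1) + 1)
df2['helper'] = range(len(df1) + 1, len(df1) + len(df2) + 1)

# Outer-join on every shared column (helper included); because no helper value
# matches, each row is kept once and columns missing from a frame become NaN
shared = [c for c in df1.columns if c in df2.columns]
df1.merge(df2, on=shared, how='outer').drop(columns='helper')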
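A self-contained run of the helper-column idea on the question's data, merging on all shared columns (helper included) so that no _x/_y suffixes appear, plus a check that it keeps the same rows as a plain concat. An outer merge may reorder rows, so both results are sorted by id before comparing.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4], "quantity": [20, 23, 19, 19],
                    "attr_1": [0, 1, 1, 0], "attr_2": [1, 1, 1, 0]})
df2 = pd.DataFrame({"id": [5, 6, 7, 8], "quantity": [8, 13, 20, 25],
                    "attr_1": [1, 0, 1, 1], "attr_3": [0, 1, 1, 1]})

# Consecutive helper numbers: no helper value occurs in both frames
df1["helper"] = range(1, len(df1) + 1)
df2["helper"] = range(len(df1) + 1, len(df1) + len(df2) + 1)

# Outer join on every shared column; since no row matches, every row is kept once
shared = [c for c in df1.columns if c in df2.columns]
merged = df1.merge(df2, on=shared, how="outer").drop(columns="helper")

# Same rows as a plain concat (both sorted by id before comparing)
stacked = pd.concat([df1, df2], ignore_index=True).drop(columns="helper")
pd.testing.assert_frame_equal(
    merged.sort_values("id").reset_index(drop=True),
    stacked.sort_values("id").reset_index(drop=True),
    check_dtype=False,
    check_like=True,
)
print(merged.shape)  # (8, 5)
```

For simply stacking rows, concat gets the same result without the extra column; the helper mainly matters when you want merge semantics.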
