Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25145317/

Date: 2020-09-13 22:20:02  Source: igfitidea

Pandas - merge two DataFrames with Identical Column Names

Tags: python, pandas, merge, dataframe

Asked by Slavatron

I have two DataFrames with identical column names and identical IDs in the first column. Apart from the ID column, every cell that contains a value in one DataFrame contains NaN in the other. Here's an example of what they look like:

ID    Cat1    Cat2    Cat3
1     NaN     75      NaN
2     61      NaN     84
3     NaN     NaN     NaN


ID    Cat1    Cat2    Cat3
1     54      NaN     44
2     NaN     38      NaN
3     49      50      53

I want to merge them into one DataFrame while keeping the same Column Names. So the result would look like this:

ID    Cat1    Cat2    Cat3
1     54      75      44
2     61      38      84
3     49      50      53

I tried:

df3 = pd.merge(df1, df2, on='ID', how='outer')

This gave me a DataFrame with almost twice as many columns, because every non-key column appears once per input (with the default _x and _y suffixes). How can I merge the values from each DataFrame into one?
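To make the problem concrete, here is a minimal sketch that rebuilds the two example tables and runs the merge from the question. It assumes df1 is the first table and df2 the second; the question does not say which is which.

```python
import numpy as np
import pandas as pd

# Assumption: df1 is the first table in the question, df2 the second.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# merge() keeps both copies of every overlapping non-key column and
# disambiguates them with the default suffixes "_x" (left) and "_y" (right).
merged = pd.merge(df1, df2, on="ID", how="outer")
merged_columns = list(merged.columns)
print(merged_columns)
```

The key column ID appears once, while each CatN column appears twice, which is why merge alone does not produce the desired single-column-per-category result.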

Answered by Roger Fan

You probably want df.update. See the documentation.

# update() modifies df1 in place; errors='raise' replaces the older raise_conflict=True (pandas 0.24+)
df1.update(df2, errors='raise')
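A minimal sketch of this approach on the question's data follows. It assumes df1 is the first table and df2 the second, and it moves ID into the index first: update aligns on the index, and with errors='raise' an ID column that is non-NaN in both frames would itself count as overlapping data.

```python
import numpy as np
import pandas as pd

# Assumption: df1 is the first table in the question, df2 the second.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
}).set_index("ID")
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
}).set_index("ID")

# update() works in place and returns None, so operate on a copy
# to keep the original df1 untouched.
updated = df1.copy()
# errors="raise" raises ValueError if both frames hold a non-NaN value
# in the same cell; here the frames never overlap, so nothing is raised.
updated.update(df2, errors="raise")
print(updated)
```

Because update returns None, writing `df3 = df1.update(df2)` would leave df3 empty of data; the result lives in the frame that was updated.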

Answered by Slavatron

In this case, the combine_first function is appropriate. (http://pandas.pydata.org/pandas-docs/version/0.13.1/merging.html)

As the name implies, combine_first takes the first DataFrame and fills it in with values from the second wherever the first contains NaN.

So:

df3 = df1.combine_first(df2)

produces a new DataFrame, df3, that is essentially just df1 with values from df2 filled in whenever possible.
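As a rough end-to-end sketch on the question's data (assuming df1 is the first table and df2 the second), the call can be checked like this. Setting ID as the index is an addition here, not part of the original answer; it makes the row alignment explicit, so the result stays correct even if the two frames list their IDs in a different order.

```python
import numpy as np
import pandas as pd

# Assumption: df1 is the first table in the question, df2 the second.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# combine_first aligns on index and columns, keeps df1's values,
# and falls back to df2 only where df1 has NaN.
df3 = (
    df1.set_index("ID")
    .combine_first(df2.set_index("ID"))
    .reset_index()
)
print(df3)
```

Unlike update, combine_first returns a new DataFrame and leaves both inputs unchanged.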

Answered by mccandar

You could also just replace the NaN values in df1 with the non-NaN values from df2.

df1[pd.isnull(df1)] = df2[~pd.isnull(df2)]
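The sketch below applies this masking assignment to a copy of the question's data and compares it with two spellings that should give the same result here, fillna and where. Those two alternatives are not part of the original answer, and the assignment of df1 and df2 to the first and second table is an assumption.

```python
import numpy as np
import pandas as pd

# Assumption: df1 is the first table in the question, df2 the second.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
}).set_index("ID")
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
}).set_index("ID")

# Boolean-mask assignment from the answer, done on a copy because it
# overwrites the frame in place.
masked = df1.copy()
masked[pd.isnull(masked)] = df2[~pd.isnull(df2)]

# Alternative spellings that return a new frame instead of mutating df1.
filled = df1.fillna(df2)
where_result = df1.where(df1.notna(), df2)

print(masked)
```

All three variants align df2 to df1 by index and column labels, so they rely on the two frames sharing the same IDs and column names, as they do in the question.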