Pandas - merge two DataFrames with Identical Column Names
Original source: http://stackoverflow.com/questions/25145317/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Slavatron
I have two DataFrames with identical column names and identical IDs in the first column. Apart from the ID column, every cell that contains a value in one DataFrame contains NaN in the other. Here's an example of what they look like:
ID  Cat1  Cat2  Cat3
1   NaN   75    NaN
2   61    NaN   84
3   NaN   NaN   NaN

ID  Cat1  Cat2  Cat3
1   54    NaN   44
2   NaN   38    NaN
3   49    50    53
I want to merge them into one DataFrame while keeping the same column names, so the result would look like this:
ID  Cat1  Cat2  Cat3
1   54    75    44
2   61    38    84
3   49    50    53
I tried:
df3 = pd.merge(df1, df2, on='ID', how='outer')
This gave me a DataFrame with twice as many columns. How can I merge the values from each DataFrame into one?
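To see where the extra columns come from, here is a minimal sketch that rebuilds the two example tables and repeats the merge. Which table is df1 and which is df2 is an assumption, since the question does not label them. Because every non-key column name appears in both inputs, pd.merge keeps both copies and tells them apart with its default _x and _y suffixes.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two example tables from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# Same call as in the question: only ID is treated as the join key,
# so each CatN column is carried over once per input frame.
df3 = pd.merge(df1, df2, on="ID", how="outer")
merged_columns = list(df3.columns)
print(merged_columns)
```

The values are all present in df3, but split across the suffixed pairs, which is why a fill-style operation fits this problem better than a join.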
Answered by Roger Fan
You probably want df.update, which modifies df1 in place using the non-NaN values from df2. See the documentation.
df1.update(df2, errors='raise')  # pandas >= 0.24; older versions used raise_conflict=True
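A minimal sketch of this approach on the question's data follows. The assignment of the tables to df1 and df2 is an assumption. update aligns rows on the index, and with the conflict check switched on it raises when both frames hold a non-NaN value in the same cell. The ID column is non-NaN in both frames, so this sketch moves ID into the index first; it also uses the errors keyword from newer pandas releases.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example tables, indexed by ID so that
# update() matches rows by ID rather than by position.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
}).set_index("ID")
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
}).set_index("ID")

# update() changes df1 in place and returns None.
# errors="raise" makes it fail if any cell is non-NaN in both frames.
df1.update(df2, errors="raise")

result = df1.reset_index()
print(result)
```

Since update mutates df1, take a copy first if the original frame is still needed.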
Answered by Slavatron
In this case, the combine_first function is appropriate. (http://pandas.pydata.org/pandas-docs/version/0.13.1/merging.html)
As the name implies, combine_first takes the first DataFrame and, wherever it finds a NaN value there, fills it in with the corresponding value from the second.
So:
df3 = df1.combine_first(df2)
This produces a new DataFrame, df3, that is essentially df1 with its NaN values filled in from df2 wherever df2 has a value.
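Here is a minimal sketch of combine_first on the question's data, again assuming which table is df1 and which is df2. combine_first matches cells by index and column labels. Both frames here share the default integer index and list their IDs in the same order, so the rows line up as they are. If the two frames listed their IDs in different orders, setting ID as the index first would be the safer choice.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two example tables from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# Returns a new frame: df1's values take priority, and df2 supplies a
# value only where df1 has NaN. Neither input is modified.
df3 = df1.combine_first(df2)
print(df3)
```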
Answered by mccandar
You could also just replace the NaN values in df1 with the non-NaN values from df2.
df1[pd.isnull(df1)] = df2[~pd.isnull(df2)]
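The masked assignment above can be sketched on the question's data as follows. The table-to-name mapping is an assumption, and the variable filled is a hypothetical name introduced here so that df1 is left untouched. The boolean mask selects the NaN cells on the left, and pandas lines up the right-hand frame by index and column labels when it assigns.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two example tables from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [np.nan, 61, np.nan],
    "Cat2": [75, np.nan, np.nan],
    "Cat3": [np.nan, 84, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Cat1": [54, np.nan, 49],
    "Cat2": [np.nan, 38, 50],
    "Cat3": [44, np.nan, 53],
})

# Work on a copy (hypothetical name) because the assignment is in place.
filled = df1.copy()

# Cells that are NaN in `filled` receive the value at the same label in df2.
filled[pd.isnull(filled)] = df2[~pd.isnull(df2)]
print(filled)
```

Unlike combine_first, this form overwrites the frame on the left, which is why the sketch copies df1 first.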

