Python Pandas: How to merge based on an "OR" condition?

Question

Asked by alwaysaskingquestions

Let's say I have two dataframes, and the column names for both are:

table 1 columns:
[ShipNumber, TrackNumber, ShipDate, Quantity, Weight]
table 2 columns:
[ShipNumber, TrackNumber, AmountReceived]

I want to merge the two tables based on both ShipNumber and TrackNumber. However, if I simply use merge in the following way (pseudocode, not real code):

tab1.merge(tab2, how="left", on=['ShipNumber', 'TrackNumber'])

then, that means the values in both ShipNumber and TrackNumber columns from both tables MUST MATCH.
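
To make that concrete, here is a minimal sketch with made-up data (only the column names come from the tables above; the values are hypothetical). Merging on a list of keys keeps a match only where every key agrees:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100.0, 200.0, 300.0],
})

# Merging on a list of keys requires every key to match (an AND condition).
both_keys = tab1.merge(tab2, how="left", on=["ShipNumber", "TrackNumber"])
print(both_keys)
```

Only the S1/T1 row picks up an AmountReceived value here; the S2 and S3 rows each agree on just one key and are left unmatched.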

However, in my case, sometimes the ShipNumber column values will match, and sometimes the TrackNumber column values will match; as long as one of the two values matches for a row, I want the merge to happen.

In other words, if the ShipNumber in row 1 of tab1 matches the ShipNumber in row 3 of tab2, but the TrackNumber values of those two records do not match, I still want to match the two rows from the two tables.

So basically this is an either/or match condition (pseudocode):

if tab1.ShipNumber == tab2.ShipNumber OR tab1.TrackNumber == tab2.TrackNumber:
    then merge
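
One literal reading of that pseudocode, sketched here with made-up data, is to pair every row of one table with every row of the other and keep the pairs where either key agrees. The helper column `_key` and the suffixes `_1`/`_2` are hypothetical names, and the intermediate table has len(tab1) * len(tab2) rows, so this is probably only reasonable for small inputs:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100.0, 200.0, 300.0],
})

# Pair every row of tab1 with every row of tab2 through a constant helper key
# ("_key" is a made-up name), giving all 3 * 3 combinations.
pairs = tab1.assign(_key=1).merge(
    tab2.assign(_key=1), on="_key", suffixes=("_1", "_2")
).drop(columns="_key")

# Keep the pairs where ShipNumber OR TrackNumber agrees.
either = (
    (pairs["ShipNumber_1"] == pairs["ShipNumber_2"])
    | (pairs["TrackNumber_1"] == pairs["TrackNumber_2"])
)
or_matches = pairs[either].reset_index(drop=True)
print(or_matches)
```

With this data all three rows of tab1 find a partner: one through both keys, one through ShipNumber only, and one through TrackNumber only.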

I hope my question makes sense... Any help is really really appreciated!

As suggested, I looked into this post: "Python pandas merge with OR logic". But I don't think it is quite the same issue: the OP of that post has a mapping file, so they can simply do two merges to solve it. I don't have a mapping file; rather, I have two DataFrames with the same key columns (ShipNumber, TrackNumber).

Answer 1

Accepted answer by andrew_reece

Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks @Scott Boston for that final step).

import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})

df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5

With these data frames we should see:

df1.loc[2] matches A on df2.loc[0]
df1.loc[1] matches B on df2.loc[2]
df1.loc[3] matches both A and B on df2.loc[3]

We'll use suffixes to keep track of what matched where:

suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']

df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)])
df

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN

Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.

dupes = (df.B_on_A_match_1 == df.B_on_A_match_2) # also could remove A_on_B_match
df.loc[~dupes]

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
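
For the column names in the question, the same merge-and-concat idea might look like the sketch below. The sample values and the `_1`/`_2` suffixes are hypothetical. As a small variation, rows that match on both keys are filtered out of the ShipNumber merge before concatenating rather than afterwards; the effect should be the same.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tab1 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
tab2 = pd.DataFrame({
    "ShipNumber": ["S1", "S2", "S9"],
    "TrackNumber": ["T1", "T8", "T3"],
    "AmountReceived": [100.0, 200.0, 300.0],
})

# One merge per key; the suffixes mark which table the non-key column came from.
on_ship = tab1.merge(tab2, on="ShipNumber", suffixes=("_1", "_2"))
on_track = tab1.merge(tab2, on="TrackNumber", suffixes=("_1", "_2"))

# Pairs that agree on both keys appear in both merges, so keep them only once
# by dropping them from the ShipNumber merge.
on_ship = on_ship[on_ship["TrackNumber_1"] != on_ship["TrackNumber_2"]]

merged = pd.concat([on_ship, on_track], ignore_index=True)
print(merged)
```

Because the two merges produce different column sets, the concatenated frame has NaN wherever a column exists in only one of them, just as in the A/B output above.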

Answer 2

Answer by pratpor

I would suggest an alternative way of doing a merge like this, which seems easier to me.

table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)

You can add the same column to table2 as well if needed, and then use it in left_on or right_on, depending on your requirement.
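
A minimal end-to-end sketch of this approach, assuming both tables get the helper column. The sample data is made up, and `fillna` is used here as a vectorised stand-in for the `apply()` call above. Note what the helper column does: it falls back to TrackNumber only where ShipNumber is missing, so two rows match only if they resolve to the same fallback value.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing key.
table1 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": ["T1", "T2", "T3"],
    "Quantity": [10, 20, 30],
})
table2 = pd.DataFrame({
    "ShipNumber": ["S1", None, None],
    "TrackNumber": [None, "T2", "T3"],
    "AmountReceived": [100.0, 200.0, 300.0],
})

# Use ShipNumber where it is present, otherwise fall back to TrackNumber.
for table in (table1, table2):
    table["id_to_be_merged"] = table["ShipNumber"].fillna(table["TrackNumber"])

merged = table1.merge(
    table2, how="left", on="id_to_be_merged", suffixes=("_1", "_2")
)
print(merged[["id_to_be_merged", "Quantity", "AmountReceived"]])
```

In this data the first two rows match, but the third does not: table1 resolves it to S3 while table2 resolves it to T3, even though the TrackNumber values agree. So this looks like a fit for the case where one key is missing, rather than a general OR condition.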
