pandas 熊猫中的“反合并”(Python)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/38242368/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 01:32:06  来源:igfitidea点击:

"Anti-merge" in pandas (Python)

pythonpandasmerge

提问by Polly

How can I pick out the difference between to columns of the same name in two dataframes? I mean I have dataframe A with a column named X and dataframe B with column named X, if i do pd.merge(A, B, on=['X']), i'll get the common X values of A and B, but how can i get the "non-common" ones?

如何找出两个数据框中同名列之间的区别?我的意思是我有一个名为 X 的列的数据框 A 和名为 X 的列的数据框 B,如果我这样做pd.merge(A, B, on=['X']),我将获得 A 和 B 的通用 X 值,但我怎样才能获得“非通用”值?

回答by EdChum

If you change the merge type to how='outer'and indicator=Truethis will add a column to tell you whether the values are left/both/right only:

如果您将合并类型更改为how='outer'indicator=True这将添加一列来告诉您值是否为左/双/右:

In [2]:
A = pd.DataFrame({'x':np.arange(5)})
B = pd.DataFrame({'x':np.arange(3,8)})
print(A)
print(B)
   x
0  0
1  1
2  2
3  3
4  4
   x
0  3
1  4
2  5
3  6
4  7

In [3]:
pd.merge(A,B, how='outer', indicator=True)

Out[3]:
     x      _merge
0  0.0   left_only
1  1.0   left_only
2  2.0   left_only
3  3.0        both
4  4.0        both
5  5.0  right_only
6  6.0  right_only
7  7.0  right_only

You can then filter the resultant merged df on the _mergecol:

然后,您可以在_mergecol上过滤结果合并的 df :

In [4]:
merged = pd.merge(A,B, how='outer', indicator=True)
merged[merged['_merge'] == 'left_only']

Out[4]:
     x     _merge
0  0.0  left_only
1  1.0  left_only
2  2.0  left_only

You can also use isinand negate the mask to find values not in B:

您还可以使用isin和否定掩码来查找不在 B 中的值:

In [5]:
A[~A['x'].isin(B['x'])]

Out[5]:
   x
0  0
1  1
2  2

回答by Erfan

The accepted answer gives a so called LEFT JOIN IF NULLin SQL terms. If you want all the rows except the matching ones from bothDataFrames, not only left. You have to add another condition to the filter, since you want to exclude all rows which are in both.

接受的答案LEFT JOIN IF NULL在 SQL 术语中给出了一个所谓的答案。如果你想要除了来自两个DataFrame的匹配行之外的所有行,不仅要离开。您必须向过滤器添加另一个条件,因为您要排除both.

In this case we use DataFrame.merge& DataFrame.query:

在这种情况下,我们使用DataFrame.merge& DataFrame.query

df1 = pd.DataFrame({'A':list('abcde')})
df2 = pd.DataFrame({'A':list('cdefgh')})

print(df1, '\n')
print(df2)

   A
0  a # <- only df1
1  b # <- only df1
2  c # <- both
3  d # <- both
4  e # <- both

   A 
0  c # both
1  d # both
2  e # both
3  f # <- only df2
4  g # <- only df2
5  h # <- only df2
df = (
    df1.merge(df2, 
              on='A', 
              how='outer', 
              indicator=True)
    .query('_merge != "both"')
    .drop(columns='_merge')
)

print(df)

   A
0  a
1  b
5  f
6  g
7  h