pandas: finding common elements between multiple dataframe columns

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/46556169/

Time: 2020-09-14 04:34:55  Source: igfitidea

Finding common elements between multiple dataframe columns

Tags: python, string, pandas, intersection, series

Asked by Tikku

Hope you could help me. I am new to Python and pandas, so please bear with me. I am trying to find the common words between three data frames, and I am using Jupyter Notebook.

Just for example:

df1=
A
dog
cat
cow 
duck
snake

df2=
A
pig
snail
bird
dog

df3=
A
eagle
dog 
snail
monkey

Each data frame has only one column, A. I would like to find:

  1. the words common to all columns
  2. the words that are unique to each column, i.e. not shared with any other column.

Example:

duck is unique to df1, pig is unique to df2, and monkey is unique to df3.

I am using the code below, which gets me part of the way, but it does not give me what I want directly:

df1[df1['A'].isin(df2['A']) & (df2['A']) & (df3['A'])]

Kindly let me know where I am going wrong. Cheers

Accepted answer by cs95

The problem with your current approach is that you need to chain multiple isin calls. What's worse is that you'd need to keep track of which dataframe is the largest, and call isin on that one. Otherwise, it doesn't work.
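
To illustrate what chaining means here, a minimal sketch: every term joined by & has to be a boolean mask produced by its own isin call, whereas the expression in the question combines one mask with the raw string columns df2['A'] and df3['A']. The data frames are rebuilt from the question's example, and the names in_df2, in_df3 and common_rows are illustrative, not from the original post.

```python
import pandas as pd

# Data frames rebuilt from the question's example.
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

# One isin call per data frame; each returns a boolean mask aligned with df1.
in_df2 = df1["A"].isin(df2["A"])
in_df3 = df1["A"].isin(df3["A"])

# Keep only the rows of df1 whose word appears in both df2 and df3.
common_rows = df1[in_df2 & in_df3]
print(common_rows)
```

This keeps the result as a data frame (with df1's original index), which can be convenient if the matching rows are needed rather than just the words.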

To make things easy, you can use np.intersect1d:

>>> import numpy as np
>>> np.intersect1d(df3.A, np.intersect1d(df1.A, df2.A))
array(['dog'], dtype=object)
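
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch. The data frames are rebuilt from the question's example, and the variable name common is an illustrative choice. Note that np.intersect1d returns the sorted, unique values shared by its two inputs, so duplicates within a column do not affect the result.

```python
import numpy as np
import pandas as pd

# Data frames rebuilt from the question's example.
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

# Intersect df1 with df2 first, then intersect that result with df3.
common = np.intersect1d(df3["A"], np.intersect1d(df1["A"], df2["A"]))
print(common.tolist())
```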


A similar method using functools.reduce + intersect1d, by piRSquared:

>>> from functools import reduce  # required on Python 3; reduce is a builtin on Python 2
>>> reduce(np.intersect1d, [df1.A, df2.A, df3.A])
array(['dog'], dtype=object)

Answer by piRSquared

The simplest way is to use set intersection:

list(set(df1.A) & set(df2.A) & set(df3.A))

['dog']


However, if you have a long list of these things, I'd use reduce from functools. The same technique can be used with @cs95's use of np.intersect1d as well.

from functools import reduce

list(reduce(set.intersection, map(set, [df1.A, df2.A, df3.A])))

['dog']
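
The same set-based idea can be extended to the second part of the question (words that appear in only one data frame). The following is a sketch under assumptions rather than part of the original answers: the data frames are rebuilt from the question's example, and the names frames, sets and unique_words are hypothetical.

```python
import pandas as pd

# Data frames rebuilt from the question's example.
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

frames = {"df1": df1, "df2": df2, "df3": df3}
sets = {name: set(df["A"]) for name, df in frames.items()}

unique_words = {}
for name, words in sets.items():
    # Union of the words found in every *other* data frame.
    others = set().union(*(s for other, s in sets.items() if other != name))
    # Words left after removing anything seen elsewhere are unique to this frame.
    unique_words[name] = sorted(words - others)

print(unique_words)
```

With the example data, snail is not reported for df2 because it also appears in df3.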