Finding common elements between multiple dataframe columns
Source: http://stackoverflow.com/questions/46556169/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Tikku
I hope you can help me. I am new to Python and pandas, so please bear with me. I am trying to find the common words between three data frames, and I am using Jupyter Notebook.
For example:
df1=
A
dog
cat
cow
duck
snake
df2=
A
pig
snail
bird
dog
df3=
A
eagle
dog
snail
monkey
Each data frame has a single column, A. I would like to find:
- the words common to all columns
- the words that are unique to their own column and not shared with the others.
Example:
duck is unique to df1, snail is unique to df2 and monkey is unique to df3.
I am using the code below, which gets me part of the way but does not give me what I want:
df1[df1['A'].isin(df2['A']) & (df2['A']) & (df3['A'])]
Kindly let me know where I am going wrong. Cheers
Accepted answer by cs95
The problem with your current approach is that you need to chain multiple isin calls. What's worse is that you'd need to keep track of which dataframe is the largest, and call isin on that one. Otherwise, it doesn't work.
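As a minimal sketch of the chained isin approach, assuming the sample frames from the question (rebuilt here by hand as plain string columns): each mask is built from df1['A'], so both masks share df1's index before they are combined with &. The original attempt instead combined one boolean mask with the raw df2['A'] and df3['A'] columns, which is not a membership test.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns).
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

# One isin call per other dataframe; both masks are indexed like df1.
in_df2 = df1["A"].isin(df2["A"])
in_df3 = df1["A"].isin(df3["A"])

# Keep only the rows of df1 whose word appears in df2 and in df3.
common = df1[in_df2 & in_df3]
common_words = common["A"].tolist()
print(common_words)  # ['dog']
```

This needs one isin call per extra dataframe, which is why the approaches that follow scale better to many frames.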
To make things easy, you can use np.intersect1d:
>>> np.intersect1d(df3.A, np.intersect1d(df1.A, df2.A))
array(['dog'], dtype=object)
A similar method using functools.reduce + intersect1d, by piRSquared:
>>> from functools import reduce  # required in Python 3; reduce is a builtin in Python 2
>>> reduce(np.intersect1d, [df1.A, df2.A, df3.A])
array(['dog'], dtype=object)
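For reference, here is a self-contained version of the same idea that can be run as a script. It is only a sketch: the three frames are rebuilt by hand from the question's sample data, and the variable names common and common_words are illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns).
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

# np.intersect1d returns the sorted, unique values present in both inputs.
# reduce applies it pairwise: (df1 & df2), then (that result & df3).
common = reduce(np.intersect1d, [df1["A"], df2["A"], df3["A"]])
common_words = common.tolist()
print(common_words)  # ['dog']
```

Because the list passed to reduce can hold any number of columns, adding a fourth dataframe only means appending one more entry.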
Answer by piRSquared
The simplest way is to use set intersection:
list(set(df1.A) & set(df2.A) & set(df3.A))
['dog']
However, if you have a long list of these things, I'd use reduce from functools. The same technique can be used with @cs95's use of np.intersect1d as well.
from functools import reduce
list(reduce(set.intersection, map(set, [df1.A, df2.A, df3.A])))
['dog']
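The question also asks for the words unique to each dataframe, which the snippets above do not cover. One possible sketch, assuming "unique" means "appears in this dataframe and in none of the others", uses set difference against the union of the other columns. The frames are rebuilt by hand from the question, and the names sets and unique_words are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns).
df1 = pd.DataFrame({"A": ["dog", "cat", "cow", "duck", "snake"]})
df2 = pd.DataFrame({"A": ["pig", "snail", "bird", "dog"]})
df3 = pd.DataFrame({"A": ["eagle", "dog", "snail", "monkey"]})

sets = [set(df["A"]) for df in (df1, df2, df3)]

# Words present in every dataframe.
common = set.intersection(*sets)

# For each dataframe, drop every word that appears in any other dataframe.
unique_words = []
for i, current in enumerate(sets):
    others = set().union(*(s for j, s in enumerate(sets) if j != i))
    unique_words.append(sorted(current - others))

print(sorted(common))   # ['dog']
print(unique_words[0])  # ['cat', 'cow', 'duck', 'snake']
print(unique_words[1])  # ['bird', 'pig']
print(unique_words[2])  # ['eagle', 'monkey']
```

Note that with this sample data snail appears in both df2 and df3, so under this definition it is not reported as unique to df2.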