Python: Finding the intersection between two Series in Pandas

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18079563/

Date: 2020-08-19 09:52:45  Source: igfitidea

Finding the intersection between two series in Pandas

Tags: python, pandas, series, intersection

Asked by user7289

I have two series, s1 and s2, in pandas and want to compute their intersection, i.e. the values that are common to both series.

How would I use the concat function to do this? I have been trying to work it out but have been unable to (I don't want to compute the intersection on the indices of s1 and s2, but on the values).

Accepted answer by Joop

Place both series in Python's set container, then use the set intersection method:

set(s1).intersection(set(s2))

and then transform back to list if needed.

Just noticed pandas in the tag. Can translate back to that:

pd.Series(list(set(s1).intersection(set(s2))))

From comments I have changed this to a more Pythonic expression, which is shorter and easier to read:

pd.Series(list(set(s1) & set(s2)))

should do the trick, except if the index data is also important to you.

I have added the list(...) to convert the set before passing it to pd.Series, as pandas does not accept a set as direct input for a Series.
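
As a minimal sketch of the set-based approach above, assuming two small integer series like those used elsewhere on this page (the sample values and the sorted(...) step are illustrative additions, not part of the original answer):

```python
import pandas as pd

# Illustrative sample data (assumed for this sketch).
s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# Set intersection compares values only; the original index labels are discarded.
common = set(s1) & set(s2)

# Sets are unordered, so sort before building the Series to get a stable order.
result = pd.Series(sorted(common))
print(result.tolist())
```

Sorting is optional; list(common) also works if the order of the result does not matter.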

Answer by jbn

If you are using Pandas, I assume you are also using NumPy. NumPy has a function, intersect1d, that will work with a Pandas series.

Example:

pd.Series(np.intersect1d(pd.Series([1,2,3,5,42]), pd.Series([4,5,6,20,42])))

will return a Series with the values 5 and 42.
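
A slightly expanded sketch of the same call, with a repeated value added to the first series (an assumption made here for illustration) to show that np.intersect1d returns sorted, unique values:

```python
import numpy as np
import pandas as pd

# Illustrative data; the duplicate 42 in s1 is added for this sketch.
s1 = pd.Series([1, 2, 3, 5, 42, 42])
s2 = pd.Series([4, 5, 6, 20, 42])

# intersect1d returns the sorted, unique values present in both inputs.
common = pd.Series(np.intersect1d(s1.to_numpy(), s2.to_numpy()))
print(common.tolist())
```

The result carries a fresh default index, so the index labels of s1 and s2 are not preserved.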

Answer by eldad-a

Setup:

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

Timings:

%%timeit
pd.Series(list(set(s1).intersection(set(s2))))
10000 loops, best of 3: 57.7 μs per loop

%%timeit
pd.Series(np.intersect1d(s1,s2))
1000 loops, best of 3: 659 μs per loop

%%timeit
pd.Series(np.intersect1d(s1.values,s2.values))
10000 loops, best of 3: 64.7 μs per loop

So the NumPy solution can be comparable to the set solution even for small series, if one uses .values explicitly.
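
Outside IPython, a comparison along these lines could be sketched with the standard timeit module. The labels, the number of calls, and the reporting format below are choices made for this sketch; absolute numbers will differ by machine and library version, so no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# The three variants timed above, keyed by a descriptive label.
candidates = {
    "set": lambda: pd.Series(list(set(s1).intersection(set(s2)))),
    "intersect1d (Series)": lambda: pd.Series(np.intersect1d(s1, s2)),
    "intersect1d (.values)": lambda: pd.Series(np.intersect1d(s1.values, s2.values)),
}

calls = 200  # calls per repeat; kept small so the sketch runs quickly
timings = {}
for name, func in candidates.items():
    # Take the best of 3 repeats and convert to microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    timings[name] = best / calls * 1e6
    print(f"{name}: {timings[name]:.1f} us per call")
```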

Answer by Glen Thompson

Python

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

s1[s1.isin(s2)]

R

s1 <- c(4,5,6,20,42)
s2 <- c(1,2,3,5,42)

s1[s1 %in% s2]

Edit: Doesn't handle dupes.
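
To illustrate the duplicate caveat, here is a small sketch in which s1 contains a repeated value (the data and the drop_duplicates() step are assumptions added for illustration, not part of the original answer):

```python
import pandas as pd

# s1 contains 5 twice (assumed data for this sketch).
s1 = pd.Series([4, 5, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# isin keeps every matching row of s1, including repeats,
# and preserves the original index labels of s1.
kept = s1[s1.isin(s2)]
print(kept.tolist())

# One possible way to reduce the result to distinct values.
unique = kept.drop_duplicates()
print(unique.tolist())
```

Unlike the set and intersect1d approaches, this one keeps the index of s1, which may be useful if the index matters.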

Answer by kvb

Could use the merge function as follows:

pd.merge(df1, df2, how='inner')
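
The answer does not say how df1 and df2 are built. One possible reading, sketched here as an assumption, is to convert each series to a single-column DataFrame (the column name "value" is hypothetical) and merge on that column:

```python
import pandas as pd

s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# Hypothetical construction of df1 and df2: one column each, same name.
df1 = s1.to_frame(name="value")
df2 = s2.to_frame(name="value")

# An inner merge keeps only the rows whose "value" appears in both frames.
merged = pd.merge(df1, df2, how="inner", on="value")
result = merged["value"]
print(result.tolist())
```

Note that if a value is repeated in either input, an inner merge returns one row per matching pair, so duplicates may need to be dropped first.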