Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/26326489/

Time: 2020-09-13 22:34:25  Source: igfitidea

Finding the intersection between two series in Pandas using index

Tags: python, pandas, intersection, series

Asked by Boss1295

I have two Series of different lengths, and I am trying to find their intersection based on the index, where the index values are strings. Ideally, the end result is a Series containing the elements whose string index labels are common to both.

Any ideas?

Answer by Alex Riley

Pandas indexes have an intersection method which you can use. If you have two Series, s1 and s2, then

s1.index.intersection(s2.index)

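or, in older versions of pandas:

s1.index & s2.index

Either form gives you the index values that are in both s1 and s2. Note that using & as a set intersection on an Index was deprecated during the pandas 1.x series, and since pandas 2.0 it performs an element-wise logical AND instead, so prefer intersection() in current code.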
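A minimal sketch of the intersection step, using two small made-up Series with string index labels (the values and labels of s1 and s2 here are illustrative assumptions, not data from the question):

```python
import pandas as pd

# Hypothetical example data: two Series of different lengths with string labels.
s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([10, 30, 50], index=["c", "a", "e"])

# Labels present in both indexes (returned as a pandas Index).
common = s1.index.intersection(s2.index)
print(sorted(common))  # ['a', 'c']
```

The result is an Index rather than a plain list, which is why it can be passed straight to .loc.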

You can then use this Index of common labels to select the corresponding elements of either Series. For example:

>>> ixs = s1.index.intersection(s2.index)
>>> s1.loc[ixs]
# subset of s1 with only the indexes also found in s2 appears here
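Continuing with the same hypothetical s1 and s2, the common labels can pull the matching elements out of each Series. As a sketch of two alternatives (not part of the original answer): a boolean isin() mask keeps s1's original order, and pd.concat with join="inner" lines up the shared elements of both Series side by side in one step:

```python
import pandas as pd

# Hypothetical example data with string index labels.
s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([10, 30, 50], index=["c", "a", "e"])

ixs = s1.index.intersection(s2.index)
s1_common = s1.loc[ixs]  # elements of s1 whose labels also appear in s2
s2_common = s2.loc[ixs]  # elements of s2, ordered to follow ixs

# Boolean-mask alternative: keeps the labels in s1's original order.
s1_masked = s1[s1.index.isin(s2.index)]

# Inner join on the index: one column per Series, only shared labels kept.
both = pd.concat([s1, s2], axis=1, join="inner", keys=["s1", "s2"])
print(both)
```

The concat form is convenient when you want the values from both Series for each shared label rather than a subset of just one of them.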

Answer by nurp

Both of my datasets are sorted in increasing order, so I wrote a function that marks the positions of the values they share, and then filtered each dataset with those masks.

import numpy as np

def overlap(keys1, keys2):
    '''Return boolean masks marking the values common to two 1-D key arrays.

    Both inputs are assumed to be sorted in increasing order.
    '''
    mask1 = np.zeros(len(keys1), dtype=bool)
    mask2 = np.zeros(len(keys2), dtype=bool)
    idx_1 = 0
    idx_2 = 0
    # Two-pointer walk: advance whichever side currently holds the smaller key.
    while idx_1 < len(keys1) and idx_2 < len(keys2):
        if keys1[idx_1] < keys2[idx_2]:
            idx_1 += 1
        elif keys1[idx_1] > keys2[idx_2]:
            idx_2 += 1
        else:
            # Keys match: mark both positions and advance both pointers.
            mask1[idx_1] = True
            mask2[idx_2] = True
            idx_1 += 1
            idx_2 += 1
    return mask1, mask2

np.shape(data1)  # (1330, 8)
np.shape(data2)  # (2490, 9)
# Assumption: column 0 of each array holds the increasing key being matched.
mask_1, mask_2 = overlap(data1[:, 0], data2[:, 0])
data1 = data1[mask_1]
data2 = data2[mask_2]
np.shape(data1)  # (540, 8)
np.shape(data2)  # (540, 9)
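If both key arrays are sorted and contain no duplicate values, the two-pointer loop in overlap() can likely be replaced by NumPy's np.intersect1d with return_indices=True (available since NumPy 1.15), which returns the positions of the shared values in each input. This is a swapped-in technique, not the answerer's method, and the keys1/keys2 arrays below are made-up illustrative data:

```python
import numpy as np

# Hypothetical sorted, duplicate-free keys (e.g. timestamps) for two datasets.
keys1 = np.array([1, 3, 4, 7, 9])
keys2 = np.array([2, 3, 5, 7, 8, 9, 11])

# common: shared values; pos1/pos2: where each shared value sits in keys1/keys2.
common, pos1, pos2 = np.intersect1d(keys1, keys2, assume_unique=True, return_indices=True)

# Boolean masks equivalent to what the two-pointer overlap() produces.
mask1 = np.zeros(len(keys1), dtype=bool)
mask1[pos1] = True
mask2 = np.zeros(len(keys2), dtype=bool)
mask2[pos2] = True

print(common)  # [3 7 9]
```

The masks select rows in their original order, while indexing with pos1/pos2 directly yields rows ordered by the sorted common values; for already-sorted inputs the two orders coincide.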