Find intersection of two columns in Python Pandas -> list of strings

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/49796271/

Date: 2020-09-14 05:27:22  Source: igfitidea

Tags: python, list, pandas, unique

Asked by Mia

I would like to count, for each row, how many strings columns A and B have in common. Each cell in columns A and B holds a list of strings. For example, column A may contain [car, passenger, truck] and column B may contain [car, house, flower, truck]. Since 2 strings overlap in this case, column C should display 2.

I have tried (none of these work):

df['unique'] = np.unique(frame[['colA', 'colB']])

or

def unique(colA, colB):
    unique1 = list(set(colA) & set(colB))
    return unique1

df['unique'] = df.apply(unique, args=(df['colA'], frame['colB']))

TypeError: ('unique() takes 2 positional arguments but 3 were given', 'occurred at index article')

Accepted answer by jezrael

I believe you need len with set.intersection in a list comprehension:

df['C'] = [len(set(a).intersection(b)) for a, b in zip(df.A, df.B)]

Or:

df['C'] = [len(set(a) & set(b)) for a, b in zip(df.A, df.B)]
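
Both forms compute the same thing for each pair of lists. Here is a minimal plain-Python sketch of that per-row step, using made-up example lists (the names a, b, shared and overlap_count are only for illustration). It also shows that a string repeated inside one list is counted once, because the lists are converted to sets:

```python
# Hypothetical example values; in the DataFrame these would be one cell of A and one cell of B.
a = ['car', 'car', 'truck', 'passenger']
b = ['car', 'house', 'flower', 'truck']

# set(a) drops the repeated 'car'; intersection() accepts any iterable,
# so b does not have to be converted to a set first.
shared = set(a).intersection(b)
overlap_count = len(shared)

# The & operator needs a set on both sides and gives the same result.
same_count = len(set(a) & set(b))

print(sorted(shared))
print(overlap_count, same_count)
```

If repeated strings should be counted more than once, a set-based count is not enough and a different approach would be needed.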

Sample:

import pandas as pd

# Two rows; each cell holds a list of strings
df = pd.DataFrame(data={'A':[['car', 'passenger', 'truck'], ['car', 'truck']],
                        'B':[['car', 'house', 'flower', 'truck'], ['car', 'house']]})
print(df)
                         A                            B
0  [car, passenger, truck]  [car, house, flower, truck]
1             [car, truck]                 [car, house]

df['C'] = [len(set(a).intersection(b)) for a, b in zip(df.A, df.B)]
print(df)
                         A                            B  C
0  [car, passenger, truck]  [car, house, flower, truck]  2
1             [car, truck]                 [car, house]  1
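
As for the TypeError in the question: by default df.apply calls the function once per column (axis=0) and passes that column as the first argument, so the two extra values in args bring the total to three. Below is a minimal sketch of a row-wise variant, assuming the column names A and B from the sample above; the helper name count_shared is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(data={'A': [['car', 'passenger', 'truck'], ['car', 'truck']],
                        'B': [['car', 'house', 'flower', 'truck'], ['car', 'house']]})

def count_shared(row):
    # row is a Series holding one row; row['A'] and row['B'] are the two lists
    return len(set(row['A']) & set(row['B']))

# axis=1 makes apply call count_shared once per row instead of once per column
df['C'] = df.apply(count_shared, axis=1)
print(df['C'].tolist())
```

The list comprehension from the answer is generally the faster option, since apply with axis=1 builds a Series object for every row.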