Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/55231183/

Date: 2020-09-14 06:21:21  Source: igfitidea

Pandas and Sets - ValueError: Length of values does not match length of index

python pandas set

Question by johnaco

I am trying to create a new column in my DataFrame that contains the intersection of two sets, each stored in a separate column. The columns themselves hold sets.

dfc['INTERSECTION'] =  set(dfc.TABS1).intersection(set(dfc.TABS2))

I get a ValueError. However, I was able to do

dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)

without any problem. TABS1 and TABS2 both contain values.

Any thoughts? Thanks.
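
A minimal sketch reproducing this kind of ValueError, assuming a hypothetical 4-row DataFrame and a 2-element list of values (neither is the asker's actual data). When a list-like value is assigned to a column, pandas requires exactly one element per row, so anything shorter or longer than the index is rejected; the exact wording of the message varies between pandas versions.

```python
import pandas as pd

# Hypothetical 4-row frame, used only to trigger the length check.
df = pd.DataFrame({"TABS1": ["T1", "T2", "T3", "T5"]})

# Only 2 values for 4 rows: pandas cannot line them up with the index.
shorter_values = ["T2", "T3"]

error_message = None
try:
    df["INTERSECTION"] = shorter_values
except ValueError as err:
    error_message = str(err)
    print(error_message)
```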

I am adding example data below.

GROUP  TABS1               TABS2
A      {'T1', 'T2', 'T3'}  {'T2', 'T3', 'T4'}
B      {'T5', 'T6'}        {'T6'}

Chris gave an example, but with a very different data set. I am looking for the intersection of TABS1 and TABS2 in a third column, 'INTERSECTION'. As mentioned above, I have no problem with

dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)

This seems like it should be straightforward...
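
A minimal sketch of the row-wise computation the question is after, assuming every cell of TABS1 and TABS2 holds a Python set, as in the example data. Pairing the two columns with zip produces exactly one result per row, so the assigned list always has the same length as the DataFrame. This is an illustrative approach, not code taken from the original thread.

```python
import pandas as pd

# Rebuild the example data from the question; assumes each cell is a Python set.
dfc = pd.DataFrame({
    "GROUP": ["A", "B"],
    "TABS1": [{"T1", "T2", "T3"}, {"T5", "T6"}],
    "TABS2": [{"T2", "T3", "T4"}, {"T6"}],
})

# Pair the two set columns row by row; each row yields one set,
# so each list below has exactly len(dfc) elements.
dfc["INTERSECTION"] = [s1.intersection(s2) for s1, s2 in zip(dfc["TABS1"], dfc["TABS2"])]
dfc["LEFT"] = [s1 - s2 for s1, s2 in zip(dfc["TABS1"], dfc["TABS2"])]

print(dfc[["GROUP", "INTERSECTION", "LEFT"]])
```

Because sets are unordered, the order in which elements are printed inside each cell may differ between runs.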

Answer by Yo_Chris

set removes duplicates, and an intersection keeps only the elements both inputs share, so you end up with a set whose length is smaller than the length of your DataFrame. You need to make sure the length of the array you assign to a new column equals the length of the DataFrame. If you want, you can use a list comprehension and replace the non-intersections with NaN:

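import numpy as np
import pandas as pd
# sample data
df = pd.DataFrame([[1,2,3], [1,2,3], [2,3,4], [3,4,5]], columns=list('abc'))
# list comprehension: keep each value of 'a' that also appears in 'b', otherwise NaN
b_values = set(df['b'])  # build the lookup set once rather than on every iteration
df['intersection'] = [a if a in b_values else np.nan for a in df['a']]
print(df)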

   a  b  c  intersection
0  1  2  3           NaN
1  1  2  3           NaN
2  2  3  4           2.0
3  3  4  5           3.0
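
An alternative to the list comprehension, not part of the original answer: a sketch using Series.isin to build a boolean mask and Series.where to fill the non-matching rows with NaN. Both methods return results aligned with the DataFrame's index, so a length mismatch cannot occur.

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [2, 3, 4], [3, 4, 5]], columns=list("abc"))

# isin() returns one boolean per row; where() keeps values where the mask is True
# and fills the other rows with NaN (its default replacement value).
df["intersection"] = df["a"].where(df["a"].isin(df["b"]))

print(df)
```

On this sample data it should yield the same intersection column as the list comprehension while avoiding a Python-level loop.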