Set union in pandas

Question

Asked by cppgnlearner

I have two columns in my DataFrame in which I store sets.

I want to take the set union of the two columns using a fast vectorized operation:

df['union'] = df.set1 | df.set2

but the error TypeError: unsupported operand type(s) for |: 'set' and 'bool' prevents me from doing so, because both columns contain np.nan values.

Is there a good solution to overcome this?

Answer 1

Accepted answer by ayhan

For these operations, plain Python can be more efficient than a row-wise apply:

%timeit pd.Series([set1.union(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 43.3 ms per loop

%timeit df.apply(lambda x: x.A.union(x.B), axis=1)
1 loop, best of 3: 2.6 s per loop
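The comprehension above assumes every cell holds a set; as soon as either cell is np.nan, set1.union(set2) raises. A minimal sketch that keeps the same list-comprehension approach, assuming a missing value should behave like an empty set (as_set and union_columns are hypothetical helper names, not pandas API):

```python
import numpy as np
import pandas as pd


def as_set(value):
    # Hypothetical helper: treat NaN (or any other non-set cell) as an empty set
    return value if isinstance(value, (set, frozenset)) else set()


def union_columns(df, left, right):
    # Same list-comprehension idea as above, made NaN-tolerant
    return pd.Series(
        [as_set(a) | as_set(b) for a, b in zip(df[left], df[right])],
        index=df.index,
    )


df = pd.DataFrame({'set1': [{1, 2}, np.nan, {3}],
                   'set2': [{2, 3}, {4}, np.nan]})
df['union'] = union_columns(df, 'set1', 'set2')
print(df['union'].tolist())  # [{1, 2, 3}, {4}, {3}]
```

Passing index=df.index keeps the result aligned with the original rows when the DataFrame does not have a default RangeIndex.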

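If we could use + for union (plain Python sets do not support +), it would probably take about half the time, judging by how the vectorized - (set difference) compares with the equivalent list comprehension. Subclassing set to add + may not be worth it, though: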

%timeit df['A'] - df['B']
10 loops, best of 3: 22.1 ms per loop

%timeit pd.Series([set1.difference(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 35.7 ms per loop
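To illustrate the inheritance idea: on object-dtype columns pandas applies arithmetic operators element by element (which is why df['A'] - df['B'] works), so a set subclass that defines + as union would make a vectorized + possible. A minimal sketch; UnionSet is a hypothetical name, and every cell would first have to be converted to it, which is part of why it may not pay off:

```python
import pandas as pd


class UnionSet(set):
    """Hypothetical set subclass in which a + b means a | b."""

    def __add__(self, other):
        return self | other


a = pd.Series([UnionSet({1, 2}), UnionSet({5})])
b = pd.Series([UnionSet({2, 3}), UnionSet({6})])

# Object-dtype Series: pandas calls UnionSet.__add__ on each pair of elements
union = a + b
print(union.tolist())
```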

DataFrame for timings:

import pandas as pd
import numpy as np
l1 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]
l2 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]

df = pd.DataFrame({'A': l1, 'B': l2})
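%timeit is an IPython magic. A rough stand-alone equivalent using the standard timeit module, assuming a smaller row count so it finishes quickly (make_frame, union_comprehension and union_apply are hypothetical names); absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n):
    # Same construction as the timing DataFrame above, with a configurable row count
    letters = list('abcdefg')
    l1 = [set(np.random.choice(letters, np.random.randint(1, 5))) for _ in range(n)]
    l2 = [set(np.random.choice(letters, np.random.randint(1, 5))) for _ in range(n)]
    return pd.DataFrame({'A': l1, 'B': l2})


def union_comprehension(df):
    return pd.Series([s1.union(s2) for s1, s2 in zip(df['A'], df['B'])])


def union_apply(df):
    return df.apply(lambda x: x.A.union(x.B), axis=1)


if __name__ == '__main__':
    df = make_frame(10_000)
    for func in (union_comprehension, union_apply):
        # Best of 3 runs, one call per run, reported in milliseconds
        best = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
        print(f'{func.__name__}: {best * 1000:.1f} ms')
```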

Answer 2

Answer by piRSquared

This is the best I could come up with:

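# method 1: row-wise apply
df.apply(lambda x: x.set1.union(x.set2), axis=1)

# method 2: turn each set into a list, concatenate the lists across each row, convert back to a set
# (pandas 2.1+ deprecates DataFrame.applymap in favor of DataFrame.map)
df.applymap(list).sum(axis=1).apply(set)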

Wow!

I expected method 2 to be quicker. Not so!

Example

import pandas as pd

# Column names must match the attributes used in the lambda (set1, set2)
df = pd.DataFrame([[{1, 2, 3}, {3, 4, 5}] for _ in range(3)],
                  columns=['set1', 'set2'])
df

df.apply(lambda x: x.set1.union(x.set2), axis=1)

0    {1, 2, 3, 4, 5}
1    {1, 2, 3, 4, 5}
2    {1, 2, 3, 4, 5}
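The apply version above raises on any row where either cell is NaN. A sketch of an alternative using Series.combine, which calls a function on each pair of index-aligned elements; treating NaN as an empty set is an assumption, and nan_safe_union is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def nan_safe_union(a, b):
    # Hypothetical helper: NaN (or any other non-set value) counts as an empty set
    a = a if isinstance(a, (set, frozenset)) else set()
    b = b if isinstance(b, (set, frozenset)) else set()
    return a | b


df = pd.DataFrame({'set1': [{1, 2, 3}, np.nan, {7}],
                   'set2': [{3, 4, 5}, {6}, np.nan]})

# Series.combine calls nan_safe_union once per pair of index-aligned elements
df['union'] = df['set1'].combine(df['set2'], nan_safe_union)
print(df['union'].tolist())  # [{1, 2, 3, 4, 5}, {6}, {7}]
```

Series.combine still loops in Python, so it is not expected to beat the plain zip comprehension; its advantage is that it aligns on the index for you.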
