pandas: efficiently merging multiple data frames

Question

Asked by Cmdt.Ed

What is the most efficient way to merge multiple data frames (i.e., more than 2) in pandas? There are a few answers:

but these all involve multiple joins. If I have N data frames these would require N-1 joins.

If I weren't using pandas, another solution would be to put everything into a hash table keyed on the common index and build the final version from that. I believe this is basically a hash join in SQL. Is there something like that in pandas?
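
For illustration, the hash-table idea described above could be sketched in plain Python roughly as follows. This is only a sketch under assumptions: the function name `hash_join` and the sample tables are hypothetical, each table is represented as a dict mapping an index key to a row dict, and inner-join semantics are assumed.

```python
def hash_join(tables):
    """Inner-join a list of {key: row_dict} tables on their shared keys."""
    # Keys that appear in every table (inner-join semantics, an assumption).
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)

    # Build each output row once by looking the key up in every table.
    merged = {}
    for key in sorted(common):
        row = {}
        for table in tables:
            row.update(table[key])
        merged[key] = row
    return merged


# Hypothetical sample data: three tables sharing an integer index.
t1 = {0: {"a": 5, "b": 2}, 1: {"a": 6, "b": 7}}
t2 = {0: {"c": 5, "d": 7}, 1: {"c": 5, "d": 5}}
t3 = {0: {"e": 8, "f": 1}, 2: {"e": 4, "f": 5}}

result = hash_join([t1, t2, t3])
print(result)
```

Only key 0 is present in all three tables, so it is the only row in the result.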

If not, would it be more efficient to just create a new data frame with the common index and pass it the raw data from each data frame? It seems like that would at least prevent you from creating a new data frame in each of the N-1 joins.

Thanks.

Answer 1

Answered by MaxU

If you can join your data frames by index, you can chain the joins in a single expression:

df1.join(df2).join(df3).join(df4)

Example:

In [187]: df1
Out[187]:
   a  b
0  5  2
1  6  7
2  6  5
3  1  6
4  0  2

In [188]: df2
Out[188]:
   c  d
0  5  7
1  5  5
2  2  4
3  4  3
4  9  0

In [189]: df3
Out[189]:
   e  f
0  8  1
1  0  9
2  4  5
3  3  9
4  9  5

In [190]: df1.join(df2).join(df3)
Out[190]:
   a  b  c  d  e  f
0  5  2  5  7  8  1
1  6  7  5  5  0  9
2  6  5  2  4  4  5
3  1  6  4  3  3  9
4  0  2  9  0  9  5
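
For an arbitrary number of frames, the chain above could be written generically. The following is a minimal sketch, assuming all frames share the same index and have non-overlapping column names; the variable names `frames`, `chained` and `listed` are illustrative. It shows two options: folding `join` over a list with `functools.reduce`, and passing a list of frames to `DataFrame.join`.

```python
from functools import reduce

import pandas as pd

# Same data as in the example above.
df1 = pd.DataFrame({"a": [5, 6, 6, 1, 0], "b": [2, 7, 5, 6, 2]})
df2 = pd.DataFrame({"c": [5, 5, 2, 4, 9], "d": [7, 5, 4, 3, 0]})
df3 = pd.DataFrame({"e": [8, 0, 4, 3, 9], "f": [1, 9, 5, 9, 5]})
frames = [df1, df2, df3]

# Option 1: fold join over the list, equivalent to df1.join(df2).join(df3).
chained = reduce(lambda left, right: left.join(right), frames)

# Option 2: DataFrame.join also accepts a list of frames (joined on index).
listed = frames[0].join(frames[1:])

print(chained)
print(listed)
```

Note that both options still perform the joins pairwise or via concatenation internally, so this changes the spelling rather than the amount of work.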

This should be fairly fast and efficient.

Alternatively, you can concatenate them:

In [191]: pd.concat([df1,df2,df3], axis=1)
Out[191]:
   a  b  c  d  e  f
0  5  2  5  7  8  1
1  6  7  5  5  0  9
2  6  5  2  4  4  5
3  1  6  4  3  3  9
4  0  2  9  0  9  5
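
In the example above all frames share the same index. When they do not, `pd.concat(..., axis=1)` aligns rows on the index. The following sketch, using hypothetical frames `left` and `right`, shows the default outer alignment next to `join="inner"`:

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20, 30]}, index=[1, 2, 3])

# Default: union of the indexes, with NaN where a frame has no row.
outer = pd.concat([left, right], axis=1)

# join="inner": keep only index labels present in every frame.
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```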

Time comparison for 3 DataFrames with 100K rows each:

In [198]: %timeit pd.concat([df1,df2,df3], axis=1)
100 loops, best of 3: 5.67 ms per loop

In [199]: %timeit df1.join(df2).join(df3)
100 loops, best of 3: 3.93 ms per loop

So, as you can see, join is a bit faster in this measurement (3.93 ms vs. 5.67 ms per loop).
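
To repeat this comparison outside IPython, something like the following could be used. It is a sketch only: the random data, the seed, and the repetition count `repeats` are assumptions rather than the setup used for the numbers above, and the results depend on the pandas version and hardware, so the ordering may differ.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # rows per frame, as in the comparison above
rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

# Three frames with a shared default index and distinct column names.
df1 = pd.DataFrame(rng.integers(0, 10, size=(n, 2)), columns=["a", "b"])
df2 = pd.DataFrame(rng.integers(0, 10, size=(n, 2)), columns=["c", "d"])
df3 = pd.DataFrame(rng.integers(0, 10, size=(n, 2)), columns=["e", "f"])

repeats = 20
t_concat = timeit.timeit(lambda: pd.concat([df1, df2, df3], axis=1), number=repeats)
t_join = timeit.timeit(lambda: df1.join(df2).join(df3), number=repeats)

print(f"concat: {t_concat / repeats * 1e3:.2f} ms per call")
print(f"join:   {t_join / repeats * 1e3:.2f} ms per call")
```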
