Joining multiple DataFrames with overlapping column names in pandas?

Question

Asked by Kyle Brandt

I have multiple (more than 2) dataframes I would like to merge. They all share the same value column:

In [431]: [x.head() for x in data]
Out[431]: 
[                     AvgStatisticData
DateTime                             
2012-10-14 14:00:00         39.335996
2012-10-14 15:00:00         40.210110
2012-10-14 16:00:00         48.282816
2012-10-14 17:00:00         40.593039
2012-10-14 18:00:00         40.952014,
                      AvgStatisticData
DateTime                             
2012-10-14 14:00:00         47.854712
2012-10-14 15:00:00         55.041512
2012-10-14 16:00:00         55.488026
2012-10-14 17:00:00         51.688483
2012-10-14 18:00:00         57.916672,
                      AvgStatisticData
DateTime                             
2012-10-14 14:00:00         54.171233
2012-10-14 15:00:00         48.718387
2012-10-14 16:00:00         59.978616
2012-10-14 17:00:00         50.984514
2012-10-14 18:00:00         54.924745,
                      AvgStatisticData
DateTime                             
2012-10-14 14:00:00         65.813114
2012-10-14 15:00:00         71.397868
2012-10-14 16:00:00         76.213973
2012-10-14 17:00:00         72.729002
2012-10-14 18:00:00         73.196415,
....etc

I read that join can handle multiple dataframes, however I get:

In [432]: data[0].join(data[1:])
...
Exception: Indexes have overlapping values: ['AvgStatisticData']

I have tried passing rsuffix=["%i" % (i) for i in range(len(data))] to join and still get the same error. I can work around this by building my data list so that the column names don't overlap, but maybe there is a better way?
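
For reference, the workaround mentioned above (making the column names unique before joining) might look like the sketch below. The small `data` list is a hypothetical stand-in for the frames in the question, with made-up rounded values, and the `_0`, `_1`, ... naming is only one possible choice. As far as the pandas documentation states, `lsuffix` and `rsuffix` are not supported when `join` receives a list of DataFrames, which would explain why passing a list to `rsuffix` did not help.

```python
import pandas as pd

# Hypothetical stand-in for the `data` list in the question:
# several frames sharing one DatetimeIndex and one column name.
index = pd.to_datetime(
    ["2012-10-14 14:00", "2012-10-14 15:00", "2012-10-14 16:00"]
)
index.name = "DateTime"
data = [
    pd.DataFrame({"AvgStatisticData": [39.3, 40.2, 48.3]}, index=index),
    pd.DataFrame({"AvgStatisticData": [47.9, 55.0, 55.5]}, index=index),
    pd.DataFrame({"AvgStatisticData": [54.2, 48.7, 60.0]}, index=index),
]

# Give every frame a unique column name, then join on the shared index.
renamed = [df.add_suffix(f"_{i}") for i, df in enumerate(data)]
joined = renamed[0].join(renamed[1:])

print(joined.columns.tolist())
```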

Answer 1

Answered by Wouter Overmeire

In [65]: pd.concat(data, axis=1)
Out[65]:
                     AvgStatisticData  AvgStatisticData  AvgStatisticData  AvgStatisticData
2012-10-14 14:00:00         39.335996         47.854712         54.171233         65.813114
2012-10-14 15:00:00         40.210110         55.041512         48.718387         71.397868
2012-10-14 16:00:00         48.282816         55.488026         59.978616         76.213973
2012-10-14 17:00:00         40.593039         51.688483         50.984514         72.729002
2012-10-14 18:00:00         40.952014         57.916672         54.924745         73.196415
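
The concat result above keeps four columns with the same name. One possible way to tell them apart, sketched here under assumptions, is the `keys` argument of `pd.concat`, which adds an outer column level. The two-frame `data` list and the `series_0`, `series_1` labels are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical two-frame version of the `data` list from the question.
index = pd.to_datetime(["2012-10-14 14:00", "2012-10-14 15:00"])
data = [
    pd.DataFrame({"AvgStatisticData": [39.3, 40.2]}, index=index),
    pd.DataFrame({"AvgStatisticData": [47.9, 55.0]}, index=index),
]

# `keys` adds an outer column level, so each source frame stays addressable.
labels = [f"series_{i}" for i in range(len(data))]
combined = pd.concat(data, axis=1, keys=labels)
print(combined["series_1"]["AvgStatisticData"])

# Optionally flatten the two-level columns into plain, unique strings.
flat = combined.copy()
flat.columns = [f"{name}_{key}" for key, name in combined.columns]
print(flat.columns.tolist())
```

Keeping the two-level columns makes it easy to select everything that came from one source frame, while the flattened version is closer to what suffix-based joins produce.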

Answer 2

Answered by Richard Herron

I would try pandas.merge using the suffixes= option.

import pandas as pd
import datetime as dt

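# Two sample frames that share the key column 'x' and the value column 'y'.
df_1 = pd.DataFrame({'x' : [dt.datetime(2012,10,21) + dt.timedelta(n) for n in range(10)], 'y' : range(10)})
df_2 = pd.DataFrame({'x' : [dt.datetime(2012,10,21) + dt.timedelta(n) for n in range(10)], 'y' : range(10)})
# Merge on 'x'; the suffixes rename the overlapping 'y' columns to 'y_1' and 'y_2'.
df = pd.merge(df_1, df_2, on='x', suffixes=['_1', '_2'])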

I am interested to see if the experts have a more algorithmic approach to merge a list of data frames.
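
One possible way to extend the merge above to a whole list of frames is to fold the list with `functools.reduce`. This is only a sketch under assumptions: the `frames` list and the `y_1`, `y_2`, ... names are hypothetical. The value column is renamed up front rather than relying on `suffixes`, because suffixes are applied only when two columns actually collide in a given merge, so with three or more frames the resulting names become hard to predict.

```python
from functools import reduce
import datetime as dt

import pandas as pd

# Hypothetical list of frames shaped like df_1 and df_2 above.
frames = [
    pd.DataFrame({
        "x": [dt.datetime(2012, 10, 21) + dt.timedelta(n) for n in range(3)],
        "y": [v * (i + 1) for v in range(3)],
    })
    for i in range(3)
]

# Rename the shared value column per frame so no merge sees an overlap.
renamed = [
    df.rename(columns={"y": f"y_{i}"})
    for i, df in enumerate(frames, start=1)
]

# Fold the list pairwise with pd.merge on the key column 'x'.
merged = reduce(lambda left, right: pd.merge(left, right, on="x"), renamed)
print(merged.columns.tolist())
```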

