Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/13003769/
Joining Multiple Dataframes with Pandas with overlapping Column Names?
Question by Kyle Brandt
I have multiple (more than 2) dataframes I would like to merge. They all share the same value column:
In [431]: [x.head() for x in data]
Out[431]:
[ AvgStatisticData
DateTime
2012-10-14 14:00:00 39.335996
2012-10-14 15:00:00 40.210110
2012-10-14 16:00:00 48.282816
2012-10-14 17:00:00 40.593039
2012-10-14 18:00:00 40.952014,
AvgStatisticData
DateTime
2012-10-14 14:00:00 47.854712
2012-10-14 15:00:00 55.041512
2012-10-14 16:00:00 55.488026
2012-10-14 17:00:00 51.688483
2012-10-14 18:00:00 57.916672,
AvgStatisticData
DateTime
2012-10-14 14:00:00 54.171233
2012-10-14 15:00:00 48.718387
2012-10-14 16:00:00 59.978616
2012-10-14 17:00:00 50.984514
2012-10-14 18:00:00 54.924745,
AvgStatisticData
DateTime
2012-10-14 14:00:00 65.813114
2012-10-14 15:00:00 71.397868
2012-10-14 16:00:00 76.213973
2012-10-14 17:00:00 72.729002
2012-10-14 18:00:00 73.196415,
....etc
I read that join can handle multiple dataframes, however I get:
In [432]: data[0].join(data[1:])
...
Exception: Indexes have overlapping values: ['AvgStatisticData']
I have tried passing rsuffix=["%i" % (i) for i in range(len(data))] to join and still get the same error. I can work around this by building my data list in a way where the column names don't overlap, but maybe there is a better way?
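A minimal sketch of the rename-first workaround mentioned in the question, assuming each frame has a single value column named AvgStatisticData on a shared DateTime index (the values are copied from the first rows of the output above). As far as I know, recent pandas versions refuse lsuffix/rsuffix when join receives a list of frames, so making the column names unique beforehand with add_suffix is the simplest route.

```python
import pandas as pd

# Index and values copied from the first three rows of the question's output
idx = pd.DatetimeIndex(
    ["2012-10-14 14:00:00", "2012-10-14 15:00:00", "2012-10-14 16:00:00"],
    name="DateTime",
)
data = [
    pd.DataFrame({"AvgStatisticData": [39.335996, 40.210110, 48.282816]}, index=idx),
    pd.DataFrame({"AvgStatisticData": [47.854712, 55.041512, 55.488026]}, index=idx),
    pd.DataFrame({"AvgStatisticData": [54.171233, 48.718387, 59.978616]}, index=idx),
]

# Give every frame a distinct column name: AvgStatisticData_0, _1, _2, ...
renamed = [df.add_suffix(f"_{i}") for i, df in enumerate(data)]

# With no overlapping names left, joining a list of frames on the index works
joined = renamed[0].join(renamed[1:])
print(joined)
```

The suffix here is just the position in the list; any unique label per frame would do.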
Answer by Wouter Overmeire
In [65]: pd.concat(data, axis=1)
Out[65]:
AvgStatisticData AvgStatisticData AvgStatisticData AvgStatisticData
2012-10-14 14:00:00 39.335996 47.854712 54.171233 65.813114
2012-10-14 15:00:00 40.210110 55.041512 48.718387 71.397868
2012-10-14 16:00:00 48.282816 55.488026 59.978616 76.213973
2012-10-14 17:00:00 40.593039 51.688483 50.984514 72.729002
2012-10-14 18:00:00 40.952014 57.916672 54.924745 73.196415
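The result above repeats the label AvgStatisticData four times. If that is a problem, concat's keys= argument can label each input. A sketch, assuming the frames sit in a list called data as in the question; the labels "s0", "s1", ... are hypothetical names chosen for illustration. Passing the value column of each frame as a Series makes the keys become the column names directly, while passing whole DataFrames gives two-level column labels.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2012-10-14 14:00:00", "2012-10-14 15:00:00"], name="DateTime")
data = [
    pd.DataFrame({"AvgStatisticData": [39.335996, 40.210110]}, index=idx),
    pd.DataFrame({"AvgStatisticData": [47.854712, 55.041512]}, index=idx),
    pd.DataFrame({"AvgStatisticData": [54.171233, 48.718387]}, index=idx),
]

# Hypothetical labels, one per input frame
labels = [f"s{i}" for i in range(len(data))]

# Concatenating Series side by side: each key becomes a column name
wide = pd.concat([df["AvgStatisticData"] for df in data], axis=1, keys=labels)

# Concatenating whole DataFrames with keys yields two-level columns
# such as ('s0', 'AvgStatisticData')
wide_multi = pd.concat(data, axis=1, keys=labels)
print(wide)
print(wide_multi.columns.tolist())
```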
Answer by Richard Herron
I would try pandas.merge using the suffixes= option.
import datetime as dt
import pandas as pd

# Two frames that share both the key column 'x' and the value column 'y'
dates = [dt.datetime(2012, 10, 21) + dt.timedelta(days=n) for n in range(10)]
df_1 = pd.DataFrame({'x': dates, 'y': range(10)})
df_2 = pd.DataFrame({'x': dates, 'y': range(10)})
# Merge on 'x'; the clashing 'y' columns become 'y_1' and 'y_2'
df = pd.merge(df_1, df_2, on='x', suffixes=('_1', '_2'))
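The question's frames carry DateTime as the index rather than as a column. A sketch assuming that layout would merge on the index with left_index=True and right_index=True, keeping the same suffixes idea; for exactly two frames, DataFrame.join with lsuffix/rsuffix gives the same columns.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2012-10-14 14:00:00", "2012-10-14 15:00:00"], name="DateTime")
a = pd.DataFrame({"AvgStatisticData": [39.335996, 40.210110]}, index=idx)
b = pd.DataFrame({"AvgStatisticData": [47.854712, 55.041512]}, index=idx)

# Align on the DatetimeIndex instead of a column; overlapping names get suffixes
merged = pd.merge(a, b, left_index=True, right_index=True, suffixes=("_1", "_2"))

# Two-frame join does accept suffixes (unlike join with a list of frames)
joined = a.join(b, lsuffix="_1", rsuffix="_2")
print(merged.columns.tolist())
```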
I would be interested to see whether the experts have a more general approach to merging a list of data frames.
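One possible general approach, sketched here as an assumption rather than something either answer proposed: suffix each frame's columns with its position, then fold the list with functools.reduce and pd.merge on the index. merge_all is a hypothetical helper name, not a pandas function. The toy frames reuse values from the question, with one row deliberately dropped from the last frame to show that how="outer" keeps timestamps missing from some inputs (filled with NaN).

```python
from functools import reduce

import pandas as pd


def merge_all(frames, how="outer"):
    """Hypothetical helper: merge frames on their index, suffixing columns by position."""
    renamed = [df.add_suffix(f"_{i}") for i, df in enumerate(frames)]
    # After renaming no columns overlap, so merge needs no suffixes
    return reduce(
        lambda left, right: pd.merge(left, right, how=how,
                                     left_index=True, right_index=True),
        renamed,
    )


ts = pd.to_datetime(["2012-10-14 14:00:00", "2012-10-14 15:00:00", "2012-10-14 16:00:00"])
data = [
    pd.DataFrame({"AvgStatisticData": [39.335996, 40.210110, 48.282816]}, index=ts),
    pd.DataFrame({"AvgStatisticData": [47.854712, 55.041512, 55.488026]}, index=ts),
    # Last frame deliberately lacks the 15:00 row
    pd.DataFrame({"AvgStatisticData": [54.171233, 59.978616]}, index=ts[[0, 2]]),
]

result = merge_all(data)
print(result)
```

When every index is unique, pd.concat(renamed, axis=1) should give the same alignment in one call; the reduce version is shown because it also accepts other how= values.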

