Original URL: http://stackoverflow.com/questions/26366021/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas aligning multiple dataframes with TimeStamp index
Question by Zhubarb
This has been the bane of my life for the past couple of days. I have numerous pandas DataFrames containing time series data with irregular frequencies, and I am trying to align them into a single DataFrame.
Below is some code with representative dataframes df1, df2, and df3 (I actually have n=5, and would appreciate a solution that works for all n>2):
# df1, df2, df3 are given at the bottom
import pandas as pd
import datetime
# I can align df1 to df2 easily
df1aligned, df2aligned = df1.align(df2)
# And then concatenate into a single dataframe
combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis=1)
# Since I don't know any better, I then try to align df3 to combined_1_n_2 manually:
combined_1_n_2.align(df3)
error: Reindexing only valid with uniquely valued Index objects
I have an idea why I get this error, so I get rid of the duplicate indices in combined_1_n_2 and try again:
combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first()
combined_1_n_2.align(df3)  # But I still get the same error
error: Reindexing only valid with uniquely valued Index objects
Why am I getting this error? Even if this worked, it is completely manual and ugly. How can I align more than two time series and combine them into a single dataframe?
Data:
df1 = pd.DataFrame({'price': [62.1250, 62.2500, 62.2375, 61.9250, 61.9125]},
                   index=pd.to_datetime(['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000',
                                         '2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000',
                                         '2008-06-01 06:17:01.654000']))
df2 = pd.DataFrame({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750]},
                   index=pd.to_datetime(['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000',
                                         '2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000',
                                         '2008-06-01 06:15:42.082000']))
df3 = pd.DataFrame({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875]},
                   index=pd.to_datetime(['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000',
                                         '2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000',
                                         '2008-06-01 06:15:05.321000']))
Accepted answer by chrisb
Your specific error is due to the column names of combined_1_n_2 having duplicates (both columns are named 'price'). If you rename the columns, the second align will work.
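A minimal sketch of the fix described above, assuming a recent pandas version (the quoted error message comes from an older release, and newer versions may word or handle the duplicate-label case differently). The helper make_frame and the labels price_1/price_2/price_3 are illustrative choices, not part of the question. The sketch repeats the question's first align/concat, checks that the duplicate labels sit on the columns rather than on the index, then renames the columns and aligns df3 along the time axis only (axis=0), so that align does not also add all-NaN columns for the other frame's labels.

```python
import pandas as pd


def make_frame(prices, stamps):
    """Rebuild one of the question's frames: a 'price' column on a DatetimeIndex."""
    return pd.DataFrame({'price': prices}, index=pd.to_datetime(stamps))


df1 = make_frame([62.1250, 62.2500, 62.2375, 61.9250, 61.9125],
                 ['2008-06-01 06:03:59.614', '2008-06-01 06:03:59.692',
                  '2008-06-01 06:15:42.004', '2008-06-01 06:15:42.083',
                  '2008-06-01 06:17:01.654'])
df2 = make_frame([241.0625, 241.5000, 241.3750, 241.2500, 241.3750],
                 ['2008-06-01 06:13:34.524', '2008-06-01 06:13:34.602',
                  '2008-06-01 06:15:05.399', '2008-06-01 06:15:05.399',  # repeated stamp
                  '2008-06-01 06:15:42.082'])
df3 = make_frame([67.656, 67.875, 67.8125, 67.75, 67.6875],
                 ['2008-06-01 06:03:52.281', '2008-06-01 06:03:52.359',
                  '2008-06-01 06:13:34.848', '2008-06-01 06:13:34.926',
                  '2008-06-01 06:15:05.321'])

# Same first two steps as the question.
left, right = df1.align(df2)
combined = pd.concat([left, right], axis=1)
dup_columns = not combined.columns.is_unique  # both columns are labelled 'price'

# Give every column a unique label, then align only along the time axis.
combined.columns = ['price_1', 'price_2']
left, right = combined.align(df3.rename(columns={'price': 'price_3'}), axis=0)
result = pd.concat([left, right], axis=1)
print(result)
```

Renaming before the first concat would avoid the duplicate labels altogether; the rename is done afterwards here only to mirror the question's steps.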
An alternative is to chain the join method, which merges frames on their index, as below.
In [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2')
Out[23]:
price price_1 price_2
2008-06-01 06:03:52.281000 NaN NaN 67.6560
2008-06-01 06:03:52.359000 NaN NaN 67.8750
2008-06-01 06:03:59.614000 62.1250 NaN NaN
2008-06-01 06:03:59.692000 62.2500 NaN NaN
2008-06-01 06:13:34.524000 NaN 241.0625 NaN
2008-06-01 06:13:34.602000 NaN 241.5000 NaN
2008-06-01 06:13:34.848000 NaN NaN 67.8125
2008-06-01 06:13:34.926000 NaN NaN 67.7500
2008-06-01 06:15:05.321000 NaN NaN 67.6875
2008-06-01 06:15:05.399000 NaN 241.3750 NaN
2008-06-01 06:15:05.399000 NaN 241.2500 NaN
2008-06-01 06:15:42.004000 62.2375 NaN NaN
2008-06-01 06:15:42.082000 NaN 241.3750 NaN
2008-06-01 06:15:42.083000 61.9250 NaN NaN
2008-06-01 06:17:01.654000 61.9125 NaN NaN
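The chained join above generalizes to any number of frames with functools.reduce. This is a sketch, assuming each frame has a single 'price' column that needs disambiguating; the positional suffixes added with DataFrame.add_suffix (price_1, price_2, ... rather than the answer's price, price_1, price_2) and the make_frame helper are illustrative choices. Because every column label is unique before joining, no lsuffix/rsuffix is needed.

```python
from functools import reduce

import pandas as pd


def make_frame(prices, stamps):
    """Rebuild one of the question's frames: a 'price' column on a DatetimeIndex."""
    return pd.DataFrame({'price': prices}, index=pd.to_datetime(stamps))


df1 = make_frame([62.1250, 62.2500, 62.2375, 61.9250, 61.9125],
                 ['2008-06-01 06:03:59.614', '2008-06-01 06:03:59.692',
                  '2008-06-01 06:15:42.004', '2008-06-01 06:15:42.083',
                  '2008-06-01 06:17:01.654'])
df2 = make_frame([241.0625, 241.5000, 241.3750, 241.2500, 241.3750],
                 ['2008-06-01 06:13:34.524', '2008-06-01 06:13:34.602',
                  '2008-06-01 06:15:05.399', '2008-06-01 06:15:05.399',  # repeated stamp
                  '2008-06-01 06:15:42.082'])
df3 = make_frame([67.656, 67.875, 67.8125, 67.75, 67.6875],
                 ['2008-06-01 06:03:52.281', '2008-06-01 06:03:52.359',
                  '2008-06-01 06:13:34.848', '2008-06-01 06:13:34.926',
                  '2008-06-01 06:15:05.321'])

frames = [df1, df2, df3]  # the same code works for n = 5 or any other length

# Suffix each frame's columns with its position so labels never collide.
renamed = [df.add_suffix(f'_{i}') for i, df in enumerate(frames, start=1)]

# Outer-join the frames pairwise on their DatetimeIndex, left to right.
combined = reduce(lambda left, right: left.join(right, how='outer'), renamed)
print(combined)
```

pd.concat(renamed, axis=1) is a common shortcut, but it has to reindex each frame onto the union of timestamps, and pandas generally refuses to reindex an index with duplicate labels such as df2's repeated 06:15:05.399 stamp; the pairwise outer join does not have that restriction.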

