Several time series to DataFrame
Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/13728208/
Asked by Jonas
I have a problem merging several time series into a common DataFrame. This is the example code I'm using:
import pandas
import datetime
import numpy as np
start = datetime.datetime(2001, 1, 1)
end = datetime.datetime(2001, 1, 10)
dates = pandas.date_range(start, end)
serie_1 = pandas.Series(np.random.randn(10), index = dates)
start = datetime.datetime(2001, 1, 2)
end = datetime.datetime(2001, 1, 11)
dates = pandas.date_range(start, end)
serie_2 = pandas.Series(np.random.randn(10), index = dates)
start = datetime.datetime(2001, 1, 3)
end = datetime.datetime(2001, 1, 12)
dates = pandas.date_range(start, end)
serie_3 = pandas.Series(np.random.randn(10), index = dates)
print('serie_1')
print(serie_1)
print('serie_2')
print(serie_2)
print('serie_3')
print(serie_3)
serie_4 = pandas.concat([serie_1,serie_2], join='outer', axis = 1)
print('serie_4')
print(serie_4)
serie_5 = pandas.concat([serie_4, serie_3], join='outer', axis = 1)
print('serie_5')
print(serie_5)
This gives me the following error for serie_5 (the second concat):
Traceback (most recent call last):
  File "C:\Users\User\Workspaces\Python\Source\TestingPandas.py", line 29, in <module>
    serie_5 = pandas.concat([serie_4, serie_3], join='outer', axis = 1)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 878, in concat
    verify_integrity=verify_integrity)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 948, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 1101, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 1125, in _get_comb_axis
    all_indexes = [x._data.axes[i] for x in self.objs]
AttributeError: 'TimeSeries' object has no attribute '_data'
I would like the result to look something like this (with random values in column 2):
                   0         1         2
2001-01-01 -1.224602       NaN       NaN
2001-01-02 -1.747710 -2.618369       NaN
2001-01-03 -0.608578 -0.030674 -1.335857
2001-01-04  1.503808 -0.050492  1.086147
2001-01-05  0.593152  0.834805 -1.310452
2001-01-06 -0.156984  0.208565 -0.972561
2001-01-07  0.650264 -0.340086  1.562101
2001-01-08 -0.063765 -0.250005 -0.508458
2001-01-09 -1.092656 -1.589261 -0.481741
2001-01-10  0.640306  0.333527 -0.111668
2001-01-11       NaN -1.159637  0.110722
2001-01-12       NaN       NaN -0.409387
What is wrong? As I said, this is probably basic, but I cannot figure it out and I'm a beginner...
Answered by unutbu
Concatenating a list of Series returns a DataFrame. Thus, serie_4 is a DataFrame, while serie_3 is a Series. In the pandas version shown in the traceback, concatenating a DataFrame with a Series raises the exception.
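One way to confirm which kind of object each step produces is to inspect the types directly. This is a minimal sketch; the names `a`, `b`, and `combined` are made up for illustration and do not appear in the question:

```python
import pandas as pd

# Two small Series with partially overlapping index labels.
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([3.0, 4.0], index=["y", "z"])

# Concatenating Series along axis=1 produces a DataFrame, not a Series.
combined = pd.concat([a, b], axis=1)

print(type(a).__name__)         # Series
print(type(combined).__name__)  # DataFrame
```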
You could use
import pandas as pd
serie_5 = pd.concat([serie_1, serie_2, serie_3], join='outer', axis=1)
instead.
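Applied to the setup from the question, a single concat over all three Series might look like the sketch below. The random generator, its arbitrary seed, and the list name `series` are assumptions made for illustration and are not part of the original code:

```python
import numpy as np
import pandas as pd

# Three 10-day series whose start dates are offset by one day each,
# mirroring serie_1, serie_2 and serie_3 from the question.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
series = [
    pd.Series(rng.standard_normal(10),
              index=pd.date_range(start, periods=10))
    for start in ("2001-01-01", "2001-01-02", "2001-01-03")
]

# A single concat over all three Series aligns them on the union of dates.
serie_5 = pd.concat(series, join="outer", axis=1)

print(serie_5.shape)         # (12, 3)
print(serie_5.isna().sum())  # 2 missing values in each column
```

The union of the three date ranges spans 2001-01-01 through 2001-01-12, so each column is padded with NaN on the two days its Series does not cover.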
For example,
import pandas as pd
s1 = pd.Series([0,1], index=list('AB'))
s2 = pd.Series([2,3], index=list('AC'))
result = pd.concat([s1, s2], join='outer', axis=1, sort=False)
print(result)
yields
     0    1
A  0.0  2.0
B  1.0  NaN
C  NaN  3.0
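The default column labels 0 and 1 come from the position of each Series in the list. As a hedged variation on the same example, the `keys` argument of `pd.concat` can supply column labels, and `join='inner'` keeps only the shared index labels. The labels "first" and "second" and the variable names `named` and `shared` are illustrative assumptions:

```python
import pandas as pd

s1 = pd.Series([0, 1], index=list("AB"))
s2 = pd.Series([2, 3], index=list("AC"))

# keys= supplies the column labels when concatenating along axis=1.
named = pd.concat([s1, s2], axis=1, keys=["first", "second"])

# join="inner" keeps only the index labels shared by every Series.
shared = pd.concat([s1, s2], axis=1, join="inner")

print(named.columns.tolist())  # ['first', 'second']
print(shared.index.tolist())   # ['A']
```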
Note that you'll get a ValueError if you try to concatenate a Series with a non-unique index. For example,
s3 = pd.Series([0,1], index=list('AB'), name='s3')
s4 = pd.Series([2,3], index=list('AA'), name='s4') # <-- non-unique index
result = pd.concat([s3, s4], join='outer', axis=1, sort=False)
raises
ValueError: cannot reindex from a duplicate axis
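Before concatenating, it can help to check whether each index is unique. The sketch below uses `Index.is_unique` and `Index.duplicated()`; treating this as a pre-check is a suggestion added here, not part of the original answer:

```python
import pandas as pd

s3 = pd.Series([0, 1], index=list("AB"), name="s3")
s4 = pd.Series([2, 3], index=list("AA"), name="s4")

for s in (s3, s4):
    # Index.is_unique is False as soon as any label repeats.
    print(s.name, s.index.is_unique)

# Index.duplicated() marks every repeat after the first occurrence.
print(s4.index.duplicated().tolist())  # [False, True]
```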
To work around this, reset the index and merge DataFrames instead:
import functools
s3 = pd.Series([0,1], index=list('AB'), name='s3')
s4 = pd.Series([2,3], index=list('AA'), name='s4') # <-- non-unique index
result = functools.reduce(
    lambda left, right: pd.merge(left, right, on='index', how='outer'),
    [s.reset_index() for s in [s3, s4]])
print(result)
yields
  index  s3   s4
0     A   0  2.0
1     A   0  3.0
2     B   1  NaN
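The merge key `'index'` works because `reset_index()` turns an unnamed index into a column called `index`, while the values keep the name of the Series. A small sketch of that behaviour follows; the index name "key" and the variable `s3_named` are hypothetical, chosen only for illustration:

```python
import pandas as pd

s3 = pd.Series([0, 1], index=list("AB"), name="s3")

# An unnamed index becomes a column called "index";
# the values column takes the name of the Series.
print(s3.reset_index().columns.tolist())  # ['index', 's3']

# With a named index, the column takes that name instead,
# so the on= argument of pd.merge would have to match it.
s3_named = s3.rename_axis("key")  # "key" is an illustrative name
print(s3_named.reset_index().columns.tolist())  # ['key', 's3']
```

This is also why the Series in the workaround are given a `name`: without one, the value columns would not have distinct, meaningful labels after the merge.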

