Reading data from csv file into time series with pandas
Original question: http://stackoverflow.com/questions/15551102/
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Asked by Brian
My goal is to read daily EURUSD data into a time series object that I can easily slice and dice, aggregate, and resample based on irregular-ish time frames. This most likely has a simple answer. I'm working out of Python for Data Analysis but can't seem to bridge the gap.
After downloading and unzipping the data, I run the following code:
>>> import pandas as pd
>>> df = pd.read_csv('EURUSD_day.csv', parse_dates = {'Timestamp' : ['<DATE>', '<TIME>']}, index_col = 'Timestamp')
So far so good. I now have a nice data frame with Timestamps as the index.
However, the book implies (p. 295) that I should be able to subset the data, as follows, to look at all the data from the year 2001.
>>> df['2001']
But, that doesn't work.
Reading this question and answer tells me that I could import Timestamp:
>>> from pandas.lib import Timestamp
>>> s = df['<CLOSE>']
Which seems to work for a particular day:
>>> s[Timestamp('2001-01-04')]
0.9506999999
Yet the following code yields a single value instead of my desired range of all data from the year 2001.
>>> s[Timestamp('2001')]
0.8959
I know I am missing something simple, something basic. Can anyone help?
Thank you, Brian
Answered by bdiamante
The example on p. 295 is performed on a Series object, which is why indexing with the year works. With a DataFrame you would want df.ix['2001'] to achieve the same result.
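A minimal sketch of the DataFrame case, assuming a recent pandas release: `.ix` was deprecated in pandas 0.20 and removed in 1.0, and `.loc` accepts the same partial date string on a DatetimeIndex. The sample rows and values below are hypothetical stand-ins for the EURUSD file, not data from the question.

```python
import pandas as pd

# Hypothetical daily rows standing in for EURUSD_day.csv.
idx = pd.to_datetime(["2000-12-29", "2001-01-02", "2001-01-04", "2002-01-02"])
df = pd.DataFrame(
    {"<OPEN>": [0.93, 0.94, 0.95, 0.89], "<CLOSE>": [0.94, 0.95, 0.9507, 0.90]},
    index=idx,
)
df.index.name = "Timestamp"

# Partial string indexing on the row axis: every row whose timestamp falls in 2001.
year_2001 = df.loc["2001"]
print(year_2001)
```

Plain `df['2001']` is treated as a column lookup in current pandas, which is consistent with the failure described in the question.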
Answered by herrfz
If you want to get all of the columns, use df.ix['2001'].
If you're interested only in "CLOSE", since you already did s = df['<CLOSE>'], you can get the 2001 values with s['2001'].
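The difference between the year string and a Timestamp key can be sketched as follows. This is only an illustration on hypothetical values: a string such as '2001' is matched against every timestamp in that year, whereas `pd.Timestamp('2001')` is the single instant 2001-01-01 00:00:00, so it looks up one exact index entry.

```python
import pandas as pd

# Hypothetical closing prices; the real file's values will differ.
idx = pd.to_datetime(["2000-12-29", "2001-01-01", "2001-01-04", "2001-06-15", "2002-01-02"])
s = pd.Series([0.94, 0.8959, 0.9507, 0.86, 0.90], index=idx, name="<CLOSE>")

# Year string: partial string indexing returns every 2001 observation.
closes_2001 = s["2001"]

# Timestamp key: an exact match on 2001-01-01 00:00:00, so a single scalar comes back.
one_value = s[pd.Timestamp("2001")]

print(closes_2001)
print(one_value)
```

In current pandas, `Timestamp` is imported as `pd.Timestamp`; the `pandas.lib` module used in the question is no longer available.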

