Original source: http://stackoverflow.com/questions/15120763/
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Getting a time index in python for pandas dataframe
Asked by Taylor
I'm having a bit of trouble getting the right time index for my pandas dataframe.
import pandas as pd
from datetime import datetime  # strptime is a method of datetime, not a module-level name
import numpy as np
stockdata = pd.read_csv("/home/stff/symbol_2012-02.csv", parse_dates=[[0, 1, 2]])
stockdata.columns = ['date_time','ticker','exch','salcond','vol','price','stopstockind','corrind','seqnum','source','trf','symroot','symsuffix']
I think the problem is that the timestamp is spread across the first three columns: year/month/day, hour/minute/second, and millisecond. Also, the hour/minute/second column drops its leading zero when the hour has a single digit (before 10:00).
print(stockdata['date_time'][0])
20120201 41206 300
print(stockdata['date_time'][50000])
20120201 151117 770
Ideally, I would like to define my own function that could be called by the converters argument in the read_csv function.
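As a minimal sketch of the converter idea, assuming the three timestamp columns are named date, time and milliseconds (the header names and the inline sample data are hypothetical, not taken from the real file): each function passed through converters receives one cell's raw string, so it can restore a dropped leading zero, but it cannot join several columns into one timestamp by itself.

```python
import io
import pandas as pd

# Hypothetical sample; the column names are assumed for illustration.
csv_text = """date,time,milliseconds,value
20120201,41206,300,1
20120201,151117,770,2
"""

def pad_time(raw):
    # Restore the leading zero dropped from HHMMSS, e.g. '41206' -> '041206'.
    return raw.strip().zfill(6)

def pad_millis(raw):
    # Keep milliseconds at three digits, e.g. '5' -> '005'.
    return raw.strip().zfill(3)

padded = pd.read_csv(
    io.StringIO(csv_text),
    converters={"time": pad_time, "milliseconds": pad_millis},
)
print(padded["time"].tolist())
```

The padded columns still have to be combined and parsed afterwards, which is why a converter alone does not produce the time index.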
Accepted answer by abudis
Suppose you have a CSV file that looks like this:
date,time,milliseconds,value
20120201,41206,300,1
20120201,151117,770,2
Then, using the parse_dates, index_col and date_parser parameters of the read_csv method, one can construct a pandas DataFrame with a time index like this:
import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%Y%m%d %H%M%S %f')
df = pd.read_csv('test.csv', parse_dates=[['date', 'time', 'milliseconds']],
                 index_col=0, date_parser=parse)
This yields:
                            value
date_time_milliseconds
2012-02-01 04:12:06.300000      1
2012-02-01 15:11:17.770000      2
And df.index:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-02-01 04:12:06.300000, 2012-02-01 15:11:17.770000]
Length: 2, Freq: None, Timezone: None
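Recent pandas releases deprecate date_parser and the nested-list form of parse_dates, so the read_csv call above may warn or fail there. A minimal sketch of an alternative, assuming the same test.csv layout (supplied inline through io.StringIO so the example runs on its own): read the three columns as strings, zero-pad them, and parse the combined text with pd.to_datetime. The padding widths assume HHMMSS times and three-digit milliseconds, and the index name date_time is chosen only for illustration.

```python
import io
import pandas as pd

# Same layout as test.csv above, inlined so the sketch is self-contained.
csv_text = """date,time,milliseconds,value
20120201,41206,300,1
20120201,151117,770,2
"""

ts_cols = ["date", "time", "milliseconds"]

# Read the timestamp parts as strings so no digits are reinterpreted.
df = pd.read_csv(io.StringIO(csv_text), dtype={c: str for c in ts_cols})

# Zero-pad the parts (assumed widths: 6 for HHMMSS, 3 for milliseconds).
stamp = (
    df["date"]
    + " " + df["time"].str.zfill(6)
    + " " + df["milliseconds"].str.zfill(3)
)

# Parse with the same format string used by the answer's strptime call.
parsed = pd.to_datetime(stamp, format="%Y%m%d %H%M%S %f")
df.index = pd.DatetimeIndex(parsed, name="date_time")
df = df.drop(columns=ts_cols)
print(df)
```

Padding before parsing removes any ambiguity about where the hour ends, and it keeps a value such as 5 milliseconds from being read as half a second.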
This answer is based on a similar solution proposed here.

