Pandas read csv dateint columns to datetime
Original question: http://stackoverflow.com/questions/26327626/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by tdiddy
I'm new to both StackOverflow and pandas. I am trying to read in a large CSV file with stock market bin data in the following format:
date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
20130625,740,49.1866,49.1866,49.1866,49.1866,224.019,1,0,0,JPM
20130625,745,49.321,49.321,49.321,49.321,208.39,1,0,0,JPM
20130625,750,49.3306,49.369,49.3306,49.369,4583.54,1,0,0,JPM
20130625,755,49.369,49.369,49.369,49.369,416.78,1,0,0,JPM
20130625,800,49.369,49.369,49.3594,49.3594,1715.05,1,0,0,JPM
20130625,805,49.369,49.369,49.3306,49.3306,1333.7,1,0,0,JPM
20130625,810,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM
I have the following code to read it into a pandas DataFrame:
import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
fname  = 'bindat.csv'
df     = pd.read_csv(fname, header=0, sep=',')
The problem is that the date and time columns are read in as int64. I would like to merge the two into a single timestamp such as 2013-06-25 07:15:00.
I am struggling to even get the time read in properly using:
df['date'] = pd.to_datetime(df['date'].astype(str))
df['time'] = pd.to_datetime(df['time'].astype(str))
The first command converts the date column correctly, but the time column comes out wrong: it ends up with dtype object instead of datetime64[ns].
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9999 entries, 0 to 9998
Data columns (total 11 columns):
date         9999 non-null datetime64[ns]
time         9999 non-null object
open         9999 non-null float64
high         9999 non-null float64
low          9999 non-null float64
close        9999 non-null float64
volume       9999 non-null float64
splits       9999 non-null float64
earnings     9999 non-null int64
dividends    9999 non-null float64
sym          9999 non-null object
dtypes: datetime64[ns](1), float64(7), int64(1), object(2)None
And then I'll want to merge into a single DatetimeIndex.
Any suggestions are greatly appreciated.
Cheers!
Answered by DSM
There are quite a few ways to do this. One way to do it during read_csv would be to use the parse_dates and date_parser arguments, telling parse_dates to combine the date and time columns and defining an inline function to parse the dates:
>>> df = pd.read_csv("bindat.csv", parse_dates=[["date", "time"]],
...                  date_parser=lambda x: pd.to_datetime(x, format="%Y%m%d %H%M"),
...                  index_col="date_time")
>>> df
                        open     high      low    close    volume  splits  earnings  dividends  sym
date_time                                                                                          
2013-06-25 07:15:00  49.2634  49.2634  49.2634  49.2634   156.293       1         0          0  JPM
2013-06-25 07:30:00  49.2730  49.2730  49.2730  49.2730   208.390       1         0          0  JPM
2013-06-25 07:40:00  49.1866  49.1866  49.1866  49.1866   224.019       1         0          0  JPM
2013-06-25 07:45:00  49.3210  49.3210  49.3210  49.3210   208.390       1         0          0  JPM
2013-06-25 07:50:00  49.3306  49.3690  49.3306  49.3690  4583.540       1         0          0  JPM
2013-06-25 07:55:00  49.3690  49.3690  49.3690  49.3690   416.780       1         0          0  JPM
2013-06-25 08:00:00  49.3690  49.3690  49.3594  49.3594  1715.050       1         0          0  JPM
2013-06-25 08:05:00  49.3690  49.3690  49.3306  49.3306  1333.700       1         0          0  JPM
2013-06-25 08:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
2013-06-25 16:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
where I added an extra row at the end to make sure that hours were behaving.
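Another of the "quite a few ways" is to build the timestamp after reading, which may be useful on newer pandas releases: as far as I know, date_parser has been deprecated since pandas 2.0, and combining columns through a nested parse_dates list since 2.2. The sketch below is not from the original answer. It assumes the file has the layout shown in the question, inlines a few of its rows (plus the extra 16:10 row) so that it runs on its own, and reads date and time as strings. The time is zero-padded to four digits so that "715" is parsed as 07:15.

```python
import io

import pandas as pd

# A few rows in the question's format, inlined so the sketch is self-contained.
# With a real file, pass the path (e.g. "bindat.csv") to read_csv instead.
csv_text = """date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
20130625,1610,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM
"""

# Read both columns as strings so they can be concatenated directly.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Zero-pad the time to four digits ("715" -> "0715") so %H%M is unambiguous,
# then parse date and time together with an explicit format.
stamp = df["date"] + df["time"].str.zfill(4)
df["date_time"] = pd.to_datetime(stamp, format="%Y%m%d%H%M")

# Use the combined timestamp as the index and drop the source columns.
df = df.drop(columns=["date", "time"]).set_index("date_time")
```

After this, df.index is a DatetimeIndex named date_time, matching the layout of the output shown above.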

