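pandas: parsing date-time while reading a 'csv' file with pandas
Disclaimer: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/38849676/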
parse date-time while reading 'csv' file with pandas
Asked by dss
I am trying to parse dates while reading my data from a CSV file. The command that I use is
df = pd.read_csv('/Users/n....', names=names, parse_dates=['date'])
And it generally works on my files.
But I have a couple of data sets with mixed date formats. Some lines have dates like this (09/20/15 09:59), while other lines in the same file have dates like this (2015-09-20 10:22:01.013). The command that I wrote above doesn't work on these files. It works when I remove parse_dates=['date'], but then I can't use the date column as datetime; it reads that column as integer. I would appreciate it if anyone could answer that!
Answered by Anzel
Pandas read_csv accepts a date_parser argument, to which you can pass your own date parsing function. For example, since in your case you have 2 different datetime formats, you can simply do:
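import datetime
import pandas as pd

def date_parser(d):
    try:
        # format 1, e.g. 09/20/15 09:59
        d = datetime.datetime.strptime(d, "%m/%d/%y %H:%M")
    except ValueError:
        try:
            # format 2, e.g. 2015-09-20 10:22:01.013
            d = datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            # neither format matches, do something about it;
            # falling back to NaT here is only one option
            d = pd.NaT
    return d

df = pd.read_csv('/Users/n....',
                 names=names,
                 parse_dates=['date1', 'date2'],
                 date_parser=date_parser)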
You can then parse those dates in different formats in those columns.
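The try-each-format idea can also be checked on its own, without reading a file. The following is only a sketch: the two format strings are taken from the sample values in the question, and the names FORMATS, parse_mixed and samples are made up for illustration.

```python
from datetime import datetime

# Formats inferred from the two sample values in the question (an assumption).
FORMATS = ("%m/%d/%y %H:%M", "%Y-%m-%d %H:%M:%S.%f")

def parse_mixed(text):
    """Return a datetime for the first format that matches, or None if none do."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next format
    return None

# Hypothetical sample values: one per format, plus one that matches neither.
samples = ["09/20/15 09:59", " 2015-09-20 10:22:01.013", "not a date"]
parsed = [parse_mixed(s) for s in samples]
```

A function like this could also be applied after loading, for example with df['date'].map(parse_mixed). That may be the more durable route, since as far as I know pandas 2.0 and later deprecate the date_parser argument.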
Answered by waniz
Like this:
df = pd.read_csv(file, names=names)
df['date'] = pd.to_datetime(df['date'])
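Whether a single to_datetime call copes with two formats in one column seems to depend on the pandas version. One option that does not rely on format inference is to parse each known format separately and combine the results. This is a minimal sketch, assuming the two formats from the question; the inline CSV text and the value column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: one row per date format from the question.
csv_text = "date,value\n09/20/15 09:59,1\n2015-09-20 10:22:01.013,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse with each explicit format; rows that do not match become NaT.
first = pd.to_datetime(df["date"], format="%m/%d/%y %H:%M", errors="coerce")
second = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S.%f", errors="coerce")

# Take the first format where it matched, otherwise the second.
df["date"] = first.fillna(second)
```

Rows that match neither format end up as NaT, so they are easy to find with df['date'].isna(). Recent pandas versions (2.0 and later, as far as I know) also accept format='mixed' in to_datetime, which infers the format for each element individually.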