Python pandas: datetime dtypes in read_csv

Question

Asked by user3221055

I'm reading in a csv file with multiple datetime columns. I need to set the data types when reading in the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

When run, this gives an error:

TypeError: data type "datetime" not understood

Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried to load the csv file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!

Answer 1

Answered by Paul H

You might try passing actual types instead of strings.

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

But it's going to be really hard to diagnose this without any of your data to tinker with.

And really, you probably want pandas to parse the dates into Timestamps, so that might be:

pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=True)
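
One caveat worth checking on your own data: per the pandas documentation, parse_dates=True only tries to parse the index, so data columns are left alone unless you name them. A minimal sketch, assuming a made-up two-row sample in place of the asker's file (the sample string and the variable names below are illustrative, not from the question):

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the asker's file.
sample = (
    "2016-05-05\t2016-05-06 12:30:00\tabc\t1.5\n"
    "2016-05-07\t2016-05-08 08:00:00\tdef\t2.5\n"
)
headers = ['col1', 'col2', 'col3', 'col4']

# parse_dates=True only tries to parse the index, so col1 and col2 are not converted.
df_index_only = pd.read_csv(io.StringIO(sample), sep='\t', header=None,
                            names=headers, parse_dates=True)

# Listing the columns explicitly asks pandas to convert them.
df_columns = pd.read_csv(io.StringIO(sample), sep='\t', header=None,
                         names=headers, parse_dates=['col1', 'col2'])

is_dt = pd.api.types.is_datetime64_any_dtype
print(is_dt(df_index_only['col1']), is_dt(df_columns['col1']))
```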

Answer 2

Answered by Jose Buraschi

I tried using the dtypes=[datetime, ...] option:

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

I encountered the following error:

TypeError: data type not understood

The only change I had to make was to replace datetime with datetime.datetime:

import pandas as pd
import datetime  # import the module itself so that datetime.datetime resolves
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime.datetime, datetime.datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

Answer 3

Answered by firelynx

Why it does not work

There is no datetime dtype to be set for read_csv as csv files can only contain strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.

The pandas way of solving this

The pandas.read_csv() function has a keyword argument called parse_dates.

Using this, you can convert strings, floats or integers into datetimes on the fly using the default date_parser (dateutil.parser.parser).

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This will cause pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). After each string has been read, the date_parser for that column acts upon it and gives back whatever that function returns.
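
Since the asker's column types arrive as a single list, one way to apply the approach above is to split that list into the two arguments read_csv expects. This is only a sketch: split_dtypes is a hypothetical helper (not part of pandas), the 'datetime' marker is the asker's own convention, and the sample data is made up.

```python
import io

import pandas as pd


def split_dtypes(headers, type_names):
    """Split a per-column type list into read_csv's dtype and parse_dates arguments.

    Hypothetical helper: columns marked 'datetime' go to parse_dates,
    every other column keeps its type name in the dtype dict.
    """
    dtype = {}
    parse_dates = []
    for name, type_name in zip(headers, type_names):
        if type_name == 'datetime':
            parse_dates.append(name)
        else:
            dtype[name] = type_name
    return dtype, parse_dates


headers = ['col1', 'col2', 'col3', 'col4']
type_names = ['datetime', 'datetime', 'str', 'float']
dtype, parse_dates = split_dtypes(headers, type_names)

# Made-up tab-separated sample standing in for the real file.
sample = (
    "2016-05-05\t2016-05-06 12:30:00\tabc\t1.5\n"
    "2016-05-07\t2016-05-08 08:00:00\tdef\t2.5\n"
)
df = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=headers,
                 dtype=dtype, parse_dates=parse_dates)
print(df.dtypes)
```

The list of date columns is then derived from the same source as the other dtypes, so nothing has to be hard-coded when that list changes.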

Defining your own date parsing function:

The pandas.read_csv() function also has a keyword argument called date_parser.

Setting this to a lambda function will make that particular function be used for the parsing of the dates.

GOTCHA WARNING

You have to give it the function itself, not the result of calling the function. Thus, this is correct:

date_parser = pd.datetools.to_datetime

This is incorrect:

date_parser = pd.datetools.to_datetime()
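
The difference between the two lines can be seen without reading any file. The sketch below uses pd.to_datetime, where the function lives in current pandas, and the names parser_reference and call_error are illustrative only:

```python
import pandas as pd

# A reference to the function: it can be handed over and called later with data.
parser_reference = pd.to_datetime
print(callable(parser_reference))

# Calling the function right away fails, because no data is passed to it.
call_error = None
try:
    pd.to_datetime()
except TypeError as exc:
    call_error = exc
    print("TypeError:", exc)
```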

Pandas 0.22 Update

pd.datetools.to_datetime has been relocated to pd.to_datetime, so use date_parser = pd.to_datetime.

Thanks @stackoverYC
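
A note for newer installs: as far as I know, date_parser itself is deprecated from pandas 2.0 onward in favor of a date_format argument, so check the read_csv documentation for your version. A sketch that should not depend on either argument is to read the date columns as strings and convert them with an explicit format. The sample data, the day-first format, and the date_cols list below are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up sample with day-first dates, which a default parser could misread.
sample = "31/12/2016\tabc\t1.5\n01/02/2017\tdef\t2.5\n"
headers = ['col1', 'col2', 'col3']
date_cols = ['col1']

# Read the date columns as plain strings first.
df = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=headers,
                 dtype={name: str for name in date_cols})

# Convert each date column with an explicit format string.
for name in date_cols:
    df[name] = pd.to_datetime(df[name], format='%d/%m/%Y')

print(df['col1'].tolist())
```

Because date_cols can be built from the same list that supplies the other dtypes, this still works when the set of datetime columns is not known ahead of time.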

Answer 4

Answered by mrjrdnthms

There is a parse_dates parameter for read_csv which allows you to define the names of the columns you want treated as dates or datetimes:

date_cols = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=date_cols)
