Python: specify correct dtypes for datetimes and booleans in pandas.read_csv

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20095983/

Time: 2020-08-18 19:30:49  Source: igfitidea

Specify correct dtypes to pandas.read_csv for datetimes and booleans

Tags: python, pandas, csv, types, type-conversion

Asked by elgehelge

I am loading a CSV file into a pandas DataFrame. For each column, how do I specify what type of data it contains using the dtype argument?

  • I can do it with numeric data (code at bottom)...
  • But how do I specify time data...
  • and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.

Code:


import numpy as np
import pandas as pd

# '<file-name>' is a placeholder: replace it with the path to your CSV file.
df = pd.read_csv('<file-name>', dtype={'A': np.int64, 'B': np.float64})

Accepted answer by Paul

read_csv has many options that cover all the cases you mentioned. You might want to try dtype={'A': datetime.datetime}, but often you won't need dtype at all, because pandas can infer the types.
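To illustrate the point about inference, here is a minimal sketch. The CSV text and the column names A, B and C are made up for the example, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a file on disk.
csv_text = "A,B,C\n1,1.5,x\n2,2.5,y\n"

# No dtype argument: pandas inspects the values and picks a type per column.
df = pd.read_csv(io.StringIO(csv_text))

# Column A is inferred as an integer type and column B as a float type.
print(df.dtypes)
```

If the inferred types are not what you want, that is the point at which the options listed in the rest of this answer become useful.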

For dates, you need to specify the date-parsing options:

parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
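As a rough sketch of the simplest case, parse_dates can be given a list of column names. The sample data and the column name "when" are assumptions for the example; keep_date_col and date_parser are left out because they are not needed here.

```python
import io

import pandas as pd

# Hypothetical sample data with one timestamp column named "when".
csv_text = "when,value\n2020-08-18 19:30:49,1\n2020-08-19 07:00:00,2\n"

# parse_dates takes a list of column names (or positions) to parse as datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# The "when" column now has a datetime64 dtype instead of plain strings.
print(df.dtypes)
```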

In general for converting boolean values you will need to specify:


true_values  : list  Values to consider as True
false_values : list  Values to consider as False

These options transform any value in the lists to the boolean True or False. For more general conversions you will most likely need:

converters : dict, optional  Dict of functions for converting values in certain columns. Keys can either be integers or column labels
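A small sketch that combines true_values / false_values with converters might look like this. The sample data, the column names "flag" and "price", and the helper parse_price are all hypothetical, chosen only to show how the options fit together.

```python
import io

import pandas as pd

# Hypothetical sample data: a yes/no column and a price column with a "$" prefix.
csv_text = "flag,price\nyes,$1.50\nno,$20.00\nyes,$3.25\n"


def parse_price(text):
    """Hypothetical converter: strip the leading "$" and return a float."""
    return float(text.lstrip("$"))


df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],                # strings to read as True
    false_values=["no"],                # strings to read as False
    converters={"price": parse_price},  # called on each raw string in "price"
)

print(df)
```

Converters receive the raw string from each cell, so they suit cases where no built-in option covers the format.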

The documentation is dense, but it has the full list: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html