Original source: http://stackoverflow.com/questions/20095983/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Specify correct dtypes to pandas.read_csv for datetimes and booleans
Asked by elgehelge
I am loading a CSV file into a pandas DataFrame. For each column, how do I specify what type of data it contains using the dtype argument?
- I can do it with numeric data (code at bottom)...
- But how do I specify time data...
- and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.
Code:
import pandas as pd
import numpy as np
df = pd.read_csv(<file-name>, dtype={'A': np.int64, 'B': np.float64})
Accepted answer by Paul
There are a lot of options for read_csv which will handle all the cases you mentioned. You might want to try dtype={'A': datetime.datetime}, but often you won't need dtypes as pandas can infer the types.
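As a rough illustration of the inference mentioned above, the sketch below reads a small in-memory CSV without any dtype argument and then overrides one column explicitly. The sample data and column names are made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a real file.
csv_text = "A,B,C\n1,0.5,x\n2,1.5,y\n"

# No dtype given: pandas infers integer and float columns on its own.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# An explicit dtype overrides the inference for the listed columns only.
forced = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.float64})
print(forced.dtypes)
```

Column C holds text, so its dtype depends on how the installed pandas version represents strings.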
For dates, you need to specify the parse_dates options:
parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
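A minimal sketch of the parse_dates option, assuming a column named "when" that holds ISO-formatted dates. The data is invented for illustration, and only parse_dates is shown here, not the other two options.

```python
import io

import pandas as pd

# Hypothetical sample data; "when" holds ISO-formatted dates.
csv_text = "when,value\n2013-11-20,1\n2013-11-21,2\n"

# parse_dates takes a list of column names (or positions) to parse as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(df.dtypes)
print(df["when"].dt.day.tolist())
```

Once the column is parsed, the .dt accessor is available, which is usually the practical reason for wanting a datetime column in the first place.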
In general, for converting boolean values you will need to specify:
true_values : list. Values to consider as True
false_values : list. Values to consider as False
These will transform any value in the lists to boolean True/False. For more general conversions you will most likely need:
converters : dict, optional. Dict of functions for converting values in certain columns. Keys can either be integers or column labels
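The boolean and converter options can be combined in one call. The sketch below assumes a "flag" column containing yes/no strings and an "amount" column containing percentages. Both columns and the percent_to_fraction helper are hypothetical and exist only for this example.

```python
import io

import pandas as pd

# Hypothetical sample data: yes/no flags and percentage strings.
csv_text = "flag,amount\nyes,10%\nno,25%\nyes,5%\n"


def percent_to_fraction(text):
    """Illustrative helper: turn a string like '10%' into 0.1."""
    return float(text.rstrip("%")) / 100


df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],    # strings to read as True
    false_values=["no"],    # strings to read as False
    converters={"amount": percent_to_fraction},  # per-column function
)

print(df.dtypes)
print(df)
```

A column is only converted to booleans when every value in it appears in one of the two lists. A converter receives each cell as a string, so it can apply any custom parsing.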
The documentation is dense, but it has the full list of options: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html

