Original question: http://stackoverflow.com/questions/38553946/
License: this content is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, not to this page.
pandas read_csv convert object to float
Asked by user3529091
I'm trying to read a CSV file. In one column (hpi), which should be float32, two records contain a '.' to indicate missing values. pandas interprets the '.' as a character, so the whole column is read as object.
How do I force this column to be numeric?
import pandas as pd

data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                   header=0,
                   names=["state", "year", "qtr", "hpi"])
                   #, converters={'hpi': float})
# print(data.head())
# print(data.dtypes)
# Show the rows where hpi holds the '.' placeholder
print(data[data.hpi == '.'])
Accepted answer by ayhan
Use the na_values parameter of read_csv:
df = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                 header=0,
                 names=["state", "year", "qtr", "hpi"],
                 na_values='.')
df.dtypes
Out:
state     object
year       int64
qtr        int64
hpi      float64
dtype: object
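Since the question asks for float32 rather than the default float64, here is a minimal sketch that combines na_values with the dtype parameter of read_csv. The inline CSV text, its header names, and the sample values are made up for illustration and only stand in for the FHFA file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the FHFA file: a header row plus a few data rows,
# with '.' marking a missing hpi value.
csv_text = """state,yr,qtr,index_sa
AK,1975,1,62.03
AK,1975,2,.
AK,1975,3,.
AK,1975,4,66.71
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,                                # replace the file's own header row
    names=["state", "year", "qtr", "hpi"],
    na_values=".",                           # treat '.' as missing
    dtype={"hpi": "float32"},                # request float32 for this column
)
print(df.dtypes)
```

The '.' entries become NaN, so the column can be stored as a float without any cleanup after the read.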
Answer by danielhadar
Replace the '.' placeholders with a numeric value, then apply to_numeric over the desired column (with apply):
data.loc[data.hpi == '.', 'hpi'] = -1.0
data[['hpi']] = data[['hpi']].apply(pd.to_numeric)
For example:
In[69]: data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                           header=0,
                           names=["state", "year", "qtr", "hpi"])
In[70]: data[['hpi']].dtypes
Out[70]:
hpi    object
dtype: object
In[74]: data.loc[data.hpi == '.'] = -1.0
In[75]: data[['hpi']] = data[['hpi']].apply(pd.to_numeric)
In[77]: data[['hpi']].dtypes
Out[77]:
hpi    float64
dtype: object
EDIT:
For some reason it changes all the numeric columns to float64. This is a small workaround that changes them back to int.
Before:
In[89]: data.dtypes
Out[89]:
state object
year float64
qtr float64
hpi float64
After:
In[90]: data[['year','qtr']] = data[['year','qtr']].astype(int)
In[91]: data.dtypes
Out[91]:
state object
year int64
qtr int64
hpi float64
dtype: object
If anyone could shed light on why this happens, that'd be great.
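A possible explanation, offered as an assumption rather than something confirmed in the thread: the In[74] line omits the 'hpi' column label, so the assignment writes -1.0 into every column of the matching rows, and depending on the pandas version that can upcast the integer columns to float64. The sketch below uses a made-up three-row frame to contrast that row-wide assignment with the column-targeted one shown at the top of the answer.

```python
import pandas as pd


def make_frame():
    # Hypothetical three-row frame standing in for the FHFA data.
    return pd.DataFrame({
        "year": [1975, 1975, 1976],
        "hpi": pd.Series(["62.03", ".", "71.95"], dtype=object),
    })


# Row-wide assignment (as in In[74]): no column label is given, so every
# column in the matching rows is overwritten with -1.0.
row_wide = make_frame()
row_wide.loc[row_wide.hpi == "."] = -1.0

# Column-targeted assignment: only 'hpi' is touched in the matching rows.
targeted = make_frame()
targeted.loc[targeted.hpi == ".", "hpi"] = -1.0
targeted["hpi"] = pd.to_numeric(targeted["hpi"])

print(row_wide)
print(targeted)
print(targeted.dtypes)
```

With the column label in place, year keeps its original values and integer dtype, so the astype(int) workaround should not be needed.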
Answer by mgilbert
You could just cast the column after you read it in, e.g.:
import numpy as np  # the pd.np alias is not available in recent pandas versions

data.loc[data.hpi == '.', 'hpi'] = np.nan
data.hpi = data.hpi.astype(np.float64)
Alternatively, you can use the na_values parameter of read_csv.
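Another way to cast after reading, sketched here as an alternative rather than something proposed in the thread, is pd.to_numeric with errors='coerce'. It turns any unparseable entry, including '.', into NaN in a single step. The inline CSV text and its values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; '.' marks a missing hpi value.
csv_text = (
    "state,year,qtr,hpi\n"
    "AK,1975,1,62.03\n"
    "AK,1975,2,.\n"
    "AK,1975,3,64.10\n"
)

data = pd.read_csv(io.StringIO(csv_text))

# Coerce anything that is not a number to NaN, then downcast to float32.
data["hpi"] = pd.to_numeric(data["hpi"], errors="coerce").astype("float32")
print(data.dtypes)
```

Note that errors='coerce' also hides genuinely malformed values, so na_values at read time is the stricter choice when '.' is the only expected placeholder.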