Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/38553946/

Date: 2020-09-14 01:39:27  Source: igfitidea

pandas read_csv convert object to float

pandas

Asked by user3529091

I'm trying to read a CSV file. In one column (hpi), which should be float32, two records are populated with a '.' to indicate missing values. pandas interprets the '.' as a character, so the column is read as strings instead of floats.

How do I force this column to be numeric?

import pandas as pd

data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                   header=0,
                   names=["state", "year", "qtr", "hpi"],
                   # converters={'hpi': float},
                   )

# print(data.head())
# print(data.dtypes)

# hpi was read as strings, so the '.' placeholders can be matched directly
print(data[data.hpi == '.'])
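
A minimal, self-contained reproduction may make the problem easier to see. It assumes a few made-up rows in place of the real FHFA file (the values are hypothetical) and reads them through io.StringIO, so no network access is needed:

```python
import io

import pandas as pd

# Hypothetical sample standing in for HPI_AT_state.csv; the values are made up.
# The '.' entries in hpi play the role of the missing-value markers.
csv_text = """state,year,qtr,hpi
AK,1975,1,.
AK,1975,2,.
AK,1975,3,62.24
AL,1975,1,61.07
"""

data = pd.read_csv(io.StringIO(csv_text), header=0,
                   names=["state", "year", "qtr", "hpi"])

# '.' is not one of pandas' default NA markers and cannot be parsed as a number,
# so the whole hpi column is kept as strings instead of floats.
print(data.dtypes)
print(data[data.hpi == "."])  # the two placeholder rows
```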

Accepted answer by ayhan

Use the na_values parameter in read_csv:

import pandas as pd

df = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                 header=0,
                 names=["state", "year", "qtr", "hpi"],
                 na_values='.')  # treat '.' as a missing value (NaN)

df.dtypes
Out: 
state     object
year       int64
qtr        int64
hpi      float64
dtype: object
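
The same fix as a self-contained sketch, again assuming a few made-up rows instead of the real file. na_values='.' adds '.' to the strings pandas already treats as missing, so the two placeholder rows become NaN and hpi is parsed as float64 while year and qtr stay int64:

```python
import io

import pandas as pd

# Made-up sample rows; only the '.' placeholders matter here.
csv_text = """state,year,qtr,hpi
AK,1975,1,.
AK,1975,2,.
AK,1975,3,62.24
AL,1975,1,61.07
"""

# '.' is treated as NaN in addition to the default NA strings ('', 'NA', 'NaN', ...)
df = pd.read_csv(io.StringIO(csv_text), header=0,
                 names=["state", "year", "qtr", "hpi"], na_values=".")

print(df.dtypes)
print(df["hpi"].isna().sum())  # 2
```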

Answer by danielhadar

Apply to_numeric to the desired column (with apply), after first replacing the '.' entries with a numeric placeholder:

# Replace the '.' placeholders in hpi with -1.0, then convert the column to numbers
data.loc[data.hpi == '.', 'hpi'] = -1.0
data[['hpi']] = data[['hpi']].apply(pd.to_numeric)

For example:

In[69]: data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                    header=0,
                    names = ["state", "year", "qtr", "hpi"])

In[70]: data[['hpi']].dtypes
Out[70]: 
hpi    object
dtype: object

In[74]: data.loc[data.hpi == '.'] = -1.0
In[75]: data[['hpi']] = data[['hpi']].apply(pd.to_numeric)

In[77]: data[['hpi']].dtypes
Out[77]: 
hpi    float64
dtype: object
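
A variant of this answer, swapping in to_numeric's errors='coerce' option in place of the -1.0 placeholder: anything that cannot be parsed, such as '.', becomes NaN, so no artificial value ends up in the data. Because only the hpi column is assigned, year and qtr keep their integer dtype. The sample rows are made up for illustration:

```python
import io

import pandas as pd

# Made-up sample rows standing in for the FHFA file
csv_text = """state,year,qtr,hpi
AK,1975,1,.
AK,1975,2,.
AK,1975,3,62.24
AL,1975,1,61.07
"""
data = pd.read_csv(io.StringIO(csv_text))

# Unparseable strings such as '.' are coerced to NaN instead of raising an error;
# only the hpi column is assigned, so the other columns are untouched.
data["hpi"] = pd.to_numeric(data["hpi"], errors="coerce")

print(data.dtypes)
```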

EDIT:

For some reason it changes all the numeric columns (year and qtr as well as hpi) to float64. Here is a small workaround that changes them back to int.

Before:

In[89]: data.dtypes
Out[89]: 
state     object
year     float64
qtr      float64
hpi      float64

After:

In[90]: data[['year','qtr']] = data[['year','qtr']].astype(int)
In[91]: data.dtypes
Out[91]: 
state     object
year       int64
qtr        int64
hpi      float64
dtype: object

If anyone could shed light on why this happens, that would be great.
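
A likely explanation, sketched on a small hypothetical frame: In[74] is data.loc[data.hpi == '.'] = -1.0 with no column label, so it writes -1.0 into every column of the matching rows, not just hpi. In the pandas version used above this upcast year and qtr to float64; newer versions may keep int64 and store -1 instead, but either way the real year and qtr values of those rows are lost. Passing the column label, as in the first snippet of this answer, changes only hpi:

```python
import pandas as pd

# Small hypothetical frame mirroring the question's columns. state and hpi are
# created as object columns, like the strings read_csv produced in the answer.
df = pd.DataFrame({
    "state": pd.Series(["AK", "AL"], dtype=object),
    "year": [1975, 1975],
    "qtr": [1, 1],
    "hpi": pd.Series([".", "61.07"], dtype=object),
})

# Like In[74]: no column label, so every column of the matching row is overwritten
bad = df.copy()
bad.loc[bad.hpi == "."] = -1.0

# Like the first snippet: only the hpi cell of the matching row is replaced
good = df.copy()
good.loc[good.hpi == ".", "hpi"] = -1.0
good["hpi"] = pd.to_numeric(good["hpi"])

print(bad.loc[0].tolist())  # year and qtr of row 0 were overwritten too
print(good.dtypes)          # year and qtr keep int64
```

With the column label in place, the astype(int) workaround should not be needed.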

Answer by mgilbert

You could just cast the column after reading it in, e.g.:

import numpy as np
data.loc[data.hpi == '.', 'hpi'] = np.nan  # '.' marks a missing value
data['hpi'] = data['hpi'].astype(np.float64)

Alternatively, you can use the na_values parameter of read_csv.
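
Since the question asks for float32, the na_values route can be combined with read_csv's dtype parameter so hpi is parsed directly as float32 (float32 can hold NaN, so the missing rows are not a problem). A sketch with made-up rows; here na_values is given as a dict so the '.' marker applies to hpi only:

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows standing in for the FHFA file
csv_text = """state,year,qtr,hpi
AK,1975,1,.
AK,1975,2,.
AK,1975,3,62.24
AL,1975,1,61.07
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"hpi": ["."]},   # '.' means missing, but only in hpi
    dtype={"hpi": np.float32},  # parse hpi straight into float32
)

print(df["hpi"].dtype)  # float32
```

With the cast-after-reading approach, astype(np.float32) in place of astype(np.float64) gives the same dtype.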