pandas read_csv: converting an object column to float

Question

Asked by user3529091

I'm trying to read a CSV file. One column (hpi), which should be float32, has two records populated with a "." to indicate missing values. pandas interprets the "." as a string, so the whole column is read as object.

How do I force this column to be numeric?

import pandas as pd

data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                    header=0,
                    names=["state", "year", "qtr", "hpi"])
                    # , converters={'hpi': float})

# print(data.head())
# print(data.dtypes)

# Rows where hpi was read as the string '.'
print(data[data.hpi == '.'])

Answer 1

Accepted answer by ayhan

Use the na_values parameter of read_csv:

df = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                  header=0,
                  names = ["state", "year", "qtr", "hpi"], 
                  na_values='.')

df.dtypes
Out: 
state     object
year       int64
qtr        int64
hpi      float64
dtype: object
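
Since the question asks for float32 rather than float64, na_values can probably be combined with the dtype parameter of read_csv. The following is a minimal sketch, not part of the original answer: the three sample rows are hypothetical stand-ins for the FHFA file, and only the column layout is taken from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows mimicking the layout used in the question.
csv_text = """state,year,qtr,hpi
AK,1975,1,62.03
AK,1975,2,.
AK,1975,3,64.11
"""

df = pd.read_csv(
    StringIO(csv_text),
    header=0,
    names=["state", "year", "qtr", "hpi"],
    na_values=".",             # treat "." as a missing value
    dtype={"hpi": "float32"},  # request float32 instead of the default float64
)

print(df.dtypes)
print(df["hpi"].isna().sum())  # number of rows where "." became NaN
```

The "." entries end up as NaN, so the column can hold a float dtype directly and no post-processing is needed.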

Answer 2

Answer by danielhadar

Replace the '.' placeholders with a sentinel value, then apply to_numeric to the desired column (with apply):

data.loc[data.hpi == '.', 'hpi'] = -1.0
data[['hpi']] = data[['hpi']].apply(pd.to_numeric)

For example:

In[69]: data = pd.read_csv('http://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_AT_state.csv',
                    header=0,
                    names = ["state", "year", "qtr", "hpi"])

In[70]: data[['hpi']].dtypes
Out[70]: 
hpi    object
dtype: object

In[74]: data.loc[data.hpi == '.'] = -1.0
In[75]: data[['hpi']] = data[['hpi']].apply(pd.to_numeric)

In[77]: data[['hpi']].dtypes
Out[77]: 
hpi    float64
dtype: object
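
As a possible alternative to the -1.0 sentinel, pd.to_numeric accepts errors='coerce', which turns values it cannot parse (such as '.') into NaN. This is a minimal sketch on hypothetical sample values, not the FHFA data, and it is not part of the original answer:

```python
import pandas as pd

# Hypothetical object column containing a "." placeholder.
data = pd.DataFrame({"hpi": pd.Series(["62.03", ".", "64.11"], dtype=object)})

# Unparseable entries become NaN instead of raising an error.
data["hpi"] = pd.to_numeric(data["hpi"], errors="coerce")

print(data["hpi"].dtype)
print(data["hpi"].isna().tolist())
```

If float32 is required, chaining .astype("float32") after the conversion should work.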

EDIT:

For some reason this changes all the numeric columns to float64. Here is a small workaround that casts them back to int.

Before:

In[89]: data.dtypes
Out[89]: 
state     object
year     float64
qtr      float64
hpi      float64

After:

In[90]: data[['year','qtr']] = data[['year','qtr']].astype(int)
In[91]: data.dtypes
Out[91]: 
state     object
year       int64
qtr        int64
hpi      float64
dtype: object

If anyone could shed light on why this happens, that would be great.
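
A possible explanation for the dtype change, offered as an assumption rather than something confirmed in the thread: In[74] indexes with only a row mask, so -1.0 is written into every column of the matching rows, and depending on the pandas version that can upcast the integer columns to float64. The first snippet in this answer passes 'hpi' as the column label and touches only that column. A minimal sketch on hypothetical sample values:

```python
import pandas as pd

# Hypothetical frame with integer columns and an object column holding ".".
df = pd.DataFrame({
    "year": [1975, 1975, 1976],
    "qtr": [1, 2, 1],
    "hpi": pd.Series(["100.0", ".", "101.5"], dtype=object),
})
mask = df["hpi"] == "."

rows = df.copy()
rows.loc[mask] = -1.0         # no column label: every column in the matching rows is overwritten

cols = df.copy()
cols.loc[mask, "hpi"] = -1.0  # column label given: only hpi is overwritten

print(rows)
print(cols)
```

With the column label included, year and qtr are left alone, so the astype(int) workaround should not be needed.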

Answer 3

Answer by mgilbert

You could just cast the column after you read it in, e.g.:

import numpy as np  # the pd.np alias is no longer available in recent pandas

data.loc[data.hpi == '.', 'hpi'] = np.nan
data.hpi = data.hpi.astype(np.float64)

Alternatively, you can use the na_values parameter of read_csv.
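
If "." should count as missing only in hpi and not in other columns, na_values also accepts a dict keyed by column name. This is a minimal sketch under that assumption: the flag column and the sample rows are hypothetical and exist only to show that "." survives outside hpi.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: "." is a real value in flag but a placeholder in hpi.
csv_text = """state,flag,hpi
AK,.,62.03
AK,x,.
"""

# Per-column NA markers: only hpi treats "." as missing.
df = pd.read_csv(StringIO(csv_text), na_values={"hpi": ["."]})

print(df["flag"].tolist())
print(df["hpi"].isna().tolist())
```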
