Python: specifying dtype float32 with pandas.read_csv on pandas 0.10.1

Question

Asked by durden2.0

I'm attempting to read a simple space-separated file with the pandas read_csv method. However, pandas doesn't seem to be obeying my dtype argument. Maybe I'm specifying it incorrectly?

I've distilled my somewhat complicated call to read_csv down to this simple test case. I'm actually using the converters argument in my 'real' scenario, but I removed it for simplicity.

Below is my ipython session:

>>> cat test.out
a b
0.76398 0.81394
0.32136 0.91063
>>> import pandas
>>> import numpy
>>> x = pandas.read_csv('test.out', dtype={'a': numpy.float32}, delim_whitespace=True)
>>> x
         a        b
0  0.76398  0.81394
1  0.32136  0.91063
>>> x.a.dtype
dtype('float64')

I've also tried this with a dtype of numpy.int32 or numpy.int64. These choices result in an exception:

AttributeError: 'NoneType' object has no attribute 'dtype'

I'm assuming the AttributeError is because pandas will not automatically try to convert/truncate the float values into an integer?

I'm running on a 32-bit machine with a 32-bit version of Python.

>>> !uname -a
Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
>>> import platform
>>> platform.architecture()
('32bit', 'ELF')
>>> pandas.__version__
'0.10.1'

Answer 1

Accepted answer by Jeff

pandas 0.10.1 doesn't really support float32.

See the dtype specification notes: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification

In 0.11 you can do it like this:

# don't specify dtype or converters explicitly for the columns you care about;
# they will be converted to float64 if possible, or to object if they cannot
df = pd.read_csv('test.csv'.....)

#### this is optional and related to the issue you posted ####
# force anything that is not numeric to NaN
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)

# astype
df[columns] = df[columns].astype('float32')

See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion

It's not as efficient as doing it directly in read_csv (but that requires some low-level changes).
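For readers on a more recent pandas, the read-then-convert approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: convert_objects was removed in later pandas releases, so pd.to_numeric(..., errors="coerce") stands in for it here, sep=r"\s+" stands in for delim_whitespace=True, and the inline data string is the test.out content from the question.

```python
import io

import numpy as np
import pandas as pd

# Assumption: the same two-column, whitespace-separated data as test.out.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Read without any dtype hints; numeric columns come back as float64.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Stand-in for convert_objects(convert_numeric=True):
# anything non-numeric in column 'a' becomes NaN.
df["a"] = pd.to_numeric(df["a"], errors="coerce")

# Down-cast the column of interest after the fact.
df["a"] = df["a"].astype("float32")

print(df.dtypes)
```

The conversion happens after parsing, so the column is briefly held as float64 in memory, which is the inefficiency mentioned above.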

I have confirmed that with 0.11-dev, passing dtype to read_csv DOES work (the results are the same on 32-bit and 64-bit):

In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)

In [6]: x
Out[6]: 
         a        b
0  0.76398  0.81394
1  0.32136  0.91063

In [7]: x.dtypes
Out[7]: 
a    float32
b    float64
dtype: object

In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'

In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
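A self-contained version of the same check for a current Python 3 and pandas setup might look like the sketch below. The assumptions here are that `data` in the session above holds the contents of test.out from the question, that io.StringIO replaces the Python 2 StringIO.StringIO, and that sep=r"\s+" is used in place of delim_whitespace=True.

```python
import io

import numpy as np
import pandas as pd

# Assumption: `data` is the text of test.out from the question.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Ask read_csv for float32 on column 'a' only; 'b' keeps the default.
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")

print(x.dtypes)
```

Only the column named in the dtype mapping is affected; columns not listed fall back to the parser's inference.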

Answer 2

Answered by Jeff

In [22]: df.a.dtype = pd.np.float32

In [23]: df.a.dtype
Out[23]: dtype('float32')

The above works fine for me under pandas 0.10.1.
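One caution worth checking before relying on this: the session above only inspects the reported dtype, not the values. In plain NumPy, changing the dtype of an existing float64 buffer reinterprets its bytes rather than converting the numbers. Whether pandas 0.10.1 behaved the same way for this assignment is an assumption that has not been verified here; the sketch below only shows the NumPy-level difference, using view() as the reinterpreting operation and astype() as the converting one.

```python
import numpy as np

# The 'a' column values from the question, stored as float64.
values = np.array([0.76398, 0.32136], dtype=np.float64)

# Reinterpretation: the same 16 bytes are read as four float32 numbers,
# so the original values are not preserved.
reinterpreted = values.view(np.float32)

# Conversion: each float64 value is rounded to the nearest float32.
converted = values.astype(np.float32)

print(reinterpreted.shape, converted.shape)
print(converted)
```

If the values matter, astype is the safer choice, since it produces a new array of the same length holding the converted numbers.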
