Original question: http://stackoverflow.com/questions/9949427/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to change the dtype of certain columns of a numpy recarray?
Asked by mathtick
Suppose I have a recarray such as the following:
import numpy as np
# example data from @unutbu's answer
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]
Say I want to convert certain columns to floats. How do I do this? Should I change to an ndarray and then back to a recarray?
Answered by unutbu
Here is an example using astype to perform the conversion:
import numpy as np
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]
The age field has dtype <i2:
print(r.dtype)
# [('name', '|S30'), ('age', '<i2'), ('weight', '<f4')]
We can change that to <f4 using astype:
r = r.astype([('name', '|S30'), ('age', '<f4'), ('weight', '<f4')])
print(r)
# [('Bill', 31.0, 260.0) ('Fred', 15.0, 145.0)]
Answered by mathtick
There are basically two steps. My stumbling block was in finding how to modify an existing dtype. This is how I did it:
import numpy as np

# change dtype by making a whole new array
# (`data` is an existing structured array or recarray)
dt = data.dtype.descr  # a plain list of (name, type) tuples; a numpy.dtype itself can't be modified
# change the type of the first col:
dt[0] = (dt[0][0], 'float64')
dt = np.dtype(dt)
# data = np.array(data, dtype=dt)  # option 1
data = data.astype(dt)  # option 2
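For what it's worth, the two steps (build a modified dtype, then cast) can be wrapped in a small function. This is only a sketch: `change_field_dtype` and the sample array are made up for illustration and are not part of NumPy, and it assumes a flat structured dtype with no nested or padded fields.

```python
import numpy as np

def change_field_dtype(arr, field, new_type):
    """Return a copy of `arr` with `field` cast to `new_type` (hypothetical helper)."""
    # Step 1: build a new dtype, swapping in the new type for the one field.
    new_dtype = np.dtype([
        (name, new_type if name == field else arr.dtype[name])
        for name in arr.dtype.names
    ])
    # Step 2: cast the whole array to the new dtype; astype returns a new array.
    return arr.astype(new_dtype)

# Made-up sample data: an int32 column 'a' and a float32 column 'b'.
data = np.array([(1, 2.5), (3, 4.5)], dtype=[('a', 'i4'), ('b', 'f4')])
converted = change_field_dtype(data, 'a', 'float64')
print(converted.dtype)
```

The original `data` is left untouched, so you can compare the two dtypes afterwards.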
Answered by JohnE
Here is a minor refinement of the existing answers, plus an extension to situations where you want to make a change based on the dtype rather than column name (e.g. change all floats to integers).
First, you can improve conciseness and readability by using a list comprehension:
col = 'age'
new_dtype = 'float64'
r.astype( [ (col, new_dtype) if d[0] == col else d for d in r.dtype.descr ] )
# rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)],
# dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])
Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, if you wanted to change any 32 or 64 bit float into a 64 bit integer, you could do something like:
old_dtype = ['<f4', '<f8']
new_dtype = 'int64'
r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ] )
# rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)],
# dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])
Note that astype has an optional casting argument that defaults to 'unsafe', so you may want to specify casting='safe' to avoid accidentally losing precision when casting floats to integers. Under 'safe', a lossy cast such as float to integer raises a TypeError instead of silently truncating:
r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ],
casting='safe' )
Refer to the numpy documentation on astype for more on casting and other options.
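To make the difference between the casting modes concrete, here is a small sketch on a made-up one-row structured array (the field names and values are illustrative only). It assumes a reasonably recent NumPy, where casts between structured dtypes are checked field by field.

```python
import numpy as np

r = np.array([(31, 260.5)], dtype=[('age', 'i2'), ('weight', 'f4')])

# Widening int16 -> float64 loses nothing, so it passes under casting='safe'.
widened = r.astype([('age', 'f8'), ('weight', 'f4')], casting='safe')

# Float -> integer could drop the fractional part, so 'safe' refuses the cast.
try:
    r.astype([('age', 'i2'), ('weight', 'i8')], casting='safe')
    safe_cast_rejected = False
except TypeError:
    safe_cast_rejected = True

# The default (casting='unsafe') goes ahead and truncates 260.5 to 260.
truncated = r.astype([('age', 'i2'), ('weight', 'i8')])

print(safe_cast_rejected, truncated['weight'])
```

So casting='safe' works as a guard: it does not round or adjust anything, it simply refuses conversions that could lose information.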
Also note that for general cases of changing floats to integers or vice versa, you might prefer to check the general number type with np.issubdtype rather than checking against multiple specific dtypes.
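A rough sketch of that np.issubdtype idea, assuming a flat recarray like the one used earlier in this thread (the sample values are rewritten here as plain numbers so the snippet stands on its own):

```python
import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name, age, weight')

# Map every floating-point field to int64, whatever its width,
# and leave all other fields with their current dtype.
new_descr = [
    (name, 'int64') if np.issubdtype(r.dtype[name], np.floating) else (name, r.dtype[name])
    for name in r.dtype.names
]
converted = r.astype(new_descr)
print(converted.dtype)
```

Swapping np.floating for np.integer (and 'int64' for a float type) gives the reverse conversion without listing '<i2', '<i4', '<i8' and so on by hand.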

