How to convert a NumPy 2D array with object dtype to a regular 2D array of floats
Original question: http://stackoverflow.com/questions/19459017/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Moe
As part of a broader program I am working on, I ended up with object arrays that mix strings, 3D coordinates, and so on. I know object arrays may not be favored compared to structured arrays, but I am hoping to get around this without changing a lot of code.
Let's assume every row of my array obj_array (with N rows) has the format:
Single entry/object of obj_array: ['NAME',[10.0,20.0,30.0],....]
Now I am trying to load this object array and slice out the 3D coordinate chunk. Up to here everything works fine by simply asking for, say:
obj_array[:,[1,2,3]]
However, the result is also an object array, which is a problem because I want to form a 2D array of floats with:
size [N,3] of N rows and 3 entries of X,Y,Z coordinates
For now, I am looping over the rows and assigning each one to a row of a destination 2D float array to get around the problem. Is there a better way using NumPy's array conversion tools? I tried a few things and could not get it to work.
Centers = np.zeros([N,3])
for row in range(obj_array.shape[0]):
    Centers[row,:] = obj_array[row,1]
Thanks
Accepted answer by Jaime
Nasty little problem... I have been fooling around with this toy example:
>>> arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
>>> arr
array([['one', [1, 2, 3]],
['two', [4, 5, 6]]], dtype=object)
My first guess was:
>>> np.array(arr[:, 1])
array([[1, 2, 3], [4, 5, 6]], dtype=object)
But that keeps the object dtype, so perhaps then:
>>> np.array(arr[:, 1], dtype=float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
You can normally work around this by doing the following:
>>> np.array(arr[:, 1], dtype=[('', float)]*3).view(float).reshape(-1, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object
Not here though, which was kind of puzzling. Apparently it is the fact that the objects in your array are lists that throws this off, as replacing the lists with tuples works:
>>> np.array([tuple(j) for j in arr[:, 1]],
...          dtype=[('', float)]*3).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Since there doesn't seem to be any entirely satisfactory solution, the easiest is probably to go with:
>>> np.array(list(arr[:, 1]), dtype=float)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
That will not be very efficient, though, so it is probably better to go with something like:
>>> np.fromiter((tuple(j) for j in arr[:, 1]), dtype=[('', float)]*3,
...             count=len(arr)).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
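As a minimal sketch of the simplest conversion above, here is the same toy data written with the builtin `float` and `object` types, since the `np.float` and `np.object` aliases were removed in NumPy 1.24. The explicit element-by-element construction and the name `coords` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

# Build the toy object array explicitly so each cell holds exactly one object
# (a name in column 0, a list of coordinates in column 1).
arr = np.empty((2, 2), dtype=object)
arr[0, 0], arr[0, 1] = 'one', [1, 2, 3]
arr[1, 0], arr[1, 1] = 'two', [4, 5, 6]

# tolist() turns the object column into a plain nested list,
# which np.array can then convert into a regular 2D float array.
coords = np.array(arr[:, 1].tolist(), dtype=float)

print(coords.shape, coords.dtype)
```

This trades some speed for readability; the `np.fromiter` variant may be preferable for very large arrays.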
Answer by CT Zhu
You may want to use a structured array, so that when you need to access the names and the values independently you can easily do so. In this example, there are two data points:
x = np.zeros(2, dtype=[('name','S10'), ('value','f4',(3,))])
x[0][0]='item1'
x[1][0]='item2'
y1=x['name']
y2=x['value']
The result:
>>> y1
array(['item1', 'item2'],
dtype='|S10')
>>> y2
array([[ 0., 0., 0.],
[ 0., 0., 0.]], dtype=float32)
See more details: http://docs.scipy.org/doc/numpy/user/basics.rec.html
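To make the structured-array idea a little more concrete, the sketch below fills the coordinate field as well as the name field. The field names follow the answer, while the sample values, the `'U10'` and `'f8'` field types, and the variable names `records`, `names`, and `centers` are assumptions chosen for illustration.

```python
import numpy as np

# One record per data point: a short name plus three float coordinates.
dt = np.dtype([('name', 'U10'), ('value', 'f8', (3,))])
records = np.zeros(2, dtype=dt)

# Fill each field as a whole column.
records['name'] = ['item1', 'item2']
records['value'] = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

names = records['name']     # 1D array of strings
centers = records['value']  # regular (2, 3) float array, no object dtype

print(names)
print(centers)
```

With this layout the coordinates are already a plain float array, so no conversion step is needed later.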
Answer by ali_m
Based on Jaime's toy example I think you can do this very simply using np.vstack():
arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
float_arr = np.vstack(arr[:, 1]).astype(float)
This will work regardless of whether the 'numeric' elements in your object array are 1D numpy arrays, lists or tuples.
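A small sketch of that claim, assuming a toy array whose second column mixes a list, a tuple, and a 1D NumPy array. The three-row data is made up for illustration.

```python
import numpy as np

# Object array whose coordinate column holds three different container types.
arr = np.empty((3, 2), dtype=object)
arr[0, 0], arr[0, 1] = 'one', [1, 2, 3]
arr[1, 0], arr[1, 1] = 'two', (4, 5, 6)
arr[2, 0], arr[2, 1] = 'three', np.array([7, 8, 9])

# vstack treats each element of the column as one row and stacks them;
# astype(float) then gives a regular 2D float array.
float_arr = np.vstack(arr[:, 1]).astype(float)

print(float_arr)
```

All rows need the same length for the stacking to succeed.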
Answer by Matt
This works great on your array arr to convert from an object array to an array of floats. Number processing is extremely easy afterwards. Thanks for that last post! I just modified it to include any DataFrame size:
float_arr = np.vstack(arr[:, :]).astype(float)
Answer by Matt
This is a much faster way to convert your object array to a NumPy float array:
arr=np.array(arr, dtype=[('O', float)]).astype(float)
From there no looping is needed; index it just like you normally would a NumPy array. You would have to do it in chunks for your different datatypes, though: arr[:, 1], arr[:,2], etc. I had the same issue with a NumPy tuple object returned from a C++ DLL function; conversion for 17M elements takes <2s.
Answer by Pablo Ruiz Ruiz
This problem usually happens when you have a dataset with mixed types, typically with dates in the first column or so.
What I usually do is store the date column in a separate variable and take the rest of the "X matrix of features" into X. So I have dates and X, for instance.
Then I apply the conversion to the X matrix as:
X = np.array(list(X[:,:]), dtype=float)
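The split described here might look like the sketch below. The dataset, its column layout (a date string in column 0, numeric features after it), and the names `data` and `dates` are hypothetical.

```python
import numpy as np

# Hypothetical mixed-type dataset stored as an object array.
data = np.array([
    ['2020-01-01', 1.5, 2],
    ['2020-01-02', 3.0, 4],
], dtype=object)

# Keep the date column aside and take the remaining columns as the feature matrix.
dates = data[:, 0]
X = data[:, 1:]

# Convert only the numeric part to a regular float array.
X = np.array(list(X[:, :]), dtype=float)

print(dates)
print(X)
```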
Hope this helps!