pandas: How to decode a numpy array of encoded literals/strings in Python 3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'

Question

Asked by ShanZhengYang

In Python 3, I have the following NumPy array of strings.

Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.

For example:

import numpy as np
print(array1)
(b'first_element', b'element',...)

Normally, one would use .decode('UTF-8') to decode these elements.

However, if I try:

array1 = array1.decode('UTF-8')

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

How do I decode these elements from a NumPy array? (That is, I don't want the b'' prefix.)

EDIT:

Let's say I was dealing with a Pandas DataFrame in which only certain columns were encoded in this manner. For example:

import pandas as pd
df = pd.DataFrame(...)

df
        COL1          ....
0   b'entry1'         ...
1   b'entry2'
2   b'entry3'
3   b'entry4'
4   b'entry5'
5   b'entry6'

Answer 1

Answered by hpaulj

You have an array of bytestrings; dtype is S:

In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]: 
array([b'first_element', b'element'], 
      dtype='|S13')

astype easily converts them to unicode, the default string type for Py3.

In [340]: arr.astype('U13')
Out[340]: 
array(['first_element', 'element'], 
      dtype='<U13')

There is also a library of string functions, np.char, which applies the corresponding str method to each element of a string array:

In [341]: np.char.decode(arr)
Out[341]: 
array(['first_element', 'element'], 
      dtype='<U13')

astype is faster, but decode lets you specify an encoding.
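To illustrate the trade-off, here is a minimal sketch. The sample values are made up for illustration, and the encoding is assumed to be UTF-8. For ASCII-only data the two approaches should give the same result; passing an explicit encoding to np.char.decode matters once the bytes contain non-ASCII characters.

```python
import numpy as np

# ASCII-only byte strings: astype and np.char.decode should agree.
ascii_arr = np.array((b'first_element', b'element'))
via_astype = ascii_arr.astype('U')       # string length is inferred from the S dtype
via_decode = np.char.decode(ascii_arr)   # uses the default encoding of bytes.decode
print(via_astype, via_decode)

# Byte strings holding UTF-8 text with a non-ASCII character (illustrative data).
utf8_arr = np.array(['café'.encode('utf-8'), b'element'])

# np.char.decode accepts an explicit encoding, so multi-byte characters decode.
decoded = np.char.decode(utf8_arr, encoding='utf-8')
print(decoded)
```

If the encoding of the data is unknown, it is worth checking it first, since astype offers no way to choose one.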

See also How to decode a numpy array of dtype=numpy.string_?

Answer 2

Answered by Wander Nauta

If you want the result to be a (Python) list of strings, you can use a list comprehension:

>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>

Alternatively, if you want to keep it as a NumPy array, you can use np.vectorize to make a vectorized decoder function:

>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
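Neither answer covers the DataFrame case from the question's edit. One possible approach is the pandas string accessor, sketched below. This assumes the affected columns are object-dtype columns holding Python bytes values encoded as UTF-8; the frame contents, the COL2 column, and the bytes_columns list are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical frame: COL1 holds bytes, COL2 is an ordinary numeric column.
df = pd.DataFrame({
    'COL1': [b'entry1', b'entry2', b'entry3'],
    'COL2': [1, 2, 3],
})

# Decode only the columns known to contain bytes; other columns are untouched.
bytes_columns = ['COL1']
for col in bytes_columns:
    df[col] = df[col].str.decode('utf-8')

print(df)
```

Listing the columns explicitly keeps the loop from touching columns that already hold ordinary strings or numbers.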
