Source: Stack Overflow, http://stackoverflow.com/questions/40388792/ (licensed under CC BY-SA 4.0; attribute the original authors when reusing).

How to decode a numpy array of encoded literals/strings in Python3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'

Tags: arrays, python-3.x, pandas, numpy, unicode

Asked by ShanZhengYang

In Python 3, I have the following NumPy array of strings.

Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.

For example:

import numpy as np
print(array1)
(b'first_element', b'element',...)

Normally, one would use .decode('UTF-8') to decode these elements.

However, if I try:

array1 = array1.decode('UTF-8')

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

How do I decode these elements from a NumPy array? (That is, I don't want the b'' prefix.)

EDIT:

Let's say I was dealing with a Pandas DataFrame in which only certain columns were encoded in this manner. For example:

import pandas as pd
df = pd.DataFrame(...)

df
        COL1          ....
0   b'entry1'         ...
1   b'entry2'
2   b'entry3'
3   b'entry4'
4   b'entry5'
5   b'entry6'

Answered by hpaulj

You have an array of bytestrings; dtype is S:

In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]: 
array([b'first_element', b'element'], 
      dtype='|S13')

astype easily converts them to unicode, the default string type for Py3.

In [340]: arr.astype('U13')
Out[340]: 
array(['first_element', 'element'], 
      dtype='<U13')
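
If hard-coding the width 13 is inconvenient, the width can be left out. A minimal sketch, assuming the same two-element array as above (the name as_text is only illustrative):

```python
import numpy as np

arr = np.array((b'first_element', b'element'))

# Passing only the kind 'U' lets NumPy take the width from the source dtype,
# so the S13 input becomes a 13-character unicode array.
as_text = arr.astype('U')
print(as_text.dtype.kind, as_text.tolist())
```

This keeps the conversion working when the longest element changes length.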

There is also a library of string functions, np.char, which applies the corresponding str method to each element of a string array:

In [341]: np.char.decode(arr)
Out[341]: 
array(['first_element', 'element'], 
      dtype='<U13')

astype is faster, but decode lets you specify an encoding.
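
To illustrate the encoding argument, here is a small sketch with non-ASCII data. The array contents and the names raw and decoded are made up for the example; only np.char.decode and its encoding parameter come from NumPy.

```python
import numpy as np

# Hypothetical input: UTF-8 encoded bytes, one element containing a non-ASCII character.
raw = np.array(["café".encode("utf-8"), b"element"])

# Decode every element with an explicitly chosen codec.
decoded = np.char.decode(raw, encoding="utf-8")
print(decoded.tolist())
```

Any codec name that bytes.decode accepts should work here, since the function applies that method element by element.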

See also: How to decode a numpy array of dtype=numpy.string_?

Answered by Wander Nauta

If you want the result to be a (Python) list of strings, you can use a list comprehension:

>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>

Alternatively, if you want to keep it as a NumPy array, you can use np.vectorize to make a vectorized decoder function:

>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
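
For the DataFrame case raised in the question's EDIT, one possible approach is the pandas string accessor, which provides a decode method for columns holding bytes. This is only a sketch: the DataFrame contents, the COL2 column, and the byte_columns list are hypothetical stand-ins, since the question does not show the real data.

```python
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
df = pd.DataFrame({
    "COL1": [b"entry1", b"entry2", b"entry3"],
    "COL2": [1, 2, 3],
})

# Hypothetical list of the columns known to hold bytes.
byte_columns = ["COL1"]

# Decode only the listed columns; the other columns are left untouched.
for col in byte_columns:
    df[col] = df[col].str.decode("utf-8")

print(df)
```

Listing the columns explicitly avoids touching columns that already hold ordinary strings or numbers.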