How to decode a numpy array of encoded literals/strings in Python3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors: Stack Overflow
Original: http://stackoverflow.com/questions/40388792/
Asked by ShanZhengYang
In Python 3, I have the following NumPy array of strings.
Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.
For example:
import numpy as np
print(array1)
(b'first_element', b'element',...)
Normally, one would use .decode('UTF-8') to decode these elements.
However, if I try:
array1 = array1.decode('UTF-8')
I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'decode'
How do I decode these elements from a NumPy array? (That is, I don't want the b'' prefix.)
EDIT:
Let's say I was dealing with a Pandas DataFrame with only certain columns that were encoded in this manner. For example:
import pandas as pd
df = pd.DataFrame(...)
df
COL1 ....
0 b'entry1' ...
1 b'entry2'
2 b'entry3'
3 b'entry4'
4 b'entry5'
5 b'entry6'
Answered by hpaulj
You have an array of bytestrings; the dtype is S:
In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]:
array([b'first_element', b'element'],
dtype='|S13')
astype easily converts them to unicode, the default string type for Py3.
In [340]: arr.astype('U13')
Out[340]:
array(['first_element', 'element'],
dtype='<U13')
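The width 13 in 'U13' does not have to be worked out by hand. As a minimal sketch (the variable names are illustrative, not from the answer), passing 'U' or str to astype lets NumPy choose a width that fits the existing bytestrings:

```python
import numpy as np

# Same kind of bytestring array as in the answer (dtype S13).
arr = np.array((b'first_element', b'element'))

# 'U' without a length: NumPy picks a width large enough for the data.
as_unicode = arr.astype('U')

# Passing the built-in str type behaves the same way here.
as_str = arr.astype(str)

print(as_unicode, as_unicode.dtype)
print(as_str, as_str.dtype)
```

This cast is only suitable for ASCII content; for other encodings a decoding step that takes an explicit encoding is needed.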
There is also a library of string functions (np.char) that applies the corresponding str method to the elements of a string array:
In [341]: np.char.decode(arr)
Out[341]:
array(['first_element', 'element'],
dtype='<U13')
astype is faster, but decode lets you specify an encoding.
See also How to decode a numpy array of dtype=numpy.string_?
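To illustrate why the choice of encoding can matter, here is a small sketch with non-ASCII data. The sample strings are made up for illustration. It assumes the bytes are UTF-8 encoded; the astype call is wrapped in try/except because, as far as I know, the bytes-to-unicode cast treats the input as ASCII and fails on other bytes.

```python
import numpy as np

# Hypothetical sample data: UTF-8 encoded bytestrings with non-ASCII characters.
raw = np.array(["café".encode("utf-8"), "naïve".encode("utf-8")])

# np.char.decode takes the encoding explicitly, so UTF-8 bytes decode cleanly.
decoded = np.char.decode(raw, "utf-8")
print(decoded)

# astype has no encoding parameter; on non-ASCII bytes it is expected to raise.
try:
    print(raw.astype("U"))
except UnicodeDecodeError as exc:
    print("astype could not decode:", exc)
```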
Answered by Wander Nauta
If you want the result to be a (Python) list of strings, you can use a list comprehension:
>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>
Alternatively, if you want to keep it as a NumPy array, you can use np.vectorize to make a vectorized decoder function:
>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
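For the DataFrame case mentioned in the question's edit, one possible approach is pandas' Series.str.decode. This is only a sketch: it assumes the affected columns hold UTF-8 bytes, and the column names and the bytes_cols list are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical frame: COL1 holds bytes, COL2 is an ordinary numeric column.
df = pd.DataFrame({
    "COL1": [b"entry1", b"entry2", b"entry3"],
    "COL2": [1, 2, 3],
})

# Assumption: we already know which columns contain bytes.
bytes_cols = ["COL1"]

# Decode each bytes column element-wise; other columns are left untouched.
for col in bytes_cols:
    df[col] = df[col].str.decode("utf-8")

print(df)
```

Listing the columns explicitly keeps the decoding away from columns that already contain ordinary strings.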