How to convert numpy object array into str/unicode array?
Source: http://stackoverflow.com/questions/16037824/ (StackOverflow). Content is provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must attribute it to the original authors.
Asked by herrlich10
Update: In the latest version of numpy (e.g., v1.8.1), this is no longer an issue. All the methods mentioned here now work as expected.
Original question: Using the object dtype to store a string array is sometimes convenient, especially when one needs to modify the contents of a large array without knowing the maximum string length in advance, e.g.,
>>> import numpy as np
>>> a = np.array([u'abc', u'12345'], dtype=object)
At some point, one might want to convert the dtype back to unicode or str. However, a simple conversion truncates the strings to length 4 or 1 (why?), e.g.,
>>> b = np.array(a, dtype=unicode)
>>> b
array([u'abc', u'1234'], dtype='<U4')
>>> c = a.astype(unicode)
>>> c
array([u'a', u'1'], dtype='<U1')
Of course, one can always iterate over the entire array explicitly to determine the maximum length:
>>> d = np.array(a, dtype='<U{0}'.format(np.max([len(x) for x in a])))
>>> d
array([u'abc', u'12345'], dtype='<U5')
Yet this seems a little awkward to me. Is there a better way to do this?
Edit to add: According to this closely related question,
>>> len(max(a, key=len))
is another way to find the length of the longest string, and this step seems to be unavoidable...
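For readers on Python 3, the explicit max-length approach from the question can be sketched roughly as below. The helper name `to_fixed_unicode` is hypothetical (it is not part of numpy), and the sketch assumes every element of the object array is a `str`.

```python
import numpy as np

def to_fixed_unicode(arr):
    """Hypothetical helper: convert an object array of str to a fixed-width unicode array."""
    # Scan all elements once to find the longest string.
    # default=1 keeps the dtype valid when the array is empty.
    width = max((len(s) for s in arr.ravel()), default=1)
    # Build the target dtype explicitly, e.g. 'U5' for a longest length of 5.
    return np.array(arr, dtype='U{0}'.format(width))

a = np.array(['abc', '12345'], dtype=object)
d = to_fixed_unicode(a)
print(d, d.dtype)
```

This makes the width explicit, which may be useful if one wants to reserve a larger width than the current longest string.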
Accepted answer by Fred
I know this is an old question, but in case anyone comes across it looking for an answer, try
c = a.astype('U')
and you should get the result you expect:
c = array([u'abc', u'12345'], dtype='<U5')
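As a quick check, here is a minimal Python 3 sketch of the same conversion that verifies nothing was truncated. It assumes a reasonably recent numpy (the question's update mentions v1.8.1 or later); the variable names simply mirror the ones used above.

```python
import numpy as np

a = np.array(['abc', '12345'], dtype=object)

# 'U' without a length lets numpy pick a width wide enough for the longest string.
c = a.astype('U')

# Compare element by element against the original to confirm no truncation.
no_truncation = c.tolist() == a.tolist()

# itemsize is in bytes; each unicode character takes 4 bytes in numpy.
width_in_chars = c.dtype.itemsize // 4
print(no_truncation, c.dtype.kind, width_in_chars)
```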
Answer by ThisGuyCantEven
At least in Python 3.5 with Jupyter 4, I can use:
a=np.array([u'12345',u'abc'],dtype=object)
b=a.astype(str)
b
This works just fine for me and returns:
array(['12345', 'abc'],dtype='<U5')
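In Python 3, `str` is the unicode string type, so `astype(str)` and `astype('U')` should give the same result there. The short sketch below compares the two; it assumes Python 3 and a recent numpy, and is only meant as an illustration.

```python
import numpy as np

a = np.array(['12345', 'abc'], dtype=object)

b = a.astype(str)   # Python 3 str maps to numpy's unicode dtype
c = a.astype('U')   # explicit unicode dtype, width inferred by numpy

# Both conversions should agree on dtype and on contents.
same_dtype = b.dtype == c.dtype
same_values = b.tolist() == c.tolist()
print(same_dtype, same_values)
```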

