python 2.7 string.join() with unicode

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/14758705/

Date: 2020-08-18 12:20:29  Source: igfitidea

Tags: python, unicode

Asked by thkang

I have a bunch of byte strings (str, not unicode, in Python 2.7) containing Unicode data (in UTF-8 encoding).

I am trying to join them (with "".join(utf8_strings) or u"".join(utf8_strings)), which throws

UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 0: ordinal not in range(128)

Is there any way to use the .join() method on non-ASCII strings? Sure, I can concatenate them in a for loop, but that wouldn't be efficient.

Accepted answer by Martijn Pieters

Joining byte strings using ''.join() works just fine; the error you see would only appear if you mixed unicode and str objects:

>>> utf8 = [u'\u0123'.encode('utf8'), u'\u0234'.encode('utf8')]
>>> ''.join(utf8)
'\xc4\xa3\xc8\xb4'
>>> u''.join(utf8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> ''.join(utf8 + [u'unicode object'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The exceptions above are raised when using the Unicode value u'' as the joiner, and when adding a Unicode string to the list of strings to join, respectively.
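
One way to avoid the implicit ASCII decode is to keep the joiner and every item the same type. The following is a minimal sketch, not part of the original answer: it assumes the data really is UTF-8, and the sample list utf8_strings is hypothetical. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample data: two UTF-8 encoded byte strings (assumed encoding).
utf8_strings = [u'\u0123'.encode('utf8'), u'\u0234'.encode('utf8')]

# Option 1: stay in bytes. The joiner and every item are byte strings,
# so no implicit decoding takes place.
joined_bytes = b''.join(utf8_strings)

# Option 2: move to text. Decode each item explicitly as UTF-8 first,
# then join with a unicode joiner.
joined_text = u''.join(s.decode('utf8') for s in utf8_strings)

print(repr(joined_bytes))
print(repr(joined_text))
```

Either option should work as long as the two types are never mixed in a single join call; which one fits depends on whether the rest of the program expects bytes or text.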

Answer by afflux

"".join(...) will work if every item passed to it is a str (whatever the encoding may be).

The issue you are seeing is probably related not to the join itself but to the data you supply to it. Post more code so we can see what's really wrong.
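
If the data is the suspect, listing the type of each item is a quick way to spot a stray unicode object. The sketch below is only an illustration under assumptions: the helper to_unicode and the list mixed are hypothetical names, and the byte strings are assumed to be UTF-8. It runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def to_unicode(item, encoding='utf8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(item, bytes):  # in Python 2.7, bytes is an alias for str
        return item.decode(encoding)
    return item

# Hypothetical mixed input: one UTF-8 byte string and one unicode object.
mixed = [u'\u0123'.encode('utf8'), u'unicode object']

# Show which types are actually present in the list.
types_seen = [type(item).__name__ for item in mixed]
print(types_seen)

# Normalise everything to unicode before joining.
joined = u''.join(to_unicode(item) for item in mixed)
print(repr(joined))
```

Normalising at the point where the data enters the program, rather than right before the join, would likely make such mix-ups easier to trace.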
