Python: how to split a unicode string into a list
Original question: http://stackoverflow.com/questions/18711384/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How to split a unicode string into a list
Asked by PersianGulf
I have the following code:
stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]
My output is:
۰۱۲۳۴۵۶۷۸۹
But when I use:
print strlist[1]
I get the following traceback:
IndexError: list index out of range
My question is: how can I split my string? Keep in mind that I get the string from a function, so treat it as a variable.
Accepted answer by chryss
The split() method splits on whitespace by default. Because the string contains no whitespace, strlist is a list with a single element: the whole string, stored in strlist[0].
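The behaviour described above can be reproduced with a short sketch. It is written in Python 3 syntax rather than the Python 2 of the question, and `raw` is an assumed stand-in for the UTF-8 bytes that the asker's function returns. The escapes spell the ten code points listed in the answers (u'\u06f0' to u'\u06f9').

```python
# Python 3 sketch. `raw` is a hypothetical stand-in for the function's return value.
raw = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9".encode("utf-8")

text = raw.decode("utf-8")
strlist = text.split()  # no separator given, so it splits on runs of whitespace

print(len(strlist))        # 1, because the string contains no whitespace
print(strlist[0] == text)  # True: the only element is the whole string

try:
    strlist[1]
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range
```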
If you want a list with one element per unicode code point, you can convert the string in several ways:
- Function:
list(stru.decode("utf-8"))
- List comprehension:
[item for item in stru.decode("utf-8")]
- Don't convert at all. Do you really need a list? You can iterate over the unicode string just like any other sequence type:
for character in stru.decode("utf-8"):
    ...
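As a rough check that the three options above yield the same elements, here is a Python 3 sketch. As an assumption, `raw` stands in for the UTF-8 bytes from the asker's function; in Python 3 the `decode` call belongs on the bytes object.

```python
# Python 3 sketch. `raw` is a hypothetical stand-in for the function's return value.
raw = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9".encode("utf-8")
text = raw.decode("utf-8")

as_list = list(text)                        # option 1: the list() function
as_comprehension = [item for item in text]  # option 2: a list comprehension

collected = []
for character in text:                      # option 3: iterate directly, no list needed
    collected.append(character)

print(as_list == as_comprehension == collected)  # True
print(len(raw), len(as_list))                    # 20 10: twenty bytes, ten code points
```

The length comparison shows why decoding comes first: splitting the undecoded bytes would give one element per byte, not one per character.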
Answer by Ignacio Vazquez-Abrams
You don't need to.
>>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
۱
If you still want to...
>>> list(u"۰۱۲۳۴۵۶۷۸۹")
[u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
Answer by Roman Pekar
You can do this:
list(stru.decode("utf-8"))