Python: how to split a unicode string into a list
Original question: http://stackoverflow.com/questions/18711384/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on StackOverflow.
How to split a unicode string into a list
Asked by PersianGulf
I have the following code:
stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]
My output is:
۰۱۲۳۴۵۶۷۸۹
But when I use:
print strlist[1]
I get the following traceback:
IndexError: list index out of range
My question is: how can I split my string? Keep in mind that I get the string from a function, so treat it as a variable.
Accepted answer by chryss
The split() method splits on whitespace by default. Your string contains no whitespace, so strlist is a list with a single element: the whole string, in strlist[0].
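The behaviour described above can be reproduced with a short sketch. It is written for Python 3, where str is already Unicode and no decode() call is needed, and it assumes the string in the question is the ten digits U+06F0 to U+06F9 that appear in the list() output of a later answer. The variable spaced is a hypothetical example added for contrast.

```python
# Python 3 sketch (the thread itself uses Python 2).
# Assumption: the question's string is the ten Extended Arabic-Indic digits.
stru = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"

# split() with no arguments splits on runs of whitespace.
# There is no whitespace here, so the result has exactly one element.
strlist = stru.split()
print(len(strlist))        # 1
print(strlist[0] == stru)  # True

# Hypothetical contrast: the same digits separated by spaces.
spaced = "\u06f0\u06f1\u06f2 \u06f3\u06f4\u06f5 \u06f6\u06f7\u06f8\u06f9"
print(len(spaced.split())) # 3
```

With only one element in the list, strlist[1] raises the IndexError shown in the question.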
If you want a list with one element for each unicode codepoint, you can transform the string into a list in different ways:
- Function: list(stru.decode("utf-8"))
- List comprehension: [item for item in stru.decode("utf-8")]
- No conversion at all. Do you really need a list? You can iterate over the unicode string just like over any other sequence type (for character in stru.decode("utf-8"): ...)
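The three options can be compared side by side in a minimal sketch. It targets Python 3, where the Python 2 byte string from the question corresponds to a bytes object, and the names DIGITS, raw, text, and seen are illustrative rather than taken from the original post.

```python
# Python 3 sketch; DIGITS, raw, text and seen are illustrative names.
DIGITS = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"
raw = DIGITS.encode("utf-8")       # bytes, playing the role of the Python 2 str

text = raw.decode("utf-8")         # bytes -> str (Unicode)

as_list = list(text)               # 1. the list() constructor
as_comp = [ch for ch in text]      # 2. a list comprehension

# 3. no list at all: iterate over the string directly
seen = []
for ch in text:
    seen.append(ch)

print(len(raw), len(as_list))      # 20 10
print(as_list == as_comp == seen)  # True
```

Each of these digits takes two bytes in UTF-8, which is why the byte string is twice as long as the decoded text. All three approaches visit the same ten code points, so the choice is mostly about whether a separate list object is actually needed.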
Answer by Ignacio Vazquez-Abrams
You don't need to.
>>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
۱
If you still want to...
>>> list(u"۰۱۲۳۴۵۶۷۸۹")
[u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
Answer by Roman Pekar
You can do this:
list(stru.decode("utf-8"))
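To check what such a list contains, it can help to print the code point of each element. The following is a rough Python 3 sketch, assuming the decoded text is the ten digits U+06F0 to U+06F9 from the answers above. The names text, chars, and codepoints are illustrative.

```python
import unicodedata

# Python 3 sketch; assumes the decoded text is the digits U+06F0..U+06F9.
text = "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9"
chars = list(text)

# Indexing the string and indexing the list give the same character.
print(text[1] == chars[1])            # True

# ord() returns the integer code point of a single character.
codepoints = ["U+%04X" % ord(ch) for ch in chars]
print(codepoints[0], codepoints[-1])  # U+06F0 U+06F9

# unicodedata.name() gives the official Unicode character name.
print(unicodedata.name(chars[1]))     # EXTENDED ARABIC-INDIC DIGIT ONE
```

The printed code points line up with the \u06f0 to \u06f9 escapes in the list() output above.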

