Python: how to split a unicode string into a list

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18711384/

Time: 2020-08-19 11:33:09  Source: igfitidea

How to split a unicode string into a list

Tags: python, string, unicode, utf-8, unicode-string

Asked by PersianGulf

I have the following code:

stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]

My output is:

۰۱۲۳۴۵۶۷۸۹

But when I use:

print strlist[1]

I get the following traceback:

IndexError: list index out of range

My question is: how can I split my string? Keep in mind that I get my string from a function, so treat it as a variable.

Accepted answer by chryss

By default, the split() method splits on whitespace. Your string contains no whitespace, so strlist is a list with one single element: the whole string, in strlist[0].
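
To make the whitespace behaviour concrete, here is a minimal sketch. It assumes the original string was the ten Persian digits U+06F0 to U+06F9 (which is what the output in a later answer suggests), written with escapes so the snippet should run on both Python 2 and Python 3. The name `spaced` is only for illustration.

```python
from __future__ import print_function

# Assumed input: ten Persian digits as UTF-8 bytes, with no whitespace anywhere.
stru = u"\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9".encode("utf-8")

# split() with no arguments splits on runs of whitespace only,
# so a string without whitespace comes back as one element.
strlist = stru.decode("utf-8").split()
print(len(strlist))

# Once whitespace is present, split() does return several elements.
spaced = u"\u06f0\u06f1 \u06f2\u06f3 \u06f4\u06f5".split()
print(len(spaced))
```

This is why strlist[1] raises IndexError in the question: there is only index 0.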

If you want a list with one element for each unicode code point, you can transform the string into a list in different ways:

  • Function: list(stru.decode("utf-8"))
  • List comprehension: [item for item in stru.decode("utf-8")]
  • Don't convert at all. Do you really need a list? You can iterate over the unicode string just like over any other sequence type (for character in stru.decode("utf-8"): ...)
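
The three options above can be compared side by side. This is only a sketch: the sample value (three Persian digits encoded as UTF-8) and the names `by_list`, `by_comprehension` and `by_loop` are assumptions for illustration, and the code is written to run on Python 2 and Python 3 alike.

```python
from __future__ import print_function

# Assumed sample: three Persian digits as UTF-8 bytes.
stru = u"\u06f0\u06f1\u06f2".encode("utf-8")
text = stru.decode("utf-8")  # decode once, then reuse the unicode string

# Option 1: the list() constructor.
by_list = list(text)

# Option 2: a list comprehension.
by_comprehension = [item for item in text]

# Option 3: no conversion, just iterate over the string directly.
by_loop = []
for character in text:
    by_loop.append(character)

# All three approaches see the same code points in the same order.
print(by_list == by_comprehension == by_loop)
```

Decoding once and keeping the result in a variable avoids repeating stru.decode("utf-8") in every expression.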

Answer by Ignacio Vazquez-Abrams

  1. You don't need to.

    >>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
    ۱
    
  2. If you still want to...

    >>> list(u"۰۱۲۳۴۵۶۷۸۹")
    [u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
    
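
Indexing only gives one character per position once the bytes have been decoded. The sketch below illustrates the difference, assuming the input is the ten Persian digits encoded as UTF-8; the names `raw`, `text`, `second` and `first_three` are illustrative.

```python
from __future__ import print_function

# Assumed input: ten Persian digits, first as UTF-8 bytes, then decoded.
raw = u"\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9".encode("utf-8")
text = raw.decode("utf-8")

# Each of these digits takes two bytes in UTF-8, so the lengths differ.
print(len(raw))   # number of bytes
print(len(text))  # number of code points

# Indexing and slicing the decoded string work per code point,
# with no need to build a list first.
second = text[1]
first_three = text[:3]
print(second == u"\u06f1")
```

So the decoded string already behaves like a sequence of characters, which is the point of "you don't need to".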
    

Answer by Roman Pekar

You can do this:

list(stru.decode("utf-8"))
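
Since the question mentions that the string comes from a function, the one-liner could be wrapped in a small helper. This is a hypothetical sketch, not part of any answer: `to_char_list` and `get_digits` are made-up names, and UTF-8 is assumed as the default encoding. It should behave the same on Python 2 and Python 3.

```python
from __future__ import print_function


def to_char_list(value, encoding="utf-8"):
    """Hypothetical helper: return a list with one item per code point.

    Accepts either encoded bytes (a plain str on Python 2) or an
    already-decoded unicode string.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return list(value)


def get_digits():
    """Hypothetical stand-in for the function that supplies the string."""
    return u"\u06f0\u06f1\u06f2".encode("utf-8")


chars = to_char_list(get_digits())
print(len(chars))
```

Checking for bytes first means the helper does not try to decode a string that is already unicode.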