Python: replace newlines in a Unicode string
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/2201633/
Replace newlines in a Unicode string
Asked by Hymanson Miller
I am trying to replace newline characters in a unicode string and seem to be missing some magic codes.
My particular example is that I am working on AppEngine and trying to put titles from HTML pages into a db.StringProperty() in my model.
So I do something like:
link.title = unicode(page_title,"utf-8").replace('\n','').replace('\r','')
and I get:
Property title is not multi-line
Are there other codes I should be using for the replace?
Answered by Hank Gay
Try ''.join(unicode(page_title, 'utf-8').splitlines()). splitlines() should let the standard library take care of all the possible crazy Unicode line breaks, and then you just join them all back together with the empty string to get a single-line version.
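A minimal sketch of the splitlines() approach, assuming Python 3, where str is already Unicode and bytes.decode() takes the place of unicode(...). The sample title and the helper name single_line are made up for illustration and are not from the original answer.

```python
# Hypothetical input: UTF-8 bytes for a title containing several kinds of line breaks.
page_title = "First\nSecond\r\nThird\u2028Fourth".encode("utf-8")


def single_line(raw_title, encoding="utf-8"):
    """Decode the bytes and drop every line boundary that splitlines() recognises."""
    # Python 3 counterpart of unicode(raw_title, "utf-8") from the answer.
    text = raw_title.decode(encoding)
    # splitlines() removes the line breaks; joining on "" glues the pieces together.
    return "".join(text.splitlines())


print(single_line(page_title))
```

Joining on the empty string runs adjacent words together; joining on " " instead would keep them separated, if that suits the title better.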
Answered by Ian Clelland
Python uses these characters for splitting in unicode.splitlines():
- U+000A LINE FEED (\n)
- U+000D CARRIAGE RETURN (\r)
- U+001C FILE SEPARATOR
- U+001D GROUP SEPARATOR
- U+001E RECORD SEPARATOR
- U+0085 NEXT LINE
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
As Hank says, using splitlines() will let Python take care of all of the details for you, but if you need to do it manually, then this should be the complete list.
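If the removal has to be done by hand, one possible sketch uses str.translate() with the code points from the list above. This assumes Python 3; the names LINE_BREAKS, REMOVE_BREAKS and strip_line_breaks are illustrative only. Note that the current Python 3 documentation for str.splitlines() also lists \v (U+000B) and \f (U+000C) as line boundaries, so those may need adding depending on the Python version.

```python
# Code points taken from the list in the answer; the names here are illustrative only.
LINE_BREAKS = "\n\r\x1c\x1d\x1e\x85\u2028\u2029"

# str.translate() deletes every character whose code point maps to None.
REMOVE_BREAKS = dict.fromkeys(map(ord, LINE_BREAKS))


def strip_line_breaks(text):
    """Remove each listed line-break character from text."""
    return text.translate(REMOVE_BREAKS)


print(strip_line_breaks("a\nb\r\nc\u2028d\x85e"))
```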
Answered by Thomas Wouters
It would be useful to print the repr() of the page_title that is seen to be multiline, but the obvious candidate would be '\r'.
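A small debugging sketch along those lines, assuming Python 3. The sample title with a trailing carriage return and the helper suspicious_chars are hypothetical, chosen only to show how repr() exposes characters that are invisible when printed normally.

```python
# Hypothetical title that looks like a single line when printed but still ends in '\r'.
page_title = "Some title\r"


def suspicious_chars(text):
    """List the code points of non-printable characters, which include line breaks."""
    return ["U+%04X" % ord(ch) for ch in text if not ch.isprintable()]


# repr() shows escape sequences instead of rendering the characters.
print(repr(page_title))
print(suspicious_chars(page_title))
```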