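Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow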
Original URL: http://stackoverflow.com/questions/24237524/
How to split a python string on new line characters
Question by user1067305
In Python 3 on Windows 7, I read a web page into a string.
I then want to split the string into a list at newline characters.
I can't enter the newline into my code as the argument to split(), because I get the syntax error 'EOL while scanning string literal'.
If I type in the characters \ and n, I get a Unicode error.
Is there any way to do it?
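Here is a minimal sketch of the two points behind this question. It assumes the page is UTF-8 encoded. The byte string below is made up and stands in for a downloaded page, because urllib.request.urlopen(url).read() returns bytes rather than str.

The "EOL while scanning string literal" error appears when an actual line break is typed between the quotes. The escape sequence "\n" written inside the quotes is already a single newline character, so it can be passed to split() directly.

```python
# "\n" inside the quotes is an escape sequence for one newline character (code point 10).
newline = "\n"
print(len(newline), ord(newline))  # 1 10

# Stand-in for the bytes returned by urllib.request.urlopen(url).read();
# decoding as UTF-8 is an assumption about the page's encoding.
page_bytes = b"line one\nline two\nline three"
page = page_bytes.decode("utf-8")

lines = page.split("\n")
print(lines)  # ['line one', 'line two', 'line three']
```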
Answer by laike9m
a.txt
this is line 1
this is line 2
code:
Python 3.4.0 (default, Mar 20 2014, 22:43:40)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('a.txt').read()
>>> file
'this is line 1\nthis is line 2\n'
>>> file.split('\n')
['this is line 1', 'this is line 2', '']
I'm on Linux, but I guess that on Windows you could just use \r\n instead and it would also work.
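Here is a minimal sketch of when '\n' is enough on Windows and when '\r\n' matters. The file contents are made up to match a.txt above. In Python 3, a file opened in text mode uses universal newlines by default (newline=None), so '\r\n' is translated to '\n' as the file is read. Text that did not go through text-mode reading, such as bytes decoded by hand, can keep the '\r'. In that case, splitting on '\r\n' or using splitlines() is needed.

```python
import os
import tempfile

# Write Windows-style line endings as raw bytes, so nothing is translated on the way out.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"this is line 1\r\nthis is line 2\r\n")

# Text mode with the default newline=None turns "\r\n" into "\n" while reading.
with open(path, encoding="utf-8") as f:
    text = f.read()
os.remove(path)

print(repr(text))        # 'this is line 1\nthis is line 2\n'
print(text.split("\n"))  # ['this is line 1', 'this is line 2', '']

# A string decoded directly from bytes still contains the "\r" characters.
raw = b"this is line 1\r\nthis is line 2\r\n".decode("utf-8")
print(raw.split("\n"))    # ['this is line 1\r', 'this is line 2\r', '']
print(raw.split("\r\n"))  # ['this is line 1', 'this is line 2', '']
print(raw.splitlines())   # ['this is line 1', 'this is line 2']
```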
Answer by Danziger
Splitting lines in Python:
Have you tried using the str.splitlines() method?
From the docs:
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
For example:
>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()
['Line 1', '', 'Line 3', 'Line 4']
>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines(True)
['Line 1\n', '\n', 'Line 3\r', 'Line 4\r\n']
Which delimiters are considered?
This method uses the universal newlines approach to splitting lines.
The main difference between Python 2.X and Python 3.X is this: in the former, str.splitlines() uses the universal newlines approach for 8-bit strings, so only "\r", "\n", and "\r\n" are considered line boundaries. In the latter, it uses a superset of these that also includes:
- \v or \x0b: Line Tabulation (added in Python 3.2).
- \f or \x0c: Form Feed (added in Python 3.2).
- \x1c: File Separator.
- \x1d: Group Separator.
- \x1e: Record Separator.
- \x85: Next Line (C1 Control Code).
- \u2028: Line Separator.
- \u2029: Paragraph Separator.
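Here is a small sketch that checks the list above. Each extra boundary splits a str under splitlines() but not under split("\n"). In Python 3, the narrower 8-bit behavior lives on in bytes.splitlines(), which only breaks at "\r", "\n", and "\r\n". The sample strings are made up for illustration.

```python
# Line boundaries that str.splitlines() recognizes in Python 3 beyond "\r", "\n", "\r\n".
EXTRA_BOUNDARIES = ["\v", "\f", "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]

for sep in EXTRA_BOUNDARIES:
    s = "left" + sep + "right"
    # splitlines() breaks the string at every one of these characters...
    assert s.splitlines() == ["left", "right"], repr(sep)
    # ...whereas split("\n") leaves the string in one piece.
    assert s.split("\n") == [s], repr(sep)

# bytes.splitlines() keeps the narrower universal-newlines set,
# so the vertical tab b"\x0b" is not a boundary there.
data = b"left\x0bright\r\nnext"
print(data.splitlines())                  # [b'left\x0bright', b'next']
print(data.decode("ascii").splitlines())  # ['left', 'right', 'next']
```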
splitlines() vs. split():
Unlike str.split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line:
>>> ''.splitlines()
[]
>>> 'Line 1\n'.splitlines()
['Line 1']
Whereas str.split('\n') returns:
>>> ''.split('\n')
['']
>>> 'Line 1\n'.split('\n')
['Line 1', '']
Removing additional whitespace:
If you also need to remove additional leading or trailing whitespace, such as spaces, which str.splitlines() leaves in place, you could use str.splitlines() together with str.strip():
>>> [line.strip() for line in 'Line 1 \n \nLine 3 \rLine 4 \r\n'.splitlines()]
['Line 1', '', 'Line 3', 'Line 4']
Removing empty strings (''):
Lastly, if you want to filter out the empty strings from the resulting list, you could use filter():
>>> # Python 2.X:
>>> filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines())
['Line 1', 'Line 3', 'Line 4']
>>> # Python 3.X:
>>> list(filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()))
['Line 1', 'Line 3', 'Line 4']
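The stripping and filtering steps above can also be combined in a single list comprehension. This is a minimal sketch; the helper name nonblank_lines is hypothetical. A line that contains only whitespace is dropped, because it strips down to the empty string.

```python
def nonblank_lines(text):
    """Split on line boundaries, strip each line, and drop lines that end up empty.

    nonblank_lines is a hypothetical helper name used only for illustration.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


print(nonblank_lines('Line 1 \n \nLine 3 \rLine 4 \r\n'))  # ['Line 1', 'Line 3', 'Line 4']
print(nonblank_lines(''))                                   # []
```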
Additional comment regarding the original question:
As the error you posted indicates and Burhan suggested, the problem comes from the print() call. There's a related question about it that could be useful to you: UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function
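One possible workaround is sketched below, and it is only a sketch. It assumes the error is a UnicodeEncodeError raised because the console's encoding (for example, a Windows code page such as cp1252) cannot represent some characters from the page. Characters the target encoding lacks are replaced with '?' before printing. The helper name console_safe is hypothetical. Other fixes, such as changing the console encoding, are discussed in the linked question.

```python
import sys


def console_safe(text, encoding=None):
    """Replace characters that `encoding` cannot represent with '?'.

    console_safe is a hypothetical helper. `encoding` defaults to the encoding of
    sys.stdout, falling back to UTF-8 when that is unknown.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# 'café ?': é exists in cp1252, the snowman (U+2603) does not.
example = console_safe("caf\u00e9 \u2603", "cp1252")

for line in "first line\nsnowman: \u2603".splitlines():
    print(console_safe(line))  # uses the console's own encoding, so printing cannot fail on it
```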