Python RegEx Matching Newline
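Disclaimer: this page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow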
Original URL: http://stackoverflow.com/questions/3734023/
Question by humira
I have the following regular expression:
[0-9]{8}.*\n.*\n.*\n.*\n.*
I have tested it in Expresso against the file I am working with, and the match is successful.
I want to match the following:
- Reference number, 8 digits long
- Any character, any number of times
- New Line
- Any character, any number of times
- New Line
- Any character, any number of times
- New Line
- Any character, any number of times
- New Line
- Any character, any number of times
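As a rough sketch (not part of the original question), the same pattern can be written with re.VERBOSE so that each piece lines up with one item of the list above. The names RECORD and sample are hypothetical, and sample is made-up data rather than the asker's file:

```python
import re

# The question's pattern in VERBOSE form: whitespace and "#" comments inside
# the pattern are ignored, while the \n escapes still match real newlines.
RECORD = re.compile(r"""
    [0-9]{8}   # reference number, exactly 8 digits
    .*         # any characters (except newline), any number of times
    \n .*      # newline, then any characters
    \n .*      # newline, then any characters
    \n .*      # newline, then any characters
    \n .*      # newline, then any characters
""", re.VERBOSE)

# Hypothetical sample: one reference number followed by four more lines.
sample = "12345678 header\nline two\nline three\nline four\nline five\n"
print(RECORD.findall(sample))
```

The VERBOSE form compiles to the same pattern as the one-liner, so it finds exactly the same matches.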
My Python code is:
import re

for m in re.findall('[0-9]{8}.*\n.*\n.*\n.*\n.*', l, re.DOTALL):
    print(m)
But no matches are produced, whereas Expresso finds 400+ matches, which is what I would expect.
What am I missing here?
Accepted answer by Tim Pietzcker
Don't use re.DOTALL, or the dot will match newlines, too. Also use raw strings (r"...") for regexes:
import re

for m in re.findall(r'[0-9]{8}.*\n.*\n.*\n.*\n.*', l):
    print(m)
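A minimal sketch of the difference re.DOTALL makes, assuming a hypothetical string data that holds two 5-line records (invented data, not the asker's file). Without the flag, each match stops at the end of its fifth line. With the flag, the DOTALL pattern does still match, but the greedy first .* runs to the end of the string and backtracks only far enough to fit the four \n, so a single match swallows both records:

```python
import re

PATTERN = r'[0-9]{8}.*\n.*\n.*\n.*\n.*'

# Hypothetical input: two records, each a reference line plus four more lines.
data = (
    "11111111 first record\na\nb\nc\nd\n"
    "22222222 second record\ne\nf\ng\nh\n"
)

# Default: "." does not match "\n", so each match covers exactly five lines.
without_dotall = re.findall(PATTERN, data)

# DOTALL: "." also matches "\n", so the first match runs to the end of data.
with_dotall = re.findall(PATTERN, data, re.DOTALL)

print(without_dotall)
print(with_dotall)
```

This is also why the DOTALL version is slow on a large file: every greedy .* first consumes the rest of the input and then has to backtrack.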
However, your version should still have worked (although very inefficiently) if you have read the entire file into memory, in binary mode, as one large string.
So the question is, are you reading the file like this:
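import re

with open("filename", "rb") as myfile:
    mydata = myfile.read()  # the whole file as one bytes object
# in Python 3, searching bytes requires a bytes pattern (rb'...')
for m in re.findall(rb'[0-9]{8}.*\n.*\n.*\n.*\n.*', mydata):
    print(m)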
Or are you working with single lines (for line in myfile: or myfile.readlines())? In that case, the regex can't match, of course: each line contains at most one newline, and the pattern needs four.
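To illustrate the line-by-line case, here is a small sketch that uses io.BytesIO as a hypothetical stand-in for a file opened with "rb"; the data, including its Windows \r\n line endings, is invented. Iterating line by line (like for line in myfile:) gives no matches, while reading everything at once (like myfile.read()) does:

```python
import io
import re

# Bytes pattern, because a file opened in "rb" mode yields bytes, not str.
PATTERN = re.compile(rb'[0-9]{8}.*\n.*\n.*\n.*\n.*')

# Hypothetical file contents with Windows (\r\n) line endings.
raw = b"12345678 header\r\nline two\r\nline three\r\nline four\r\nline five\r\n"

# Line by line: each line holds a single b"\n", so four can never be matched.
per_line = []
for line in io.BytesIO(raw):
    per_line.extend(PATTERN.findall(line))

# Whole file at once: all four newlines are in the same bytes object.
whole = PATTERN.findall(io.BytesIO(raw).read())

print(per_line)
print(whole)
```

Because "." matches b"\r", the CRLF endings do not stop the match, but they do end up inside the matched text (note the trailing b"\r").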

