Python RegEx Matching Newline

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3734023/

Time: 2020-08-18 12:29:16  Source: igfitidea

python regex

Asked by humira

I have the following regular expression:

[0-9]{8}.*\n.*\n.*\n.*\n.*

I have tested it in Expresso against the file I am working with, and the match is successful.

I want to match the following:

  • A reference number, 8 digits long
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times

My python code is:

for m in re.findall('[0-9]{8}.*\n.*\n.*\n.*\n.*', l, re.DOTALL):
    print(m)

But no matches are produced. As I said, in Expresso there are 400+ matches, which is what I would expect.

What am I missing here?

Accepted answer by Tim Pietzcker

Don't use re.DOTALL, or the dot will match newlines, too. Also use raw strings (r"...") for regexes:

for m in re.findall(r'[0-9]{8}.*\n.*\n.*\n.*\n.*', l):
    print(m)
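
As a rough illustration of the difference re.DOTALL makes, the sketch below runs the same pattern with and without the flag. The sample text and the variable names are made up for this example and do not come from the original question.

```python
import re

# Hypothetical sample data: two records, each a reference number
# followed by four more lines.
text = (
    "12345678 first record\n"
    "line a2\n"
    "line a3\n"
    "line a4\n"
    "line a5\n"
    "87654321 second record\n"
    "line b2\n"
    "line b3\n"
    "line b4\n"
    "line b5\n"
)

pattern = r'[0-9]{8}.*\n.*\n.*\n.*\n.*'

# Without DOTALL, each .* stops at the end of its line.
line_matches = re.findall(pattern, text)

# With DOTALL, the greedy dots also consume newlines.
dotall_matches = re.findall(pattern, text, re.DOTALL)

print(len(line_matches))    # 2
print(len(dotall_matches))  # 1
```

Without the flag, each match covers exactly five lines. With the flag, the greedy dots run across line breaks, so in this sample a single match stretches from the first reference number to the end of the data.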

However, your version should still have worked (although very inefficiently) if you have read the entire file as binary into memory as one large string.

So the question is, are you reading the file like this:

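import re

with open("filename", "rb") as myfile:
    mydata = myfile.read()  # the entire file as one bytes object
    # On Python 3, binary data needs a bytes pattern, hence the b prefix.
    for m in re.findall(br'[0-9]{8}.*\n.*\n.*\n.*\n.*', mydata):
        print(m)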

Or are you working with single lines (for line in myfile: or myfile.readlines())? In that case, the regex can't work, of course, because no single line contains the four newlines the pattern requires.
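
A small sketch of that last point, using io.StringIO as a stand-in for a real file. The sample data and variable names are invented for illustration only.

```python
import io
import re

pattern = re.compile(r'[0-9]{8}.*\n.*\n.*\n.*\n.*')

# Hypothetical five-line record, as it might appear in the file.
sample = "12345678 header\nline 2\nline 3\nline 4\nline 5\n"

# Line by line: each chunk holds at most one newline, so the pattern,
# which needs four, can never match.
per_line = []
for line in io.StringIO(sample):
    per_line.extend(pattern.findall(line))

# Whole content at once: the pattern can span all five lines.
whole = pattern.findall(io.StringIO(sample).read())

print(per_line)
print(whole)
```

Reading the whole content with read() is what lets a multi-line pattern see all the newlines it needs. Iterating over the file object or over readlines() hands the regex one line at a time.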
