Source: Stack Overflow, http://stackoverflow.com/questions/5176348/ (CC BY-SA 4.0). You are free to use and share this content, but you must attribute it to the original authors.

How can I capture a multiline pattern using a regular expression in Java?

Tags: java, regex

Asked by lampShade

I have a text file that I need to parse using regular expressions. The text that I need to capture is in multiline groups like this:

truck
zDoug
Doug's house
(123) 456-7890
[email protected]
30
61234.56
8/10/2003

vehicle
eRob
Rob's house
(987) 654-3210
[email protected]

For this example I need to capture truck followed by the next seven lines. In other words, this "block" contains 8 groups. This is what I've tried, but it does not capture the next line:

(truck)\n(\w).

NOTE: I'm using the program RegExr to test my regex before I port it to Java.

Accepted answer by Alan Moore

(?m)^truck(?:(?:\r\n|[\r\n]).+$)*

This assumes the whole text has been read into a single string (i.e., you're not reading the file line by line), but it doesn't assume the line separator is always \n, as your code does. At a minimum you should allow for \r\n and \r as well, which is what (?:\r\n|[\r\n]) does. But it still matches only one separator at a time, and each separator must be followed by a non-empty line, so the match stops before the double line separator at the end of the block.
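
To see the separator handling in action, here is a minimal sketch. The class and method names are made up for illustration, and the sample strings are shortened stand-ins for the real data. It runs the same pattern against text joined with \n, \r\n, and \r; in each case the match should stop before the empty line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorDemo {
    // Same pattern as above: "truck" at the start of a line, then any number
    // of (one line separator + one non-empty line) pairs.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");

    // Returns the first matched block, or null if the text contains none.
    static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] separators = { "\n", "\r\n", "\r" };
        for (String sep : separators) {
            // Hypothetical, shortened data: one truck block, a blank line, one vehicle block.
            String data = "truck" + sep + "zDoug" + sep + "30" + sep + sep
                        + "vehicle" + sep + "eRob";
            String block = firstBlock(data);
            // true when the match ends at "30" and never reaches the next block
            System.out.println(block.endsWith("30") && !block.contains("vehicle"));
        }
    }
}
```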

Once you've matched a block of data, you can split it on the line separators to get the individual lines. Here's an example:

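import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data is assumed to hold the entire file contents as one String
Pattern p0 = Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");
Matcher m = p0.matcher(data);
while (m.find())
{
  // m.group() is the whole block, from "truck" to the last non-empty line
  String fullMatch = m.group();
  int n = 0;
  // split on the same separators the pattern accepts
  for (String s : fullMatch.split("\r\n|[\r\n]"))
  {
    System.out.printf("line %d: %s%n", n++, s);
  }
}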

output:

line 0: truck
line 1: zDoug
line 2: Doug's house
line 3: (123) 456-7890
line 4: [email protected]
line 5: 30
line 6: 61234.56
line 7: 8/10/2003
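
If you would rather have each line in its own capturing group, as the question describes, one possible variation is to build a pattern with eight explicit groups. This is only a sketch and not part of the approach above: the class, method, and field names are hypothetical, the e-mail address in the sample is a placeholder, and it relies on \R, which matches any line break in Java 8 and later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EightGroups {
    // Hypothetical sample; the e-mail address is a placeholder.
    static final String SAMPLE =
        "truck\nzDoug\nDoug's house\n(123) 456-7890\ndoug@example.com\n"
      + "30\n61234.56\n8/10/2003\n\nvehicle\neRob\nRob's house\n";

    // Group 1 is the literal "truck"; groups 2..8 are the seven lines after it.
    static final Pattern TRUCK = Pattern.compile(buildRegex());

    static String buildRegex() {
        StringBuilder sb = new StringBuilder("(?m)^(truck)");
        for (int i = 0; i < 7; i++) {
            sb.append("\\R(.+)");   // one line break, then one non-empty line
        }
        return sb.toString();
    }

    // Returns the eight captured lines of the first truck block,
    // or null if no complete block is found.
    static String[] firstTruck(String data) {
        Matcher m = TRUCK.matcher(data);
        if (!m.find()) {
            return null;
        }
        String[] lines = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            lines[i - 1] = m.group(i);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] lines = firstTruck(SAMPLE);
        for (int i = 0; i < lines.length; i++) {
            System.out.printf("group %d: %s%n", i + 1, lines[i]);
        }
    }
}
```

The trade-off is that the number of lines is hard-coded here, whereas the match-then-split approach copes with blocks of any length.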

I'm also assuming each line of data contains at least one character, and that the blank lines between data blocks are really empty--i.e., no spaces, TABs, or other invisible characters.
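
If that assumption might not hold for your file, one possible workaround is to strip trailing spaces and tabs from every line before matching. This is a sketch only: it assumes you are free to preprocess the text, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlankLineCleanup {
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");

    // Removes spaces and tabs at the end of every line, so that lines
    // containing only whitespace become truly empty.
    static String stripTrailingBlanks(String data) {
        return data.replaceAll("(?m)[ \\t]+$", "");
    }

    // Returns the first matched block, or null if there is none.
    static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the "blank" separator line holds a space and a tab.
        String messy = "truck\nzDoug \n30\t\n \t\nvehicle\neRob\n";
        // Without cleanup, the whitespace-only line counts as data and the match runs on.
        System.out.println(firstBlock(messy).contains("vehicle"));
        // After cleanup, the match ends at the last line of the truck block.
        System.out.println(firstBlock(stripTrailingBlanks(messy)).contains("vehicle"));
    }
}
```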

(BTW: To test that regex in RegExr, remove the (?m) and check the multiline box instead. RegExr is powered by ActionScript, so the rules are a little different. For a Java-powered regex tester, check out RegexPlanet.)

Answer by Sergei

This pattern should work: ((.*|\n)*)

Answer by mazaneicha

I think that in order to span multiple lines your Pattern should be compiled in DOTALL mode, something like:

Pattern p = Pattern.compile("truck\n(.*\n){7}", Pattern.DOTALL);
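
It may be worth checking this suggestion against your data before relying on it. In DOTALL mode the dot also matches line terminators, so .* is no longer limited to a single line. The sketch below (hypothetical class and method names, placeholder e-mail address, \n separators assumed) compiles the same expression with and without the flag and reports whether the match runs into the next block:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallCheck {
    // Hypothetical sample; the e-mail address is a placeholder.
    static final String SAMPLE =
        "truck\nzDoug\nDoug's house\n(123) 456-7890\ndoug@example.com\n"
      + "30\n61234.56\n8/10/2003\n\nvehicle\neRob\nRob's house\n";

    // Returns the text of the first match for the given flags, or null if none.
    static String match(String data, int flags) {
        Matcher m = Pattern.compile("truck\n(.*\n){7}", flags).matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Default mode: each .* stays on one line, so the match covers
        // "truck" plus the next seven lines.
        System.out.println("default reaches next block: "
            + match(SAMPLE, 0).contains("vehicle"));
        // DOTALL: .* can cross line breaks, so the match may extend further.
        System.out.println("DOTALL reaches next block:  "
            + match(SAMPLE, Pattern.DOTALL).contains("vehicle"));
    }
}
```

Also note that a repeated group such as (.*\n){7} keeps only its last iteration in group(1), so the individual lines would still need to be split out of the full match.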