java How to capture a multiline pattern using regular expressions in Java?

Question

Asked by lampShade

I have a text file that I need to parse using regular expressions. The text that I need to capture is in multiline groups like this:

truck
zDoug
Doug's house
(123) 456-7890
[email protected]
30
61234.56
8/10/2003

vehicle
eRob
Rob's house
(987) 654-3210
[email protected]

For this example I need to capture truck followed by the next seven lines. In other words, in this "block" I have 8 groups. This is what I've tried, but it will not capture the next line:

(truck)\n(\w).

NOTE: I'm using the program RegExr to test my regex before I port it to Java.

Answer 1

Accepted answer by Alan Moore

(?m)^truck(?:(?:\r\n|[\r\n]).+$)*

This assumes the whole text has been read into a single string (i.e., you're not reading a file line-by-line), but it doesn't assume the line separator is always \n, as your code does. At the minimum you should allow for \r\n and \r as well, which is what (?:\r\n|[\r\n]) does. But it still matches only one separator at a time, and each separator must be followed by a non-empty line, so the match stops before the double line separator at the end of the block.
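
To illustrate the point about separators, here is a minimal sketch that runs the same pattern against text using \n, \r\n, and \r line endings. The class name SeparatorDemo, the helper firstBlock, and the shortened sample data are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Same regex as above; here the \r and \n escapes are passed to the regex engine.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).+$)*");

    // Returns the first matched block, or null if there is none.
    static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] separators = { "\n", "\r\n", "\r" };
        for (String sep : separators) {
            // Two short blocks separated by one empty line (hypothetical data).
            String data = String.join(sep, "truck", "zDoug", "30", "", "vehicle", "eRob");
            String block = firstBlock(data);
            // Count the lines in the matched block.
            int lines = block.split("\\r\\n|[\\r\\n]").length;
            System.out.println(lines + " lines matched");
        }
    }
}
```

In each case the match should cover only the three lines of the truck block, because the empty line that follows cannot satisfy .+ regardless of which separator is in use.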

Once you've matched a block of data, you can split it on the line separators to get the individual lines. Here's an example:

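import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data holds the entire file contents as a single String
Pattern p0 = Pattern.compile("(?m)^truck(?:(?:\r\n|[\r\n]).+$)*");
Matcher m = p0.matcher(data);
while (m.find())
{
  // split the matched block into its individual lines
  String fullMatch = m.group();
  int n = 0;
  for (String s : fullMatch.split("\r\n|[\r\n]"))
  {
    System.out.printf("line %d: %s%n", n++, s);
  }
}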

Output:

line 0: truck
line 1: zDoug
line 2: Doug's house
line 3: (123) 456-7890
line 4: [email protected]
line 5: 30
line 6: 61234.56
line 7: 8/10/2003
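
If you would rather have each line in its own capture group, as the question asks, one alternative is to spell out the seven lines explicitly. The sketch below is not the approach used above; it assumes every truck block has exactly seven non-empty lines after the header, and it uses \R (any line break sequence, available since Java 8) in place of (?:\r\n|[\r\n]). The names TruckGroups, buildPattern, and parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TruckGroups {
    // Builds (?m)^(truck) followed by seven copies of \R(.+) and a final $.
    // \R matches any line break sequence (available since Java 8).
    static Pattern buildPattern() {
        StringBuilder sb = new StringBuilder("(?m)^(truck)");
        for (int i = 0; i < 7; i++) {
            sb.append("\\R(.+)");
        }
        return Pattern.compile(sb.append('$').toString());
    }

    // Returns the eight captured lines of the first truck block, or null if none is found.
    static String[] parse(String data) {
        Matcher m = buildPattern().matcher(data);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            fields[g - 1] = m.group(g);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample data copied from the question (email lines kept as they appear there).
        String data = "truck\nzDoug\nDoug's house\n(123) 456-7890\n[email protected]\n"
                    + "30\n61234.56\n8/10/2003\n\nvehicle\neRob\nRob's house\n";
        String[] fields = parse(data);
        for (int i = 0; i < fields.length; i++) {
            System.out.printf("group %d: %s%n", i + 1, fields[i]);
        }
    }
}
```

The trade-off is that this version fails to match a block with fewer than seven data lines, whereas the repeated non-capturing group accepts blocks of any length.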

I'm also assuming each line of data contains at least one character, and that the blank lines between data blocks are really empty--i.e., no spaces, TABs, or other invisible characters.
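
If that second assumption does not hold for your file, one possible workaround (my suggestion, not something the regex above does) is to empty out whitespace-only lines before matching. A minimal sketch, with the hypothetical names BlankLineNormalizer, normalize, and firstBlock, and made-up sample data:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankLineNormalizer {
    // Same block regex as in the answer, written with regex escapes.
    static final Pattern BLOCK =
        Pattern.compile("(?m)^truck(?:(?:\\r\\n|[\\r\\n]).+$)*");

    // Empties lines that contain only spaces or tabs so they act as block separators.
    static String normalize(String data) {
        return data.replaceAll("(?m)^[ \\t]+$", "");
    }

    // Returns the first matched block, or null if there is none.
    static String firstBlock(String data) {
        Matcher m = BLOCK.matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The "blank" line between the blocks holds a space and a tab.
        String data = "truck\nzDoug\n \t\nvehicle\neRob\n";
        System.out.println("raw:        " + firstBlock(data).split("\n").length + " lines");
        System.out.println("normalized: " + firstBlock(normalize(data)).split("\n").length + " lines");
    }
}
```

Without the normalization step the whitespace-only line satisfies .+, so the match runs on into the next block; after normalization it stops at the end of the truck block.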

(BTW: To test that regex in RegExr, remove the (?m) and check the multiline box instead. RegExr is powered by ActionScript, so the rules are a little different. For a Java-powered regex tester, check out RegexPlanet.)

Answer 2

Answered by Sergei

This pattern should work: ((.*|\n)*)

Answer 3

Answered by mazaneicha

I think that in order to span multiple lines, your Pattern should be compiled in DOTALL mode, something like:

Pattern p = Pattern.compile("truck\n(.*\n){7}", Pattern.DOTALL);
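
One caveat worth checking before relying on this: in DOTALL mode the dot also matches line terminators, so the greedy .* is no longer confined to a single line. Because the pattern already contains explicit \n characters, it can span lines without the flag. The sketch below compares both variants on the sample data from the question; the class name DotallDemo and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // The regex from the answer above: "truck", a newline, then seven newline-terminated chunks.
    static final String REGEX = "truck\n(.*\n){7}";

    // Returns the first match of REGEX compiled with the given flags, or null.
    static String firstMatch(String data, int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(data);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String truckBlock = "truck\nzDoug\nDoug's house\n(123) 456-7890\n[email protected]\n"
                          + "30\n61234.56\n8/10/2003\n";
        String data = truckBlock + "\nvehicle\neRob\nRob's house\n(987) 654-3210\n[email protected]\n";

        String plain = firstMatch(data, 0);
        String dotall = firstMatch(data, Pattern.DOTALL);

        // Does the match stop at the end of the truck block?
        System.out.println("no flag, stops after truck block: " + truckBlock.equals(plain));
        // Does the match swallow the rest of the input as well?
        System.out.println("DOTALL, runs to end of input:     " + data.equals(dotall));
    }
}
```

On this input the variant without the flag should match exactly the truck block, while the DOTALL variant should continue through the vehicle block to the last newline.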
