Java: parse text using Scanner useDelimiter
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/2935854/
Parse Text using scanner useDelimiter
Asked by Brian
I am looking to parse the following text file.
Sample text file:
<2008-10-07>text entered by user<Ted Parlor><2008-11-26>additional text entered by user<Ted Parlor>
I would like to parse the above text so that each entry gives me three variables:
v1 = 2008-10-07
v2 = text entered by user
v3 = Ted Parlor
v1 = 2008-11-26
v2 = additional text entered by user
v3 = Ted Parlor
I attempted to use Scanner and useDelimiter, but I'm having trouble setting it up to produce the results stated above. Here's my first attempt:
import java.io.*;
import java.util.Scanner;

public class ScanNotes {
    public static void main(String[] args) throws IOException {
        Scanner s = null;
        try {
            //String regex = "(?<=\<)([^\>>*)(?=\>)";
            s = new Scanner(new BufferedReader(new FileReader("cur_notes.txt")));
            s.useDelimiter("[<]+");
            while (s.hasNext()) {
                String v1 = s.next();
                String v2= s.next();
                System.out.println("v1= " + v1 + " v2=" + v2);
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}
The result is as follows:
v1= 2008-10-07>text entered by user v2=Ted Parlor>
What I want is:
v1= 2008-10-07 v2=text entered by user v3=Ted Parlor
v1= 2008-11-26 v2=additional text entered by user v3=Ted Parlor
Any help that would allow me to extract all three strings separately would be greatly appreciated.
Answered by polygenelubricants
You can use \s*[<>]\s* as the delimiter: that is, either < or >, together with any preceding and following whitespace.
For this to work, the input must not contain any < or > other than the ones that mark the date and user fields (i.e. no I <3 U!! in the message).
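To see why a stray bracket is a problem, here is a minimal sketch (the class name StrayBracketDemo and its tokens helper are made up for illustration) that runs the same delimiter over an entry whose message contains a < character. The message is cut in two, so the entry yields four tokens instead of three.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class StrayBracketDemo {
    // Returns every token the Scanner produces for the given input.
    static List<String> tokens(String content) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(content).useDelimiter("\\s*[<>]\\s*")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The "<" inside the message is treated as just another delimiter.
        for (String t : tokens("<2008-10-07>I <3 U!!<Ted Parlor>")) {
            System.out.println("[" + t + "]");
        }
        // prints [2008-10-07], [I], [3 U!!], [Ted Parlor] on separate lines
    }
}
```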
This delimiter allows empty string parts within an entry, but it also leaves an empty string token between any two consecutive entries (the closing > of one entry and the opening < of the next are two separate delimiter matches), so those tokens must be discarded manually.
import java.util.Scanner;

public class UseDelim {
    public static void main(String[] args) {
        String content = " <2008-10-07>text entered by user <Ted Parlor>"
                + " <2008-11-26> additional text entered by user <Ted Parlor>"
                + " <2008-11-28><Parlor Ted> ";
        // backslashes are doubled because the regex lives in a Java string literal
        Scanner sc = new Scanner(content).useDelimiter("\\s*[<>]\\s*");
        while (sc.hasNext()) {
            // each entry is three consecutive tokens: date, text, user
            System.out.printf("[%s|%s|%s]%n",
                    sc.next(), sc.next(), sc.next());
            // if there's a next entry, discard the empty string token
            if (sc.hasNext()) sc.next();
        }
    }
}
This prints:
[2008-10-07|text entered by user|Ted Parlor]
[2008-11-26|additional text entered by user|Ted Parlor]
[2008-11-28||Parlor Ted]
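To make the empty token between entries visible, the following sketch dumps the raw token list for the sample line from the question. The class name TokenDump and the tokens helper are assumptions for illustration only; the delimiter is the one from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class TokenDump {
    // Collects every token the Scanner produces, including empty ones.
    static List<String> tokens(String content) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(content).useDelimiter("\\s*[<>]\\s*")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "<2008-10-07>text entered by user<Ted Parlor>"
                + "<2008-11-26>additional text entered by user<Ted Parlor>";
        List<String> all = tokens(content);
        for (int i = 0; i < all.size(); i++) {
            // index 3 is the empty token that sits between "><" of adjacent entries
            System.out.println(i + ": [" + all.get(i) + "]");
        }
    }
}
```

The list has seven tokens: three per entry plus one empty token at index 3, which is the token the answer's loop discards with the extra sc.next() call.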

