Java: parsing text with Scanner and useDelimiter

Question

Asked by Brian

I'm looking to parse the following text file.
Sample text file:

<2008-10-07>text entered by user<Ted Parlor><2008-11-26>additional text entered by user<Ted Parlor>

I would like to parse the above text so that I can have three variables:

v1 = 2008-10-07
v2 = text entered by user
v3 = Ted Parlor
v1 = 2008-11-26
v2 = additional text entered by user
v3 = Ted Parlor

I attempted to use Scanner and useDelimiter; however, I'm having trouble setting it up to get the results stated above. Here's my first attempt:

import java.io.*;
import java.util.Scanner;

public class ScanNotes {
    public static void main(String[] args) throws IOException {
        Scanner s = null;
        try {
            //String regex = "(?<=\<)([^\>>*)(?=\>)";
            s = new Scanner(new BufferedReader(new FileReader("cur_notes.txt")));
            s.useDelimiter("[<]+");

            while (s.hasNext()) {
                String v1 = s.next();
                String v2= s.next();
                System.out.println("v1= " + v1 + " v2=" + v2);
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}

The result is as follows:

v1= 2008-10-07>text entered by user v2=Ted Parlor>

What I desire is:

v1= 2008-10-07 v2=text entered by user v3=Ted Parlor
v1= 2008-11-26 v2=additional text entered by user v3=Ted Parlor

Any help that would allow me to extract all three strings separately would be greatly appreciated.

Answer 1

Answered by polygenelubricants

You can use \s*[<>]\s* as the delimiter. That is, either < or >, together with any preceding and following whitespace.

For this to work, the input must not contain any < or > other than the ones used to mark the date and user fields (i.e. no I <3 U!! in the message).
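To illustrate that restriction, here is a minimal sketch. The class name StrayBracketDemo, the helper tokens, and the sample entry are made up for illustration and are not part of the original question. It collects every token the Scanner yields, so the effect of a stray < in the message is visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class StrayBracketDemo {
    // Collects every token the Scanner yields for the given input,
    // using the same delimiter as in this answer.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter("\\s*[<>]\\s*")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical entry whose message contains a stray '<'.
        System.out.println(tokens("<2008-10-07>I <3 U!!<Ted Parlor>"));
    }
}
```

The message is split at the stray <, so this entry yields four tokens instead of three, and any code that reads the fields in groups of three goes out of step from that point on.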

This delimiter allows empty string parts in an entry, but it also leaves empty string tokens between any two entries, so they must be discarded manually.

import java.util.Scanner;

public class UseDelim {
    public static void main(String[] args) {
        String content = " <2008-10-07>text entered by user <Ted Parlor>"
        + "   <2008-11-26>  additional text entered by user <Ted Parlor>"
        + "   <2008-11-28><Parlor Ted>  ";
        // Backslashes must be doubled inside a Java string literal.
        Scanner sc = new Scanner(content).useDelimiter("\\s*[<>]\\s*");
        while (sc.hasNext()) {
            System.out.printf("[%s|%s|%s]%n",
                sc.next(), sc.next(), sc.next());

            // if there's a next entry, discard the empty string token
            if (sc.hasNext()) sc.next();
        }
    }
}

This prints:

[2008-10-07|text entered by user|Ted Parlor]
[2008-11-26|additional text entered by user|Ted Parlor]
[2008-11-28||Parlor Ted]
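
If discarding the empty tokens by hand feels fragile, one possible alternative (not part of the approach above) is to match whole entries with java.util.regex instead of splitting on delimiters. The sketch below is only that, a sketch: the class name MatchNotes, the helper parse, and the pattern are assumptions, and it still requires that the message itself contains no < or >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchNotes {
    // One entry has the shape <date>text<user>; any part may be empty.
    private static final Pattern ENTRY =
        Pattern.compile("<([^<>]*)>([^<>]*)<([^<>]*)>");

    // Returns one {date, text, user} array per entry found in the content.
    static List<String[]> parse(String content) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(content);
        while (m.find()) {
            entries.add(new String[] {
                m.group(1).trim(), m.group(2).trim(), m.group(3).trim()
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        String content = "<2008-10-07>text entered by user<Ted Parlor>"
            + "<2008-11-26>additional text entered by user<Ted Parlor>";
        for (String[] e : parse(content)) {
            System.out.println("v1= " + e[0] + " v2=" + e[1] + " v3=" + e[2]);
        }
    }
}
```

Each match carries all three fields as capture groups, so there are no in-between tokens to skip. To read from cur_notes.txt as in the question, the file contents would first have to be loaded into a String.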
