Parsing a text file using a Java Scanner

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18505212/

Time: 2020-08-12 08:38:26  Source: igfitidea

parsing a text file using a java scanner

Tags: java, java.util.scanner, bots

Asked by Programatt

I am trying to create a method that parses a text file and returns a string containing the URL that follows the colon. The text file looks as follows (it is for a bot):

keyword:url
keyword,keyword:url

So each line consists of one keyword and a URL, or several comma-separated keywords and a URL.

Could anyone give me a bit of direction on how to do this? Thank you.

I believe I need to use a Scanner, but I couldn't find anything from anyone trying to do something similar.

Thank you.

Edit: here is my attempt using the suggestions below. It doesn't quite work. Any help would be appreciated.

    public static void main(String[] args) throws IOException {
        String sCurrentLine = "";
        String key = "hello";

        BufferedReader reader = new BufferedReader(
                new FileReader("sites.txt"));
        Scanner s = new Scanner(sCurrentLine);
        while ((sCurrentLine = reader.readLine()) != null) {
            System.out.println(sCurrentLine);
            if (sCurrentLine.contains(key)) {
                System.out.println(s.findInLine("http"));
            }
        }
    }

output:

    hello,there:http://www.facebook.com
    null
    whats,up:http:/google.com

sites.txt:

    hello,there:http://www.facebook.com
    whats,up:http:/google.com

Answered by slanecek

Use a BufferedReader to read the file; for parsing the text you can use regular expressions.

Answered by PythaLye

You should use the split method:

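// split is lowercase in Java; limit 2 splits at the first colon only,
// so the colon inside "http://" is left alone.
String[] strCollection = yourScannedStr.split(":", 2);
String extractedUrl = strCollection[1];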
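
The split approach can be extended to the keyword lookup the question asks for. The following is only a minimal sketch: the class name `SplitLookup`, the helper `findUrl`, and the choice to match whole keywords (rather than substrings, as `contains` would) are assumptions for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

public class SplitLookup {
    // Hypothetical helper: returns the URL for `key`, or null if no line lists it.
    public static String findUrl(List<String> lines, String key) {
        for (String line : lines) {
            // Limit 2 splits at the first colon only, so "http://..." stays whole.
            String[] parts = line.trim().split(":", 2);
            if (parts.length < 2) {
                continue; // no colon on this line, so there is no URL to return
            }
            // The keywords before the colon are comma-separated.
            for (String keyword : parts[0].split(",")) {
                if (keyword.trim().equals(key)) {
                    return parts[1].trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hello,there:http://www.facebook.com",
                "whats,up:http://google.com");
        System.out.println(findUrl(lines, "hello"));   // http://www.facebook.com
        System.out.println(findUrl(lines, "up"));      // http://google.com
        System.out.println(findUrl(lines, "missing")); // null
    }
}
```

In a real program the list of lines would come from the file, for example from the `readLine()` loop already shown in the question.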

Answered by Boris the Spider

You should read the file line by line with a BufferedReader, as you are doing. I would then recommend parsing each line using a regex.

The pattern

(?<=:)http://[^\s]++

will do the trick. This pattern says:

  • http://
  • followed by one or more non-whitespace characters, matched possessively: [^\s]++ (written [^\\s]++ inside a Java string literal)
  • and preceded by a colon: (?<=:)

Here is a simple example using a String to stand in for your file:

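import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws Exception {
        // A String stands in for the contents of the file.
        final String file = "hello,there:http://www.facebook.com\n"
                + "whats,up:http://google.com";
        // The backslash must be doubled inside a Java string literal.
        final Pattern pattern = Pattern.compile("(?<=:)http://[^\\s]++");
        // One Matcher is created up front and reset for every line.
        final Matcher m = pattern.matcher("");
        try (final BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(file.getBytes("UTF-8"))))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                m.reset(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }
}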

Output:

http://www.facebook.com
http://google.com
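
Since the question asks specifically about Scanner: the attempt there prints null because its Scanner is constructed once over the empty string, so `findInLine` never sees the line that was just read. The sketch below is one possible Scanner-based variant, not the answerer's code. The class name `ScannerLookup`, the helper `findUrl`, and the whole-keyword comparison are assumptions made for illustration.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerLookup {
    // Same pattern as above: an http:// URL preceded by a colon.
    private static final Pattern URL = Pattern.compile("(?<=:)http://[^\\s]++");

    // Hypothetical helper: returns the URL of the first line listing `key`, else null.
    public static String findUrl(Scanner scanner, String key) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // no "keywords:url" structure on this line
            }
            boolean listed = false;
            for (String keyword : line.substring(0, colon).split(",")) {
                if (keyword.trim().equals(key)) {
                    listed = true;
                }
            }
            if (listed) {
                // A fresh Scanner over the current line, so findInLine has text to search.
                try (Scanner lineScanner = new Scanner(line)) {
                    return lineScanner.findInLine(URL);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String file = "hello,there:http://www.facebook.com\n"
                + "whats,up:http://google.com";
        try (Scanner scanner = new Scanner(file)) {
            System.out.println(findUrl(scanner, "up"));
        }
    }
}
```

To read from disk instead of a String, the outer Scanner could be built with `new Scanner(new File("sites.txt"))`, which requires handling `FileNotFoundException`.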