Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29454663/

Date: 2020-08-11 08:02:24  Source: igfitidea

Using Java regex to read a text file and match multiple patterns

java regex

Asked by Soumyasree Biswas

Code that I tried:

import java.io.*;
import java.util.regex.*;
public class All {
    public static void main(String[] args) {
        String input = "IT&&faculty.*";
        try {
            FileInputStream fstream = new FileInputStream("uu.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            while ((strLine = br.readLine()) != null) {
                if (Pattern.matches(input, strLine)) {
                    Pattern p = Pattern.compile("'(.*?)'");
                    Matcher m = p.matcher(strLine);
                    while (m.find()) {
                        String b = m.group(1);
                        String c = b.toString() + ".*";
                        System.out.println(b);

                        if (Pattern.matches(c, strLine)) {
                            Pattern pat = Pattern.compile("<(.*?)>");
                            Matcher mat = pat.matcher(strLine);
                            while (mat.find()) {
                                System.out.println(m.group(1));

                            }
                        } else {
                            System.out.println("Not found");
                        }
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

The contents of my text file are below (a \ indicates a newline).

Input file:

IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')
 Mousum handique(designation|address|phone number|'IT Assistant          professor'|<AUS staff quaters>|#5566778899#)
 Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)
Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)
Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)

It gives this result:

Mousum handique
Not found
Abhijit biswas
Not found
Arnab paul
Not found
Bhagaban swain
Not found

whereas the result I want is:

Mousum handique
AUS staff quaters
Abhijit biswas
AUW staff quaters
Arnab paul
AUE staff quaters
Bhagaban swain
AUW staff quarters

That is, after the first match, when it gets Mousum handique from the file, it should search the file again, and wherever it finds a line beginning with Mousum handique it should print whatever is inside <> on that line. Please refer to the data in my text file to understand my question. Sorry if my question seems stupid, but I have been trying this for a long time!

Accepted answer by Avinash Raj

You don't need the string.matches method. Just use the Pattern and Matcher classes to extract the name at the start of each line, along with the content between < and > on that same line.

// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String s =  "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')\n" + 
        " Mousum handique(designation|address|phone number|'IT Assistant           professor'|<AUS staff quaters>|#5566778899#)\n" + 
        " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)\n" + 
        "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)\n" + 
        "Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)";
// Backslashes must be doubled inside a Java string literal.
// Group 1 is the name before "(", group 2 is the text between "<" and ">".
Matcher m = Pattern.compile("(?m)^\\s*([^\\(]+)\\([^\\)]*\\|<([^>]*)>[^\\)]*\\)").matcher(s);
while(m.find())
{
    System.out.println(m.group(1));
    System.out.println(m.group(2));
} 

Output:

Mousum handique
AUS staff quaters
Abhijit biswas
AUW staff quaters
Arnab paul
AUE staff quaters
Bhagaban swain
AUW staff quarters
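
Since the question reads its data from a file rather than from a string literal, here is a minimal sketch of how the same idea could be applied line by line. The class name StaffAddresses, the method extract and the "name -> address" output format are assumptions made for illustration; only the file name uu.txt comes from the question. Because each line is matched on its own, the (?m) flag is not needed here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaffAddresses {
    // Same shape as the pattern above, applied to a single line:
    // group 1 is the name before "(", group 2 is the text between "<" and ">".
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*([^(]+)\\([^)]*\\|<([^>]*)>[^)]*\\)");

    /** Returns one "name -> address" string per matching line, in input order. */
    static List<String> extract(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                result.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "uu.txt" is the file name used in the question.
        List<String> lines = Files.readAllLines(Paths.get("uu.txt"));
        for (String entry : extract(lines)) {
            System.out.println(entry);
        }
    }
}
```

The header line that starts with IT&&faculty is skipped automatically, because it contains no |<...> section before its closing parenthesis.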

Update:

Use this regex to also capture the ID number.

String s =  "IT&&faculty('Mousum handique'|'Abhijit biswas'|'Arnab paul'|'Bhagaban swain')\n" + 
        " Mousum handique(designation|address|phone number|'IT Assistant           professor'|<AUS staff quaters>|#5566778899#)\n" + 
        " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)\n" + 
        "Arnab paul(designation|address|phone number|'IT Assistant professor'|<AUE staff quaters>|#5566778890#)\n" + 
        "Bhagaban swain(designation|address|phone number|'IT Assistant professor'|<AUW staff quarters>|#5566778892#)";
// Backslashes must be doubled inside a Java string literal.
// Group 3 is the number between the two "#" characters.
Matcher m = Pattern.compile("(?m)^\\s*([^\\(]+)\\([^\\)]*\\|<([^>]*)>[^\\)]*\\|#([^#]*)#[^\\)]*\\)").matcher(s);
while(m.find())
{
    System.out.println(m.group(1));
    System.out.println(m.group(2));
    System.out.println(m.group(3));
}

Output:

Mousum handique
AUS staff quaters
5566778899
Abhijit biswas
AUW staff quaters
5566778891
Arnab paul
AUE staff quaters
5566778890
Bhagaban swain
AUW staff quarters
5566778892
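
With three capture groups the numeric indexes become harder to follow. As an optional variation, which is not part of the original answer, the same regex could use Java's named groups. The group names name, address and phone, and the class NamedGroups, are assumptions chosen for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Same pattern as above, with (?<label>...) named groups
    // in place of the numbered groups 1, 2 and 3.
    private static final Pattern ENTRY = Pattern.compile(
            "(?m)^\\s*(?<name>[^(]+)\\([^)]*\\|<(?<address>[^>]*)>[^)]*\\|#(?<phone>[^#]*)#[^)]*\\)");

    /** Builds one "name / address / phone" line per matching entry. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            sb.append(m.group("name")).append(" / ")
              .append(m.group("address")).append(" / ")
              .append(m.group("phone")).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = " Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)";
        System.out.print(describe(s));
    }
}
```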

Answer by alfasin

One bug is here:

while (mat.find()) {
    System.out.println(m.group(1)); // <-- you should use mat - not m!!!
}

Second bug is here:

if (Pattern.matches(c, strLine)) {

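This if is never entered, since the String c is the previous match + ".*", and Pattern.matches only succeeds when the whole line matches. The line being tested starts with IT&&faculty, not with the name. Remove this if condition and it'll work.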
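
To make the difference concrete, here is a small sketch comparing Pattern.matches, which must match the entire input, with Matcher.find, which looks for a match anywhere in it. The class MatchesVsFind and its two helper methods are hypothetical names used only for this illustration, and the header string is shortened from the question's data.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    /** True only if the regex matches the entire input. */
    static boolean wholeLine(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** True if the regex matches somewhere inside the input. */
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String header = "IT&&faculty('Mousum handique'|'Abhijit biswas')";
        String c = "Mousum handique" + ".*"; // built the same way as in the question

        System.out.println(wholeLine(c, header)); // false: the line starts with "IT&&faculty"
        System.out.println(anywhere(c, header));  // true: the name occurs inside the line
    }
}
```

If a name could ever contain regex metacharacters, wrapping it in Pattern.quote before appending ".*" would be the safer choice.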

Fixed code:

    ...
    Pattern p = Pattern.compile("'(.*?)'");
    Matcher m = p.matcher(strLine);
    while (m.find()) {
        String b = m.group(1);
        System.out.println(b);            
        Pattern pat = Pattern.compile("<(.*?)>");
        Matcher mat = pat.matcher(strLine);
        while (mat.find()) {
            System.out.println(mat.group(1));

        }            
    }
    ... 

Running this code with the input:

Abhijit biswas(designation|address|phone number|'IT Assistant professor'|<AUW staff quaters>|#5566778891#)

outputs:

IT Assistant professor
AUW staff quaters
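
The question describes a two-step lookup: take each quoted name from the IT&&faculty line, then search the file again for the line that starts with that name and print what is inside <>. One possible way to do that is sketched below, under some assumptions: the whole file fits in memory as a list of lines, and a staff line is recognized by starting with the name followed by "(" after trimming. The class FacultyLookup and the method report are hypothetical names, not code from either answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FacultyLookup {
    private static final Pattern QUOTED = Pattern.compile("'(.*?)'");
    private static final Pattern ANGLED = Pattern.compile("<(.*?)>");

    /** Returns each name from the header line, followed by its <...> values. */
    static List<String> report(List<String> lines, String headerPrefix) {
        // Pass 1: collect the quoted names from the header line(s).
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(headerPrefix)) {
                Matcher m = QUOTED.matcher(line);
                while (m.find()) {
                    names.add(m.group(1));
                }
            }
        }
        // Pass 2: for every name, scan the lines again for "name(" at the start.
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(name);
            boolean found = false;
            for (String line : lines) {
                if (line.trim().startsWith(name + "(")) {
                    Matcher m = ANGLED.matcher(line);
                    while (m.find()) {
                        out.add(m.group(1));
                        found = true;
                    }
                }
            }
            if (!found) {
                out.add("Not found");
            }
        }
        return out;
    }
}
```

Calling report(lines, "IT&&faculty") with the lines of uu.txt and printing each element of the returned list should give the name-then-address layout the question asks for.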