Java - Regular expression to find comments in code

Question

Asked by brovar

A little fun with Java this time. I want to write a program that reads code from standard input (line by line, for example), like:

// some comment
class Main {
    /* blah */
    // /* foo
    foo();
    // foo */
    foo2();
    /* // foo2 */
}

finds all the comments in it and removes them. I'm trying to use regular expressions, and so far I have something like this:

private static String ParseCode(String pCode)
{
    String MyCommentsRegex = "(?://.*)|(/\\*(?:.|[\\n\\r])*?\\*/)";
    return pCode.replaceAll(MyCommentsRegex, " ");
}

but it does not seem to work in all cases, e.g.:

System.out.print("We can use /* comments */ inside a string of course, but it shouldn't start a comment");

Any advice, or ideas other than regex? Thanks in advance.

Answer 1

Answered by Suraj Chandran

Another alternative is to use a library that supports AST parsing; for example, org.eclipse.jdt.core has all the APIs you need to do this and more. But that's just one alternative :)

Answer 2

Answered by tangens

The last example is no problem, I think:

/* we comment out some code
System.out.print("We can use */ inside a string of course");
we end the comment */

... because the comment actually ends at "We can use */, so this code does not compile.

But I have another problematic case:

int/*comment*/foo=3;

Your pattern will transform this into:

intfoo=3;

... which is invalid code. So it is better to replace your comments with " " instead of "".
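
To make the point above concrete, here is a minimal sketch that strips a block comment with each replacement string. The class name ReplacementDemo and the simplified block-comment-only pattern are assumptions made for this illustration; they are not taken from the question or the answer.

```java
class ReplacementDemo {
    // Simplified pattern (an assumption for this demo): block comments only.
    // (?s) lets '.' match line breaks so multi-line comments are covered too.
    static final String BLOCK_COMMENT = "(?s)/\\*.*?\\*/";

    // Replaces every block comment in 'code' with 'replacement'.
    static String strip(String code, String replacement) {
        return code.replaceAll(BLOCK_COMMENT, replacement);
    }

    public static void main(String[] args) {
        String code = "int/*comment*/foo=3;";
        System.out.println(strip(code, ""));   // the tokens on either side merge
        System.out.println(strip(code, " "));  // the tokens stay separated
    }
}
```

With the empty replacement the two identifiers run together; with a single space they remain separate tokens.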

Answer 3

Answered by alex

I think a 100% correct solution using regular expressions is either inhuman or impossible (taking into account escapes, etc.).

I believe the best option would be to use ANTLR; I believe they even provide a Java grammar you can use.
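
Short of a full ANTLR grammar, one regex-free option is a small hand-written scanner that tracks whether it is currently in code, a string literal, a char literal, or a comment. The sketch below is only an illustration and is not from the original answer: the class name CommentStripper and its strip method are hypothetical, and it assumes plain source text with no Unicode escapes or text blocks.

```java
class CommentStripper {
    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    // Returns 'src' with comments removed. A block comment becomes one space
    // so that neighbouring tokens do not merge; a line comment is dropped up
    // to (but not including) its terminating '\n'.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            char next = (i + 1 < n) ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i += 2; continue; }
                    if (c == '/' && next == '*') { state = State.BLOCK_COMMENT; out.append(' '); i += 2; continue; }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c);
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    // Copy an escaped character verbatim so \" or \' cannot end the literal.
                    if (c == '\\' && i + 1 < n) { out.append(next); i += 2; continue; }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') { state = State.CODE; out.append(c); }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i += 2; continue; }
                    break;
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int/*comment*/foo=3;"));
        System.out.println(strip("print(\"a /* b */ c\"); // tail"));
    }
}
```

Because the scanner only enters a comment state from CODE, comment markers inside string or char literals are copied through untouched.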

Answer 4

Answered by PSpeed

You may have already given up on this by now, but I was intrigued by the problem.

I believe this is a partial solution...

Native regex:

//.*|("(?:\\[^"]|\\"|.)*?")|(?s)/\*.*?\*/

In Java:

String clean = original.replaceAll( "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/", "$1 " );

This appears to properly handle comments embedded in strings as well as properly escaped quotes inside strings. I threw a few things at it to check, but not exhaustively.

There is one compromise: every "..." string literal in the code ends up with a space after it. Avoiding that while keeping the pattern simple would be very difficult, given the need to cleanly handle:

int/* some comment */foo = 5;

A simple Matcher.find/appendReplacement loop could check for group(1) before replacing with a space, and would only be a handful of lines of code. That may still be simpler than a full-blown parser. (I could add the matcher loop too if anyone is interested.)
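
The matcher loop described above might look roughly like the following. This is a sketch rather than the answer author's own code: the class name MatcherLoopDemo and the method strip are made up for the example, and the pattern is this answer's alternation (line comment, string literal in group 1, block comment) written as a Java string with all backslashes doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherLoopDemo {
    // line comment | string literal (captured as group 1) | block comment
    private static final Pattern TOKEN = Pattern.compile(
            "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/");

    static String strip(String original) {
        Matcher m = TOKEN.matcher(original);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // group(1) is non-null only when a string literal matched:
            // keep it unchanged, otherwise replace the comment with a space.
            String replacement = (m.group(1) != null) ? m.group(1) : " ";
            // quoteReplacement stops '$' and '\' in the literal from being
            // interpreted as group references or escapes.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int/* some comment */foo = 5;"));
        System.out.println(strip("s = \"a /* b */ c\";/* tail */"));
    }
}
```

Compared with a single replaceAll call, the loop leaves string literals exactly as they were, so no extra space is added after them.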

Answer 5

Answered by Quadrat137

I ended up with this solution.

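import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsFun {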
    static List<Match> commentMatches = new ArrayList<Match>();

    public static void main(String[] args) {
        Pattern commentsPattern = Pattern.compile("(//.*?$)|(/\\*.*?\\*/)", Pattern.MULTILINE | Pattern.DOTALL);
        Pattern stringsPattern = Pattern.compile("(\".*?(?<!\\\\)\")");

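        // getTextFromFile is not shown in this answer; it is assumed to return the file's contents as a String.
        String text = getTextFromFile("src/my/test/CommentsFun.java");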

        Matcher commentsMatcher = commentsPattern.matcher(text);
        while (commentsMatcher.find()) {
            Match match = new Match();
            match.start = commentsMatcher.start();
            match.text = commentsMatcher.group();
            commentMatches.add(match);
        }

        List<Match> commentsToRemove = new ArrayList<Match>();

        Matcher stringsMatcher = stringsPattern.matcher(text);
        while (stringsMatcher.find()) {
            for (Match comment : commentMatches) {
                if (comment.start > stringsMatcher.start() && comment.start < stringsMatcher.end())
                    commentsToRemove.add(comment);
            }
        }
        for (Match comment : commentsToRemove)
            commentMatches.remove(comment);

        for (Match comment : commentMatches)
            text = text.replace(comment.text, " ");

        System.out.println(text);
    }

    //Single-line

    // "String? Nope"

    /*
    * "This  is not String either"
    */

    //Complex */
    ///*More complex*/

    /*Single line, but */

    String moreFun = " /* comment? doubt that */";

    String evenMoreFun = " // comment? doubt that ";

    static class Match {
        int start;
        String text;
    }
}
