Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/2968/

Time: 2020-08-10 23:31:16  Source: igfitidea

What are the different methods to parse strings in Java?

Tags: java, string, parsing

Asked by agweber

For parsing player commands, I've most often used the split method to split a string by delimiters and then just figured out the rest with a series of ifs or switches. What are some different ways of parsing strings in Java?

Accepted answer by andrewrk

I assume you're trying to make the command interface as forgiving as possible. If this is the case, I suggest you use an algorithm similar to this:

  1. Read in the string
  2. Split the string into tokens
  3. Use a dictionary to convert synonyms to a common form
    • For example, convert "hit", "punch", "strike", and "kick" all to "hit"
  4. Perform actions on an unordered, inclusive base
    • Unordered - "punch the monkey in the face" is the same thing as "the face in the monkey punch"
    • Inclusive - If the command is supposed to be "punch the monkey in the face" and they supply "punch monkey", you should check how many commands this matches. If it matches only one command, perform that action. It might even be a good idea to have command priorities, so that even if several commands match, the top-priority action is performed.
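
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the class name ForgivingCommandParser, the synonym table, and the command list are all made up for the example, and it assumes that words outside the command vocabulary (such as "the" and "in") can simply be ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ForgivingCommandParser {
    // Hypothetical synonym dictionary: each word maps to a common form.
    static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("punch", "hit");
        SYNONYMS.put("strike", "hit");
        SYNONYMS.put("kick", "hit");
    }

    // Hypothetical commands, listed in priority order (first = highest).
    static final List<List<String>> COMMANDS = Arrays.asList(
            Arrays.asList("hit", "monkey", "face"),
            Arrays.asList("hit", "monkey", "tail"),
            Arrays.asList("feed", "monkey"));

    // Returns the highest-priority command containing every recognized
    // input word, or null when nothing matches.
    static List<String> parse(String input) {
        Set<String> vocabulary = new HashSet<>();
        for (List<String> command : COMMANDS) {
            vocabulary.addAll(command);
        }

        // Tokenize, normalize synonyms, and keep only known words.
        // Using a set makes the match independent of word order.
        Set<String> words = new HashSet<>();
        for (String token : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = SYNONYMS.getOrDefault(token, token);
            if (vocabulary.contains(word)) {
                words.add(word);
            }
        }
        if (words.isEmpty()) {
            return null;
        }

        // Inclusive match: the input may name only part of a command.
        // COMMANDS is in priority order, so the first hit wins ties.
        for (List<String> command : COMMANDS) {
            if (command.containsAll(words)) {
                return command;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("punch the monkey in the face")); // [hit, monkey, face]
        System.out.println(parse("the face in the monkey punch")); // [hit, monkey, face]
        System.out.println(parse("punch monkey"));                 // [hit, monkey, face]
    }
}
```

Here "punch monkey" matches two commands, and the list order stands in for the command priorities mentioned above.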

Answer by Daniel Broekman

I really like regular expressions. As long as the command strings are fairly simple, you can write a few regexes that do what could otherwise take a few pages of code to parse manually.

I would suggest you check out http://www.regular-expressions.info for a good intro to regexes, as well as specific examples for Java.
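
As a rough illustration of the regex approach, here is a sketch using java.util.regex with capture groups. The command shape it accepts (a verb, a target, and an optional "with <item>" clause) and the class name RegexCommandParser are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCommandParser {
    // Hypothetical command shape: <verb> [the] <target> [with [the] <item>]
    static final Pattern COMMAND = Pattern.compile(
            "^\\s*(\\w+)\\s+(?:the\\s+)?(\\w+)(?:\\s+with\\s+(?:the\\s+)?(\\w+))?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns {verb, target, item}; item is null when the "with" clause is absent.
    // Returns null when the input does not match the pattern at all.
    static String[] parse(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("hit the troll with sword");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]); // hit / troll / sword
    }
}
```

One pattern replaces the tokenizing and the chain of ifs for this command shape, at the cost of a regex that gets harder to read as the command language grows.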

Answer by Mike Stone

A simple string tokenizer on spaces should work, but there are really many ways you could do this.

Here is an example using a tokenizer:

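import java.util.StringTokenizer;

String command = "kick person";
// Split on whitespace (the default StringTokenizer delimiters).
StringTokenizer tokens = new StringTokenizer(command);
String action = null;

// The first token is the action, e.g. "kick".
if (tokens.hasMoreTokens()) {
    action = tokens.nextToken();
}

// doCommand is the caller's own handler; the remaining tokens are its arguments.
if (action != null) {
    doCommand(action, tokens);
}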

The remaining tokens can then be used as the arguments. This all assumes that no spaces are used in the arguments, so you might want to roll your own simple parsing mechanism (such as finding the first whitespace and using the text before it as the action, or using a regular expression if you don't mind the speed hit). Just abstract it out so it can be used anywhere.
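
The "first whitespace" idea might look something like this. It is only a sketch: the class and method names are invented for the example, and it assumes the action never contains whitespace while the arguments may.

```java
class ActionSplitter {
    // Splits a command into {action, rest}. Everything after the first run of
    // whitespace is kept intact, so arguments may themselves contain spaces.
    // rest is "" when the command has no arguments.
    static String[] splitAction(String command) {
        String trimmed = command.trim();
        int i = 0;
        // Scan to the first whitespace character without using a regex.
        while (i < trimmed.length() && !Character.isWhitespace(trimmed.charAt(i))) {
            i++;
        }
        String action = trimmed.substring(0, i);
        String rest = trimmed.substring(i).trim();
        return new String[] { action, rest };
    }

    public static void main(String[] args) {
        String[] parts = splitAction("say hello there world");
        System.out.println(parts[0]); // say
        System.out.println(parts[1]); // hello there world
    }
}
```

Keeping this in one small method is one way to "abstract it out": each command handler then decides how to interpret its own argument string.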

Answer by Mike Stone

I would look at Java migrations of Zork, and lean towards a simple Natural Language Processor (driven either by tokenizing or regex) such as the following (from this link):

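    public static boolean simpleNLP( String inputline, String keywords[])
    {
        int i;
        int to,from;
        // Parts of this listing were lost in extraction; the lexing and matching
        // loops are reconstructed from the surviving fragments.
        inputline = inputline.trim();
        if( inputline.length() < 1) return false; // check for blank and empty lines
        java.util.Vector<String> lexed = new java.util.Vector<String>();
        from = 0; to = 0;
        while( to >=0 )
        {
            to = inputline.indexOf(' ',from);
            if( to > 0){
                lexed.addElement(inputline.substring(from,to));
                from = to;
                while( from < inputline.length() && inputline.charAt(from) == ' ') from++;
            }
            else lexed.addElement(inputline.substring(from)); // last token
        }
        // succeed once every keyword has been seen, in order
        boolean status = false;
        to = 0;
        for( i=0; i<lexed.size(); i++)
        {
            if( lexed.elementAt(i).equalsIgnoreCase(keywords[to]))
            {
                to++;
                if( to >= keywords.length) { status = true; break;}
            }
        }
        return status;
    }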

...

Anything which gives a programmer a reason to look at Zork again is good in my book, just watch out for Grues.

...

Answer by svrist

@CodingTheWheel Here's your code, cleaned up a bit, run through Eclipse's formatter (Ctrl+Shift+F), and then inserted back here :)

Including the four spaces in front of each line.

import java.util.ArrayList;
import java.util.List;

// Returns true when every keyword appears in the input, in order (case-insensitive).
public static boolean simpleNLP(String inputline, String keywords[]) {
    if (inputline.length() < 1)
        return false;

    // Lex the line into space-separated tokens.
    List<String> lexed = new ArrayList<String>();
    for (String ele : inputline.split(" ")) {
        lexed.add(ele);
    }

    boolean status = false;
    int to = 0; // index of the next keyword to find
    for (int i = 0; i < lexed.size(); i++) {
        String s = lexed.get(i);
        if (s.equalsIgnoreCase(keywords[to])) {
            to++;
            if (to >= keywords.length) {
                status = true;
                break;
            }
        }
    }
    return status;
}

Answer by Telcontar

When the separator for the command is always the same string or character (such as ";"), I recommend using the StringTokenizer class:

StringTokenizer

But when the separator varies or is complex, I recommend using regular expressions, which the String class itself supports through its split method since Java 1.4. It uses the Pattern class from the java.util.regex package:

Pattern
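
To make the contrast concrete, here is a small sketch of both cases. The class name SeparatorDemo and the sample inputs are invented for illustration; the calls themselves (StringTokenizer with a delimiter string, and String.split with a regex) are standard library APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SeparatorDemo {
    // Fixed separator: StringTokenizer with ";" as the only delimiter.
    static List<String> fixed(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ";");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Varying separator: a comma or semicolon with optional whitespace around it,
    // expressed as a regex and handed to String.split.
    static List<String> varying(String input) {
        return Arrays.asList(input.trim().split("\\s*[,;]\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(fixed("north;east;south"));             // [north, east, south]
        System.out.println(varying("north , east;south ;  west")); // [north, east, south, west]
    }
}
```

One behavioral difference to keep in mind: StringTokenizer never returns empty tokens, so "a;;b" yields two tokens, whereas "a;;b".split(";") yields three, with an empty string in the middle.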

Answer by bpapa

Sun itself recommends staying away from StringTokenizer and using the String.split method instead.

You'll also want to look at the Pattern class.

Answer by Bartosz Bierkowski

Parsing manually is a lot of fun... at the beginning :)

In practice, if commands aren't very sophisticated, you can treat them the same way as those used in command-line interpreters. There's a list of libraries that you can use: http://java-source.net/open-source/command-line. I think you can start with Apache Commons CLI or args4j (which uses annotations). They are well documented and really simple to use. They handle parsing automatically, and the only thing you need to do is read particular fields in an object.

If you have more sophisticated commands, then maybe creating a formal grammar would be a better idea. There is a very good library with a graphical editor, debugger and interpreter for grammars. It's called ANTLR (and the editor ANTLRWorks), and it's free :) There are also some example grammars and tutorials.
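
To give a feel for what "a formal grammar" means here, the sketch below hand-writes a recursive-descent parser for a tiny, made-up command grammar. It is not ANTLR and does not use its API; a tool like ANTLR would generate this kind of parser from the grammar rules shown in the comments. The grammar, the class name, and the output format are all assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TinyGrammarParser {
    private final List<String> tokens;
    private int pos = 0;

    TinyGrammarParser(String input) {
        tokens = Arrays.asList(input.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // command := WORD object [ "with" object ]
    String parseCommand() {
        String verb = next();
        String result = verb + "(" + parseObject();
        if (peekIs("with")) {
            next(); // consume "with"
            result += ", " + parseObject();
        }
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected word: " + tokens.get(pos));
        }
        return result + ")";
    }

    // object := [ "the" ] WORD
    private String parseObject() {
        if (peekIs("the")) {
            next(); // the article carries no meaning here, so skip it
        }
        return next();
    }

    private boolean peekIs(String word) {
        return pos < tokens.size() && tokens.get(pos).equals(word);
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        System.out.println(new TinyGrammarParser("Hit the troll with the sword").parseCommand());
        // prints: hit(troll, sword)
    }
}
```

Each grammar rule becomes one method, which is what keeps the parser readable as the command language grows, and malformed input fails with a specific error instead of falling through a chain of ifs.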

Answer by SaM

If this is to parse command lines, I would suggest using Commons CLI.

The Apache Commons CLI library provides an API for processing command line interfaces.

Answer by John with waffle

Another vote for ANTLR/ANTLRWorks. If you create two versions of the file, one with the Java code for actually executing the commands, and one without (with just the grammar), then you have an executable specification of the language, which is great for testing, a boon for documentation, and a big timesaver if you ever decide to port it.
