Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19885506/

Time: 2020-08-12 21:02:15  Source: igfitidea

How can I use hash sets in java to determine if a string contains valid characters?

Tags: java, hashset

Asked by user2975289

I'm writing a lexical analyzer and have never used hash sets. I want to take a string and make sure it's legal. I think I understand how to build the hash set with valid characters, but I'm not sure how to compare the string with the hash set to ensure it contains only valid characters. I can't find an example anywhere. Can someone point me to code that would do this?

Answered by wvdz

HashSet has the function contains() for this, since it implements the Collection interface.
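
As a minimal sketch of what that looks like for a single character (the class name `ContainsDemo`, the method `isValidChar`, and the sample alphabet are made up for illustration, not taken from the question):

```java
import java.util.HashSet;
import java.util.Set;

public class ContainsDemo {
    // Hypothetical set of legal characters; replace with your lexer's alphabet.
    static final Set<Character> VALID = new HashSet<>();
    static {
        for (char c : "abc123".toCharArray()) {
            VALID.add(c);
        }
    }

    // contains() is declared by Collection, so it works on any Set<Character>.
    // The char argument is autoboxed to Character before the lookup.
    static boolean isValidChar(char c) {
        return VALID.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(isValidChar('a'));
        System.out.println(isValidChar('z'));
    }
}
```

Note that contains() tests one element at a time, so checking a whole string still needs a loop over its characters.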

Answered by dasblinkenlight

You cannot compare an entire string to a HashSet<Character>, but you can do it one character at a time:

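import java.util.HashSet;
import java.util.Set;

public class ValidChars {
    public static void main(String[] args) {
        // Build the set of valid characters.
        Set<Character> valid = new HashSet<>();
        valid.add('a');
        valid.add('d');
        valid.add('f');

        // Check one character at a time; stop at the first invalid one.
        boolean allOk = true;
        for (char c : "fad".toCharArray()) {
            if (!valid.contains(c)) {
                allOk = false;
                break;
            }
        }
        System.out.println(allOk); // prints true
    }
}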
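
If you need this check in more than one place, the same loop can be wrapped in a small helper. The following is only a sketch: the names `CharSetValidator`, `toCharSet`, and `containsOnly` are invented here, and it works on UTF-16 `char` values, so characters outside the Basic Multilingual Plane are not handled specially.

```java
import java.util.HashSet;
import java.util.Set;

public class CharSetValidator {
    // Builds a set from a string that lists every allowed character.
    static Set<Character> toCharSet(String allowed) {
        Set<Character> set = new HashSet<>();
        for (char c : allowed.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    // True if every char of input is in valid; an empty input is trivially valid.
    static boolean containsOnly(String input, Set<Character> valid) {
        return input.chars().allMatch(c -> valid.contains((char) c));
    }

    public static void main(String[] args) {
        Set<Character> valid = toCharSet("adf");
        System.out.println(containsOnly("fad", valid));
        System.out.println(containsOnly("fax", valid));
    }
}
```

`allMatch` short-circuits on the first invalid character, just like the `break` in the explicit loop.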

However, this is not the most efficient way of doing it. A better approach would be to construct a regex with the characters that you need, and call matches() on the string:

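// Let's say x, y, and z are the valid characters
String regex = "[xyz]*";
String myString = "xyzzy"; // example input
// matches() succeeds only if the entire string matches the regex
if (myString.matches(regex)) {
    System.out.println("All characters in the string are 'x', 'y', or 'z'");
}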
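
One caveat worth knowing: String.matches() compiles the regex on every call. If the check runs many times, as it would inside a lexer, you may prefer to compile the pattern once and reuse it. A rough sketch, with the class name `RegexValidator` and method `isValid` being my own placeholders:

```java
import java.util.regex.Pattern;

public class RegexValidator {
    // Compiled once and reused across calls.
    private static final Pattern ONLY_XYZ = Pattern.compile("[xyz]*");

    static boolean isValid(String input) {
        // Matcher.matches() requires the whole input to match, not just a substring.
        return ONLY_XYZ.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("xyzzy"));
        System.out.println(isValid("xyz!"));
    }
}
```

Because of the `*` quantifier the empty string also counts as valid; use `[xyz]+` instead if at least one character is required.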

Answered by Stephen C

I think you are probably over-thinking this problem. (For instance, spending too much time thinking how to make the lexer "efficient" ...)

The conventional ways to test for valid / invalid characters in a lexer are:

  • use a big switch statement, or

  • perform a sequence of "character class" tests; e.g. using the result of Character.getType(char)

Or better still, use a lexer generator.
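
For what it's worth, the two hand-written approaches listed above might look roughly like this. This is a sketch only: the class `LexerCharTests`, its method names, and the choice of which characters count as operators or identifier characters are assumptions for illustration, not rules from any particular language.

```java
public class LexerCharTests {
    // Option 1: a switch that lists the legal characters explicitly.
    static boolean isOperatorChar(char c) {
        switch (c) {
            case '+':
            case '-':
            case '*':
            case '/':
            case '=':
                return true;
            default:
                return false;
        }
    }

    // Option 2: a "character class" test based on Character.getType(char).
    static boolean isIdentifierChar(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
                return true;
            default:
                return c == '_';
        }
    }

    // A string is legal if every character falls into one of the known classes.
    static boolean isLegal(String input) {
        for (char c : input.toCharArray()) {
            if (!isOperatorChar(c) && !isIdentifierChar(c) && !Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal("x1 = y_2 + 3"));
        System.out.println(isLegal("x # y"));
    }
}
```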

Using a HashSet is neither more efficient nor more readable than a switch. And the "character class" approach could be a lot more readable than both ... depending on your validation rules.

But if I haven't convinced you, see @dasblinkenlight's answer :-)
