Java - Unknown characters passing as [a-zA-z0-9]*?

Note: this page is a translated mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must apply the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4681289/

Time: 2020-10-30 07:29:49  Source: igfitidea

Tags: java, regex, spaces, alphanumeric

Asked by Spectraljump

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.

When I run this,

Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); // fixed typo
if (!p.matcher(gottenData).matches())
    System.out.println(someData); // someData contains gottenData

certain spaces plus an unknown symbol somehow slip through the filter (gottenData is the red rectangle in the screenshot).

In case you're wondering, it DOES also display Text, it's not all like that.

For now, I don't mind the [?] as long as it also contains some string along with it.

Please help.

[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).

Answered by Mark Tozzi

The * quantifier matches "zero or more", which means it will also match a string that does not contain any of the characters in your class, namely the empty string. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.lookingAt() instead of Matcher.matches(), it will not require a full string match and you can use the regex [a-zA-Z0-9]+ (note that lookingAt() still anchors the match at the start of the input; Matcher.find() searches anywhere in it).
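
To make the difference between the quantifiers and the Matcher methods concrete, here is a minimal sketch. The class name QuantifierDemo and its helper methods are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answers.
public class QuantifierDemo {
    // Zero or more alphanumerics: the empty string is accepted.
    static final Pattern STAR = Pattern.compile("[a-zA-Z0-9]*");
    // One or more alphanumerics: at least one character is required.
    static final Pattern PLUS = Pattern.compile("[a-zA-Z0-9]+");

    // True only if the whole input is one or more alphanumerics.
    static boolean isAlphanumeric(String s) {
        return PLUS.matcher(s).matches();
    }

    // True if an alphanumeric run appears anywhere in the input.
    static boolean containsAlphanumeric(String s) {
        return PLUS.matcher(s).find();
    }

    // True only if the input starts with an alphanumeric run.
    static boolean startsWithAlphanumeric(String s) {
        return PLUS.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(STAR.matcher("").matches());       // true
        System.out.println(isAlphanumeric(""));                // false
        System.out.println(isAlphanumeric("abc123"));          // true
        System.out.println(containsAlphanumeric("  a ?"));     // true
        System.out.println(startsWithAlphanumeric("  a ?"));   // false
        System.out.println(startsWithAlphanumeric("a ?"));     // true
    }
}
```

Which helper fits depends on the goal: matches() with + rejects anything that is not purely alphanumeric, while find() only requires some alphanumeric text to be present.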

Answered by Mihai Toader

You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.

You don't need ^ and $ around the regex. Matcher.matches() always matches the complete string.

String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");

This prints "doesn't match."

Answered by Martin Jespersen

You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string.

Answered by M. Jessup

The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
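
As a quick check of which extra characters the A-z range lets in, the following sketch walks the ASCII table and collects every non-letter that [A-z] accepts. RangeDemo and extrasInRange are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class RangeDemo {
    // Collects the ASCII characters accepted by [A-z] that are not letters.
    static String extrasInRange() {
        Pattern wide = Pattern.compile("[A-z]");
        StringBuilder extras = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (wide.matcher(String.valueOf(c)).matches() && !Character.isLetter(c)) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        System.out.println(extrasInRange()); // prints [\]^_`
    }
}
```

None of the six extras is a space, so the typo on its own would not explain whitespace slipping through the filter.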

A second potential problem, as Martin mentioned, is that you may need to put in the start and end anchors if you want the string to consist only of letters and numbers.

Finally, you use the * quantifier, which means 0 or more, so the pattern can match 0 characters and matches() will return true for an empty string; with a partial match such as find(), the pattern would effectively match any input. What you need is the + quantifier. So I submit that the pattern you are most likely looking for is:

^[a-zA-Z0-9]+$

Answered by Manoj

Did anyone consider adding a space to the regex, as in [a-zA-Z0-9 ]*? This should match any normal text with letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too.
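
A small sketch of this idea, assuming + rather than * so that empty input is rejected. The literal space in the class covers only the plain ASCII space (U+0020), so tabs or a non-breaking space would still be rejected; printing code points is one way to see which invisible character is actually present. SpaceDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class SpaceDemo {
    // Alphanumerics plus the plain ASCII space (U+0020) only.
    static final Pattern WITH_SPACE = Pattern.compile("[a-zA-Z0-9 ]+");

    static boolean isPlainText(String s) {
        return WITH_SPACE.matcher(s).matches();
    }

    // Lists each character's code point, to reveal invisible characters.
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(isPlainText("some text 42"));    // true
        System.out.println(isPlainText("some\ttext"));       // false
        System.out.println(isPlainText("some\u00A0text"));   // false
        System.out.println(codePoints("a\u00A0b"));          // U+0061 U+00A0 U+0062
    }
}
```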

You can quickly test your regex at http://www.regexplanet.com/simple/

Answered by stoneMonkey77

Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...

Answered by Rizo

You can check whether the input value contains only letters and numbers by using the regex ^[a-zA-Z0-9]*$

If your value contains only letters and digits, it shows "match" (e.g. riz99, riz99z); otherwise it shows "not match" (e.g. 99z., riz99.z, riz99.9).

Example code:

// e is assumed to be an input event whose target holds the value to check
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
    console.log('match')
} else {
    console.log('not match')
}
