JavaScript: test if a string contains only letters (a-z + é ü ê etc.)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2013451/

Date: 2020-08-22 22:23:39  Source: igfitidea

Test if string contains only letters (a-z + é ü ê etc.)

javascript, regex, diacritics

Asked by patad

I want to match a string to make sure it contains only letters.

I've got this and it works just fine:

var onlyLetters = /^[a-zA-Z]*$/.test(myString);

BUT

Since I speak another language too, I need to allow all letters, not just A-Z. For example:

é ü ê

Does anyone know whether there is a global 'alpha' term that covers all letters for use with RegExp? Or even better, does anyone have some kind of solution?

Thanks a lot

EDIT: Just realized that you might also want to allow '-' and ' ' in case of a double name like 'Mary-Ann' or 'Mary Ann'.

Answered by Debilski

I don't know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names or user nicknames, I'd suggest you list the allowed characters yourself rather than accepting every 'alpha' character you'll find in Unicode, because you probably won't see any visual difference between the following letters:

А ≠ A ≠ Α  # cyrillic, latin, greek

In such cases it's better to specify the allowed letters manually if you want to minimise account faking and such.
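
To make the look-alike problem concrete, here is a minimal sketch that prints the code points of the three letters shown above and checks which of them passes a plain ASCII-letter test. The array name lookalikes and the variable passing are hypothetical names chosen for this illustration.

```javascript
// Three letters that render almost identically but are distinct code points.
const lookalikes = [
  { char: '\u0410', script: 'Cyrillic' }, // А
  { char: '\u0041', script: 'Latin' },    // A
  { char: '\u0391', script: 'Greek' },    // Α
];

for (const { char, script } of lookalikes) {
  const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(`${script}: ${char} is U+${hex}`);
}

// Only the Latin letter passes an explicit ASCII-letter check.
const asciiOnly = /^[a-zA-Z]+$/;
const passing = lookalikes
  .filter(({ char }) => asciiOnly.test(char))
  .map(({ script }) => script);

console.log(passing); // [ 'Latin' ]
```

An explicit list of allowed letters rejects the Cyrillic and Greek variants, whereas a catch-all "any letter" test would accept all three.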

Addition

Well, if it's for a field which is supposed to be non-unique, I would allow Greek as well. I wouldn't feel good about forcing users to change their name to a Latinised version.

But for unique fields like nicknames, you need to give the other visitors of your site some assurance that it's really the nickname they think it is. It's bad enough that people already fake accounts by interchanging I and l. Of course, it depends on your users; but to be safe, I think it's better to allow basic Latin + diacritics only. (Maybe have a look at this list: Latin-derived_alphabet)

As an untested suggestion (with '-', '_' and ' '):

/^[a-zA-Z\-_ ''‘?D????????T???e??????????t?????????????????????Y?????????????????????y??áà??ǎ?ā?????????????????Déè?ê?ě?ē???????????áàa?ǎ?ā?????????????????eéè?ê?ě?ē??????????????Iíì???ǐ?ī?????????????N?N?????óò??ǒ?ō???????????íìi??ǐ?ī??????????????ńn?ň???óò??ǒ?ō?????????????????????Túù?üǔ?ū???????????Y?????????????????????????túù?üǔ?ū???????????y??????????]$/.test(myString)

Another edit: I have added the apostrophe for people with names like O'Neill or O'Reilly. (And the straight and the reversed apostrophe for people who can't enter the curly one correctly.)

Answered by Corey

var onlyLetters = /^[a-zA-Z\u00C0-\u00ff]+$/.test(myString)
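
One thing worth checking with this range: U+00C0 to U+00FF also contains the multiplication sign (U+00D7) and the division sign (U+00F7), which are not letters. The sketch below demonstrates that and shows one possible tightened range; the names wide and tightened are hypothetical, and the tightened pattern is an illustration rather than part of the answer.

```javascript
// The range from the answer above.
const wide = /^[a-zA-Z\u00C0-\u00ff]+$/;

// U+00D7 (multiplication sign) and U+00F7 (division sign) sit inside it.
console.log(wide.test('\u00D7')); // true
console.log(wide.test('\u00F7')); // true

// Hypothetical tightened variant that skips those two code points.
const tightened = /^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/;

console.log(tightened.test('\u00D7'));             // false
console.log(tightened.test('\u00F7'));             // false
console.log(tightened.test('\u00E9\u00FC\u00EA')); // true (é, ü, ê)
```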

Answered by BalusC

You can't do this in JS. It has very limited regex and normalizer support. You would need to construct a lengthy and unmaintainable character array with all possible Latin characters with diacritical marks (I guess there are around 500 different ones). Rather, delegate the validation task to the server side, which can use another language with more regex capabilities, if necessary with the help of Ajax.

In a full-fledged regex environment you could just test whether the string matches \p{L}+. Here's a Java example:

boolean valid = string.matches("\\p{L}+");

Alternatively, you could normalize the text to get rid of the diacritical marks and check whether it contains [A-Za-z]+ only. Here's another Java example:

// Needs java.text.Normalizer and java.text.Normalizer.Form imported.
string = Normalizer.normalize(string, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = string.matches("[A-Za-z]+");

PHP supports similar functions.
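
For comparison, JavaScript engines that implement ES2018 Unicode property escapes accept \p{L} when the regex carries the u flag, so the Java check above can be sketched in JavaScript too. This is a minimal sketch under that assumption; onlyLetters and looksLikeName are hypothetical helper names, and the name pattern is just one way to admit the space, hyphen and apostrophe cases raised in the question.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (the u flag is required).
const onlyLetters = (s) => /^\p{L}+$/u.test(s);

// Hypothetical variant for names: runs of letters joined by a single
// space, straight or curly apostrophe, or hyphen.
const looksLikeName = (s) => /^\p{L}+(?:[ '\u2019\-]\p{L}+)*$/u.test(s);

console.log(onlyLetters('\u00E9\u00FC\u00EA')); // true (é, ü, ê)
console.log(onlyLetters('abc1'));               // false
console.log(looksLikeName('Mary-Ann'));         // true
console.log(looksLikeName('Mary Ann'));         // true
console.log(looksLikeName("O'Neill"));          // true
console.log(looksLikeName('Mary--Ann'));        // false
```

Note that \p{L} matches letters only, so input in decomposed form (a base letter followed by a combining mark) would need to be normalized to NFC first.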

Answered by Ben Y

When I tried to implement @Debilski's solution, JavaScript didn't like the extended Latin characters, so I had to code them as JavaScript escapes:

// The long run of Unicode escapes below encodes the extended Latin letters
// from @Debilski's character class.

function isAlpha(string) {
    var patt = /^[a-zA-Z\u00C6\u00D0\u018E\u018F\u0190\u0194\u0132\u014A\u0152\u1E9E\u00DE\u01F7\u021C\u00E6\u00F0\u01DD\u0259\u025B\u0263\u0133\u014B\u0153\u0138\u017F\u00DF\u00FE\u01BF\u021D\u0104\u0181\u00C7\u0110\u018A\u0118\u0126\u012E\u0198\u0141\u00D8\u01A0\u015E\u0218\u0162\u021A\u0166\u0172\u01AFY\u0328\u01B3\u0105\u0253\u00E7\u0111\u0257\u0119\u0127\u012F\u0199\u0142\u00F8\u01A1\u015F\u0219\u0163\u021B\u0167\u0173\u01B0y\u0328\u01B4\u00C1\u00C0\u00C2\u00C4\u01CD\u0102\u0100\u00C3\u00C5\u01FA\u0104\u00C6\u01FC\u01E2\u0181\u0106\u010A\u0108\u010C\u00C7\u010E\u1E0C\u0110\u018A\u00D0\u00C9\u00C8\u0116\u00CA\u00CB\u011A\u0114\u0112\u0118\u1EB8\u018E\u018F\u0190\u0120\u011C\u01E6\u011E\u0122\u0194\u00E1\u00E0\u00E2\u00E4\u01CE\u0103\u0101\u00E3\u00E5\u01FB\u0105\u00E6\u01FD\u01E3\u0253\u0107\u010B\u0109\u010D\u00E7\u010F\u1E0D\u0111\u0257\u00F0\u00E9\u00E8\u0117\u00EA\u00EB\u011B\u0115\u0113\u0119\u1EB9\u01DD\u0259\u025B\u0121\u011D\u01E7\u011F\u0123\u0263\u0124\u1E24\u0126I\u00CD\u00CC\u0130\u00CE\u00CF\u01CF\u012C\u012A\u0128\u012E\u1ECA\u0132\u0134\u0136\u0198\u0139\u013B\u0141\u013D\u013F\u02BCN\u0143N\u0308\u0147\u00D1\u0145\u014A\u00D3\u00D2\u00D4\u00D6\u01D1\u014E\u014C\u00D5\u0150\u1ECC\u00D8\u01FE\u01A0\u0152\u0125\u1E25\u0127\u0131\u00ED\u00ECi\u00EE\u00EF\u01D0\u012D\u012B\u0129\u012F\u1ECB\u0133\u0135\u0137\u0199\u0138\u013A\u013C\u0142\u013E\u0140\u0149\u0144n\u0308\u0148\u00F1\u0146\u014B\u00F3\u00F2\u00F4\u00F6\u01D2\u014F\u014D\u00F5\u0151\u1ECD\u00F8\u01FF\u01A1\u0153\u0154\u0158\u0156\u015A\u015C\u0160\u015E\u0218\u1E62\u1E9E\u0164\u0162\u1E6C\u0166\u00DE\u00DA\u00D9\u00DB\u00DC\u01D3\u016C\u016A\u0168\u0170\u016E\u0172\u1EE4\u01AF\u1E82\u1E80\u0174\u1E84\u01F7\u00DD\u1EF2\u0176\u0178\u0232\u1EF8\u01B3\u0179\u017B\u017D\u1E92\u0155\u0159\u0157\u017F\u015B\u015D\u0161\u015F\u0219\u1E63\u00DF\u0165\u0163\u1E6D\u0167\u00FE\u00FA\u00F9\u00FB\u00FC\u01D4\u016D\u016B\u0169\u0171\u016F\u0173\u1EE5\u01B0\u1E83\u1E81\u0175\u1E85\u01BF\u00FD\u1EF3\u0177\u00FF\u0233\u1EF9\u01B4\u017A\u017C\u017E\u1E93]+$/;
    return patt.test(string);
}

Answered by Mike Nelson

This can be tricky; unfortunately, JavaScript has pretty poor support for internationalization. To do this check you'll have to create your own character class. This is because, for instance, \w is the same as [0-9A-Z_a-z], which won't help you much, and there isn't anything like [[:alpha:]] in JavaScript. But since it sounds like you're only going to use one other language, you can probably just add those other characters to your character class.

By the way, I think you'll need a + or * in your regexp there if myString can be longer than one character.

The full example:

/^[a-zA-Zéüê]*$/.test(myString);

Answered by David Pfeffer

There should be, but the regex will be localization dependent. Thus, é ü ê won't be filtered if you're on a US localization, for example. To ensure your web site does what you want across all localizations, you should explicitly write out the characters in a form similar to what you are already doing.

The only standard one I am aware of, though, is \w, which would match all alphanumeric characters. You could do it the "standard" way by running two regexes, one to verify that \w matches and another to verify that \d (all digits) does not match, which would result in a guaranteed alpha-only string. Again, I'd strongly urge you not to use this technique, as there's no guarantee what \w will represent in a given localization, but this does answer your question.
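
As a quick check of how \w behaves in practice: the ECMAScript specification defines \w (in a regex without the u and i flags) as exactly [A-Za-z0-9_], independent of locale. A minimal sketch of the two-regex idea follows; the helper name wordCharsWithoutDigits is hypothetical, and note that the underscore still gets through.

```javascript
// \w is [A-Za-z0-9_] in JavaScript, so accented letters are not matched.
console.log(/^\w+$/.test('abc'));    // true
console.log(/^\w+$/.test('\u00E9')); // false (é)

// Hypothetical helper: every character matches \w and none matches \d.
function wordCharsWithoutDigits(s) {
  return /^\w+$/.test(s) && !/\d/.test(s);
}

console.log(wordCharsWithoutDigits('abc'));  // true
console.log(wordCharsWithoutDigits('abc1')); // false
console.log(wordCharsWithoutDigits('a_b'));  // true, the underscore is a \w character
```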

Answered by Virgil Dupras

I don't know anything about JavaScript, but if it has proper Unicode support, convert your string to a decomposed form, then remove the diacritics from it ([\u0300-\u036f\u1dc0-\u1dff]). Then your letters will only be ASCII ones.
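
A minimal sketch of this idea, assuming an engine that provides String.prototype.normalize (ES2015 and later). stripDiacritics and isLettersAfterStripping are hypothetical helper names; the character class is the one quoted above.

```javascript
// Decompose to NFD so accents become separate combining marks, then drop the marks.
function stripDiacritics(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f\u1dc0-\u1dff]/g, '');
}

// Hypothetical check: after stripping, only ASCII letters may remain.
function isLettersAfterStripping(s) {
  return /^[A-Za-z]+$/.test(stripDiacritics(s));
}

console.log(stripDiacritics('\u00E9\u00FC\u00EA'));         // 'eue' (from é, ü, ê)
console.log(isLettersAfterStripping('\u00E9\u00FC\u00EA')); // true
console.log(isLettersAfterStripping('abc1'));               // false
console.log(isLettersAfterStripping('\u00F8'));             // false, ø has no decomposition
```

Letters that carry no separable diacritic, such as ø, stay non-ASCII after this step, so they would still need to be listed explicitly.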

Answered by Hazior

You could always use a blacklist instead of a whitelist. That way you only remove the characters you do not want.
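
A minimal sketch of the blacklist idea. The set of rejected characters (digits, control characters, and most ASCII punctuation, with hyphen, space and apostrophe left allowed) is an assumption chosen for illustration, and hasForbiddenChar is a hypothetical name.

```javascript
// Hypothetical blacklist: digits, control characters, and ASCII punctuation
// other than hyphen and apostrophe. Everything else is accepted.
const forbidden = /[\d\x00-\x1F!"#$%&()*+,.\/:;<=>?@\[\\\]^_`{|}~]/;

function hasForbiddenChar(s) {
  return forbidden.test(s);
}

console.log(hasForbiddenChar('Mary-Ann')); // false
console.log(hasForbiddenChar("O'Neill"));  // false
console.log(hasForbiddenChar('\u00E9'));   // false (é)
console.log(hasForbiddenChar('abc1'));     // true
console.log(hasForbiddenChar('a?b'));      // true
```

The trade-off is that anything not listed gets through, including symbols and scripts you never thought of, so a blacklist is more permissive than it looks.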

Answered by Frunsi

You could use a blacklist - a list of characters to exclude.

Also, it is important to verify the input on server-side, not only on client-side! Client-side can be bypassed easily.

Answered by David M

There are some shortcuts to achieve this in other regular expression dialects - see this page. But I don't believe there are any standardised ones in JavaScript, and certainly none that would be supported by all browsers.