Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5219848/

Time: 2020-10-25 16:24:30  Source: igfitidea

How to validate non-english (UTF-8) encoded email address in Javascript and PHP?

Tags: php, javascript, email, utf-8, internationalization

Asked by Laraveldeep

Part of a website I am currently working on contains a registration process where users have to provide their email address. Just recently I became aware that non-ASCII domains are possible (and so are non-ASCII email addresses). My backend is a UTF-8 encoded MySQL database, and I expect that any user (with a different locale) should be able to enter their email address, but I don't know how to validate this kind of email address.

Currently I am testing out jQuery Tools, and it validates English email addresses correctly but fails to validate non-ASCII ones. I also need to do the same on the server side with PHP. Is there a regular expression that can validate this kind of email address?

I have tried this, but it fails in jQuery Tools (this is just an example for demo purposes; I don't understand it either):

闪闪发光@闪闪发光.com

Also, what will happen when they type their English email address ([email protected]) with their own IME? Can this be validated with the regular expression we currently have for English email validation? For now I don't have to worry about whether the email address actually exists.

Thanks

Answered by Facebook Staff are Complicit

Attempting to validate email addresses may not be a good idea. The specifications (RFC 5321, RFC 5322) allow for so much flexibility that validating them with regular expressions is literally impossible, and validating with a function is a great deal of work. As a result, most email validation schemes end up rejecting a large number of valid email addresses, much to the inconvenience of the users. (By far the most common example of this is not allowing the + character.)

It is more likely that the user will (accidentally or deliberately) enter an incorrect email address than an invalid one, so actually validating is a great deal of work for very little benefit, with possible costs if you do it incorrectly.

I would recommend that you just check for the presence of an @ character on the client and then send a confirmation email to verify it; it's the most practical way to validate, and it confirms that the address is correct as well.
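
A minimal JavaScript sketch of the client-side check described above. The helper name `hasAtSign` and the extra requirement of at least one character on each side of the `@` are assumptions made for illustration, not part of the answer; the confirmation email itself would still be sent by the server.

```javascript
// Hypothetical helper: the name and the "one character on each side" rule
// are assumptions made for illustration.
function hasAtSign(input) {
  const value = String(input).trim();
  const atPos = value.indexOf("@");
  // "@" must be present, and must not be the first or last character.
  return atPos > 0 && atPos < value.length - 1;
}

console.log(hasAtSign("user+tag@example.com")); // true
console.log(hasAtSign("闪闪发光@闪闪发光.com")); // true
console.log(hasAtSign("not-an-address"));       // false
```

Because the check never inspects which characters are used, non-ASCII addresses and addresses containing `+` pass without any special handling.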

Answered by powtac

Since version 5.2, PHP has built-in validation for email addresses. But I'm not sure if it works for UTF-8 encoded strings:

echo filter_var($email, FILTER_VALIDATE_EMAIL);

In the original PHP source code you will find the regular expression used for validating email addresses; it can be used for manual validation when using PHP < 5.2.

Update

idn_to_ascii() can be used to "Convert domain name to IDNA ASCII form." The result can then be validated with filter_var($email, FILTER_VALIDATE_EMAIL);

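// International domains: convert the domain part to its IDNA ASCII (Punycode) form
if (function_exists('idn_to_ascii') && strpos($email, '@') !== false) {
    $parts = explode('@', $email);
    $ascii_domain = idn_to_ascii($parts[1]);
    // idn_to_ascii() returns false on failure; keep the original address in that case
    if ($ascii_domain !== false) {
        $email = $parts[0].'@'.$ascii_domain;
    }
}
// filter_var() returns the address on success, or false if it is invalid
$is_valid = filter_var($email, FILTER_VALIDATE_EMAIL);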
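
For the JavaScript side of the question, a comparable conversion can be sketched with the standard `URL` parser, which applies IDNA ToASCII to host names. The helper name `emailDomainToAscii` and the delimiter check are assumptions for illustration. As in the PHP code above, only the domain is converted and the local part is left untouched.

```javascript
// Hypothetical helper (not part of the answer): converts the domain part of an
// address to its ASCII (Punycode) form using the standard URL parser.
function emailDomainToAscii(email) {
  const atPos = email.lastIndexOf("@");
  if (atPos < 1 || atPos === email.length - 1) {
    return null; // no "@", no local part, or no domain
  }
  const local = email.slice(0, atPos);
  const domain = email.slice(atPos + 1);
  // Reject characters that the URL parser would treat as delimiters.
  if (/[\s\/\\?#:@\[\]]/.test(domain)) {
    return null;
  }
  try {
    // The URL parser applies IDNA ToASCII to the host name.
    const asciiDomain = new URL("http://" + domain).hostname;
    return local + "@" + asciiDomain;
  } catch (err) {
    return null; // the domain could not be parsed as a host name
  }
}

console.log(emailDomainToAscii("user@闪闪发光.com")); // domain becomes an "xn--" label
console.log(emailDomainToAscii("user@example.com"));
```

The ASCII form could then be passed to whatever ASCII-only validator is already in place.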

Answered by Ilia Rostovtsev

As suggested by Mario, and after playing around a bit, I came up with the following regex to validate non-standard email addresses:

^([\p{L}\_\.\-\d]+)@([\p{L}\-\.\d]+)((\.(\p{L}){2,63})+)$

It should validate email addresses containing any kind of Unicode letters, with the TLD limited to between 2 and 63 characters.
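
A sketch of how this pattern could be tried out in JavaScript. `\p{L}` only works there with the `u` flag, and that flag rejects the `\_` escape, so the character classes below are written as `[\p{L}_.\-\d]` and `[\p{L}\-.\d]`. This adaptation and the function name are assumptions for illustration, not the author's exact pattern.

```javascript
// Adapted pattern: same structure as the regex above, with escapes adjusted
// so that it compiles under the "u" (Unicode) flag.
const unicodeEmailPattern =
  /^([\p{L}_.\-\d]+)@([\p{L}\-.\d]+)((\.(\p{L}){2,63})+)$/u;

// Hypothetical wrapper name, used only for this example.
function matchesUnicodePattern(address) {
  return unicodeEmailPattern.test(address);
}

console.log(matchesUnicodePattern("闪闪发光@闪闪发光.com")); // true
console.log(matchesUnicodePattern("user@example.com"));     // true
console.log(matchesUnicodePattern("user@example"));         // false (no TLD)
console.log(matchesUnicodePattern("user+tag@example.com")); // false ("+" is not in the class)
```

The last case shows one flaw: addresses containing `+` are rejected even though they are valid.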

Please check it and let me know if there are any flaws.

Example Online

Answered by Laraveldeep

I got this idea from a JavaScript tutorial page. It is basic, but it works for me without having to worry about the complexity of regular expressions and Unicode standards.

Client side validation

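// Wrapped in a function so the return statements are valid;
// the name isValidEmail is illustrative. Requires jQuery for $.trim().
function isValidEmail(value) {
    value = $.trim(value);
    if (!value.length) {
        return false;
    }

    var atPos = value.indexOf("@");
    var stopPos = value.lastIndexOf(".");

    // Both "@" and "." must be present
    if (atPos == -1 || stopPos == -1) {
        return false;
    }

    // The last "." must come after the "@"
    if (stopPos < atPos) {
        return false;
    }

    // There must be at least one character between "@" and the last "."
    if (stopPos - atPos == 1) {
        return false;
    }

    return true;
}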

Server side validation

if (!isset($_POST['emailaddr']) || trim($_POST['emailaddr']) == "") {
    // Error: email required
}
else {
    $email = trim($_POST['emailaddr']);
    $atpos = strpos($email, '@');
    // strrpos() finds the last ".", matching lastIndexOf() on the client
    $stoppos = strrpos($email, '.');

    if (($atpos === false) || ($stoppos === false)) {
        // Error: invalid email ("@" or "." is missing)
    }
    elseif ($stoppos < $atpos) {
        // Error: invalid email (the last "." comes before the "@")
    }
    elseif (($stoppos - $atpos) == 1) {
        // Error: invalid email (nothing between "@" and ".")
    }
}

Though it still has some loopholes, I guess users will not be fooling around with this stuff. Also, real validation is required for serious stuff, as suggested by 'Jeremy Banks'.
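
To make the loopholes concrete, here is a self-contained sketch of the same client-side checks in plain JavaScript. The function name `passesBasicChecks` is an assumption, and `String.prototype.trim` stands in for `$.trim`.

```javascript
// Plain-JavaScript rendering of the client-side checks; the function name is
// an assumption, and String.prototype.trim stands in for $.trim.
function passesBasicChecks(value) {
  const trimmed = value.trim();
  if (!trimmed.length) return false;
  const atPos = trimmed.indexOf("@");
  const stopPos = trimmed.lastIndexOf(".");
  if (atPos === -1 || stopPos === -1) return false; // "@" or "." missing
  if (stopPos < atPos) return false;                // last "." before "@"
  if (stopPos - atPos === 1) return false;          // nothing between "@" and "."
  return true;
}

console.log(passesBasicChecks("闪闪发光@闪闪发光.com")); // true
console.log(passesBasicChecks("user@.com"));            // false
console.log(passesBasicChecks("@b.c"));                 // true (loophole: empty local part)
console.log(passesBasicChecks("a@b."));                 // true (loophole: nothing after the dot)
```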

Hope this will be helpful for somebody else too.

Thanks and regards to all

Answered by powtac

A regex could be something like this:

[^ ]+@[^ ]+\.[^ ]{2,6}
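
A short JavaScript sketch of how this pattern behaves. As written it is unanchored, so it also matches an address embedded in a longer string; the anchored variant with `^` and `$` below is an assumption added for illustration. Because the pattern only excludes spaces, non-ASCII addresses pass without any Unicode-specific handling.

```javascript
// The pattern as given in the answer (unanchored).
const loosePattern = /[^ ]+@[^ ]+\.[^ ]{2,6}/;
// Assumed variant: anchored so that the whole input must be the address.
const anchoredPattern = /^[^ ]+@[^ ]+\.[^ ]{2,6}$/;

console.log(anchoredPattern.test("闪闪发光@闪闪发光.com"));        // true
console.log(loosePattern.test("see user@example.com now"));      // true (substring match)
console.log(anchoredPattern.test("see user@example.com now"));   // false
console.log(anchoredPattern.test("user@example.technology"));    // false (TLD longer than 6)
```

The last case shows that the `{2,6}` limit rejects longer top-level domains.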

Answered by Synchro

On this subject I liked this page so much that I set up a blog exposing sites that do validation wrong (contributions gratefully received - don't let yours be on it!).

As far as using regexes goes, those who say "it's wrong" tend to be light on alternatives, and TBH validation to the last letter of the RFC isn't really that critical. For example, while noddy+!#$%&'*-/=?+_{}|[email protected] is a perfectly valid address, it's not too unreasonable to reject it given that a surprisingly large proportion of users can't even type 'hotmail' correctly. Some domains are also quite restrictive on user names anyway, particularly hotmail. So I'm in favour of regexes that are demonstrably reasonable, and my favourite source for that is this page, though I don't like their current JS 'winner' and it would help if they set up a public test page.

jQuery's validate plugin uses this regex, which is interestingly constructed and quite similar in style (but smaller!) to the ex-parrot one (actually my ISP!) linked by @powtac.

Answered by The Bndr

What about something like this:

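mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
// The third parameter of mb_ereg() is a by-reference array for matches, not an
// encoding, so it is omitted here; the encoding is set by mb_regex_encoding()
mb_ereg('[\w]+@[\w]+\.com', $mail);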