PHP: Validating a YouTube URL using a regular expression

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/13476060/

Time: 2020-08-25 05:31:12  Source: igfitidea

Validating Youtube URL using Regex

php regex youtube

Asked by Luke

I'm trying to validate YouTube URLs for my application.

So far I have the following:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/((http\:\/\/){0,}(www\.){0,}(youtube\.com){1} || (youtu\.be){1}(\/watch\?v\=[^\s]){1})/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

I wish to validate the following variations of YouTube URLs:

  • With and without http://
  • With and without www.
  • With the URLs youtube.com and youtu.be
  • Must have /watch?v=
  • Must have the unique video string (In the example above "vpfzjcCzdtCk")

However, I don't think I've got my logic right, because for some reason it returns true for: www.youtube.co/watch?v=vpfzjcCzdtCk (notice I've written it incorrectly with .co and not .com).

Answered by Linus Kleen

There are a lot of redundancies in this regular expression of yours (and also, the leaning toothpick syndrome). This, though, should produce results:

$rx = '~
  ^(?:https?://)?                           # Optional protocol
   (?:www[.])?                              # Optional sub-domain
   (?:youtube[.]com/watch[?]v=|youtu[.]be/) # Mandatory domain name (w/ query string in .com)
   ([^&]{11})                               # Video id of 11 characters as capture group 1
    ~x';

$has_match = preg_match($rx, $url, $matches);

// if matching succeeded, $matches[1] would contain the video ID
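To see how this pattern behaves, here is a minimal sketch that wraps it in a function and runs it over a few inputs. The function name `match_video_id` and the sample URLs are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
// Returns the captured video ID, or null when the URL does not match.
function match_video_id(string $url): ?string
{
    // The x modifier lets the pattern span several lines; whitespace is ignored.
    $rx = '~
      ^(?:https?://)?
       (?:www[.])?
       (?:youtube[.]com/watch[?]v=|youtu[.]be/)
       ([^&]{11})
        ~x';

    return preg_match($rx, $url, $matches) === 1 ? $matches[1] : null;
}

// Sample inputs chosen for illustration only.
$samples = [
    'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'youtu.be/dQw4w9WgXcQ',
    'www.youtube.co/watch?v=dQw4w9WgXcQ',   // .co instead of .com
];

foreach ($samples as $url) {
    $id = match_video_id($url);
    echo $url, ' => ', $id ?? 'no match', PHP_EOL;
}
```

Note that the pattern has no `$` anchor at the end, so the 12-character string from the question (`vpfzjcCzdtCk`) still matches and only its first 11 characters are captured.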

Some notes:

  • use the tilde character ~ as the delimiter, to avoid LTS
  • use [.] instead of \. to improve visual legibility and avoid LTS. (Most "special" characters, such as the dot ., lose their special meaning inside a character class, i.e. within square brackets.)
  • to make regular expressions more "readable" you can use the x modifier (which has further implications; see the docs on Pattern modifiers), which also allows for comments in regular expressions
  • capturing can be suppressed using non-capturing groups: (?: <pattern> ). This makes the expression slightly more efficient, since the engine does not have to store the sub-match.


Optionally, to extract values from a (more or less complete) URL, you might want to make use of parse_url():

$url = 'http://youtube.com/watch?v=VIDEOID';
$parts = parse_url($url);
print_r($parts);

Output:

Array
(
    [scheme] => http
    [host] => youtube.com
    [path] => /watch
    [query] => v=VIDEOID
)

Validating the domain name and extracting the video ID is left as an exercise to the reader.
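One possible way to approach that exercise is sketched below. It is not the answerer's solution: the function name `youtube_id_from_url` is hypothetical, and the 11-character ID check is an assumption carried over from the regular expression above (so the 12-character sample from the question would be rejected).

```php
<?php
// Hypothetical helper: returns the video ID, or null if the URL is not
// recognised as a youtube.com/watch or youtu.be URL.
function youtube_id_from_url(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present, so prepend
    // one for inputs such as "www.youtube.com/watch?v=...".
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    // Normalise the host: lower-case it and drop an optional leading "www.".
    $host = preg_replace('~^www[.]~', '', strtolower($parts['host']));
    $path = $parts['path'] ?? '';

    if ($host === 'youtube.com' && $path === '/watch') {
        // Long form: the ID is the "v" query parameter.
        parse_str($parts['query'] ?? '', $query);
        $id = $query['v'] ?? '';
    } elseif ($host === 'youtu.be') {
        // Short form: the ID is the path itself.
        $id = ltrim($path, '/');
    } else {
        return null;
    }

    // Assumption: an ID is exactly 11 characters of the URL-safe Base64 alphabet.
    $ok = is_string($id) && preg_match('~^[-_A-Za-z0-9]{11}\z~', $id) === 1;
    return $ok ? $id : null;
}

var_dump(youtube_id_from_url('http://youtube.com/watch?v=dQw4w9WgXcQ'));
var_dump(youtube_id_from_url('www.youtube.co/watch?v=dQw4w9WgXcQ'));
```

Comparing the host as a whole string, rather than searching for it inside the URL, is what rules out look-alike domains such as youtube.co.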



I gave in to the comment war below; thanks to Toni Oriol, the regular expression now works on short (youtu.be) URLs as well.

Answered by Jason McCreary

An alternative to Regular Expressions would be parse_url().

 $parts = parse_url($url);
 if ($parts['host'] == 'youtube.com' && ...) {
   // your code
 }

While it is more code, it is more readable and therefore more maintainable.

Answered by eisberg

Please try:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/^((http\:\/\/){0,}(www\.){0,}(youtube\.com){1}|(youtu\.be){1}(\/watch\?v\=[^\s]){1})$/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

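You had ||, which, without the ^ and $ anchors, matches in any case: in a regular expression | is alternation, so || leaves an empty alternative between the two bars, and an empty alternative matches any string.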
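The effect of a doubled `|` can be checked in isolation. The snippet below is a small illustration with made-up alternatives (`foo` and `bar`), not the pattern from the question:

```php
<?php
// "||" contains an empty alternative, which matches at any position in any string.
var_dump(preg_match('/(foo||bar)/', 'unrelated text'));   // int(1)

// With anchors, the whole string has to be one of the alternatives;
// the empty alternative can then only match the empty string.
var_dump(preg_match('/^(foo||bar)$/', 'unrelated text')); // int(0)
var_dump(preg_match('/^(foo||bar)$/', 'bar'));            // int(1)
```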

Answered by Steven Moseley

This should do it:

$valid = preg_match("/^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/watch\?v\=\w+$/", $youtube_url);
if ($valid) {
    echo "Valid";
} else {
    echo "Invalid";
}

Answered by Glenn Slayden

I defer to the other answers on this page for parsing the URL syntax, but for the YouTube ID values themselves, you can be a little bit more specific, as I describe in the following answer on StackExchange/WebApps:

Format for ID of YouTube video? - https://webapps.stackexchange.com/a/101153/141734


Video Id

For the videoId, it is an 8-byte (64-bit) integer. Applying Base64-encoding to 8 bytes of data requires 11 characters. However, since each Base64 character conveys exactly 6 bits, this allocation could actually hold up to 11 × 6 = 66 bits, a surplus of 2 bits over what our payload needs. The excess bits are set to zero, which has the effect of excluding certain characters from ever appearing in the last position of the encoded string. In particular, the videoId will always end with one of the following:

{ A, E, I, M, Q, U, Y, c, g, k, o, s, w, 0, 4, 8 }

Thus, a regular expression (RegEx) for the videoId would be as follows:

[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]

Channel or Playlist Id

The channelId and playlistId strings are produced by Base64-encoding a 128-bit (16-byte) binary integer. Again here, calculation per Base64 correctly predicts the observed string length of 22 characters. In this case, the output is capable of encoding 22 × 6 = 132 bits, a surplus of 4 bits; those zeros end up restricting most of the 64 alphabet symbols from appearing in the last position, and only 4 remain eligible. All channelId strings end in one of the following:

{ A, Q, g, w }

This gives us the regular expression for a channelId:

[-_A-Za-z0-9]{21}[AQgw]
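The videoId arithmetic above can be reproduced with PHP's own Base64 functions. This is a sketch under the assumption that an ID is the URL-safe Base64 encoding of 8 bytes, as described; the function names `encode_video_id` and `looks_like_video_id` are hypothetical.

```php
<?php
// Encode 8 bytes with the URL-safe Base64 alphabet ("-" and "_" instead of "+" and "/").
function encode_video_id(string $eightBytes): string
{
    // base64_encode() turns 8 bytes into 11 characters plus one "=" pad.
    $b64 = rtrim(base64_encode($eightBytes), '=');
    return strtr($b64, '+/', '-_');
}

// Check a candidate against the stricter videoId pattern given above.
function looks_like_video_id(string $id): bool
{
    return preg_match('~^[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]\z~', $id) === 1;
}

// Any 8 random bytes should produce an 11-character string that passes the check,
// because the last character carries only 4 payload bits followed by 2 zero bits.
$id = encode_video_id(random_bytes(8));
echo $id, ' length=', strlen($id), ' valid=', var_export(looks_like_video_id($id), true), PHP_EOL;
```

The same approach should carry over to the 22-character channelId pattern by encoding 16 bytes instead of 8.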
