PHP: Validating a YouTube URL with a regular expression

Question

Asked by Luke

I'm trying to validate YouTube URLs for my application.

So far I have the following:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/((http\:\/\/){0,}(www\.){0,}(youtube\.com){1} || (youtu\.be){1}(\/watch\?v\=[^\s]){1})/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

I wish to validate the following variations of YouTube URLs:

With and without http://
With and without www.
With the URLs youtube.com and youtu.be
Must have /watch?v=
Must have the unique video string (In the example above "vpfzjcCzdtCk")

However, I don't think I've got my logic right, because for some reason it returns true for www.youtube.co/watch?v=vpfzjcCzdtCk (notice I've written it incorrectly with .co and not .com).

Answer 1

Answered by Linus Kleen

There are a lot of redundancies in this regular expression of yours (and also, the leaning toothpick syndrome). This, though, should produce results:

$rx = '~
  ^(?:https?://)?                           # Optional protocol
   (?:www[.])?                              # Optional sub-domain
   (?:youtube[.]com/watch[?]v=|youtu[.]be/) # Mandatory domain name (w/ query string in .com)
   ([^&]{11})                               # Video id of 11 characters as capture group 1
    ~x';

$has_match = preg_match($rx, $url, $matches);

// if matching succeeded, $matches[1] would contain the video ID

Some notes:

Use the tilde character ~ as the delimiter, to avoid LTS (leaning toothpick syndrome).
Use [.] instead of \. to improve visual legibility and avoid LTS. ("Special" characters such as the dot . lose their special meaning inside character classes, that is, within square brackets.)
To make regular expressions more readable, you can use the x modifier (which has further implications; see the docs on Pattern modifiers). It also allows for comments in regular expressions.
Capturing can be suppressed using non-capturing groups: (?: <pattern> ). This makes the expression more efficient.
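
As a quick check of the pattern above, here is a minimal sketch that wraps it in a helper and runs it against a few inputs. The function name youtube_id_from_url and the placeholder ID abcdefghijk are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the captured ID, or null when the URL does not match.
function youtube_id_from_url(string $url): ?string
{
    // Same pattern as in the answer; only the comments are shortened.
    $rx = '~
      ^(?:https?://)?                           # Optional protocol
       (?:www[.])?                              # Optional sub-domain
       (?:youtube[.]com/watch[?]v=|youtu[.]be/) # Domain (with query string for .com)
       ([^&]{11})                               # Video ID as capture group 1
        ~x';

    return preg_match($rx, $url, $matches) === 1 ? $matches[1] : null;
}

$samples = [
    'http://www.youtube.com/watch?v=abcdefghijk',
    'youtu.be/abcdefghijk',
    'www.youtube.co/watch?v=abcdefghijk',   // typo in the domain (.co)
    'www.youtube.com/watch?v=vpfzjcCzdtCk', // the 12-character string from the question
];

foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(youtube_id_from_url($sample), true), PHP_EOL;
}
```

The first two samples yield 'abcdefghijk' and the .co typo yields NULL. The string from the question is 12 characters long, and because the pattern has no end anchor, the helper returns only its first 11 characters ('vpfzjcCzdtC'). If over-long IDs should be rejected, one option is to follow the capture group with (?:&|$).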

Optionally, to extract values from a (more or less complete) URL, you might want to make use of parse_url():

$url = 'http://youtube.com/watch?v=VIDEOID';
$parts = parse_url($url);
print_r($parts);

Output:

Array
(
    [scheme] => http
    [host] => youtube.com
    [path] => /watch
    [query] => v=VIDEOID
)

Validating the domain name and extracting the video ID is left as an exercise to the reader.
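
One possible way to approach that exercise is sketched below. The helper name youtube_id_via_parse_url, the decision to prepend a scheme when none is present, and the assumption that an ID is exactly 11 URL-safe characters are all mine, not the answer's.

```php
<?php
// Hypothetical helper: validates the host and returns the video ID, or null.
function youtube_id_via_parse_url(string $url): ?string
{
    // Without a scheme, parse_url() reports everything as "path", so add one if missing.
    if (preg_match('~^https?://~i', $url) !== 1) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    // Normalise the host: lower-case it and drop an optional leading "www.".
    $host = strtolower($parts['host']);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }

    $id = null;
    if ($host === 'youtube.com' && ($parts['path'] ?? '') === '/watch') {
        parse_str($parts['query'] ?? '', $query); // fills $query from the query string
        $id = $query['v'] ?? null;
    } elseif ($host === 'youtu.be') {
        $id = ltrim($parts['path'] ?? '', '/');
    }

    // Assumption: an ID is exactly 11 characters from the URL-safe Base64 alphabet.
    $ok = is_string($id) && preg_match('~^[-_A-Za-z0-9]{11}$~D', $id) === 1;

    return $ok ? $id : null;
}

var_dump(youtube_id_via_parse_url('www.youtube.com/watch?v=abcdefghijk&t=10'));
var_dump(youtube_id_via_parse_url('https://youtu.be/abcdefghijk'));
var_dump(youtube_id_via_parse_url('www.youtube.co/watch?v=abcdefghijk'));
```

Compared with a single regular expression, this keeps each rule (host, path, query parameter, ID shape) in its own statement, so any one of them can be changed without touching the rest.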

I gave in to the comment war below; thanks to Toni Oriol, the regular expression now works on short (youtu.be) URLs as well.

Answer 2

Answered by Jason McCreary

An alternative to Regular Expressions would be parse_url().

 $parts = parse_url($url);
 if ($parts['host'] == 'youtube.com' && ...) {
   // your code
 }

While it is more code, it is more readable and therefore more maintainable.

Answer 3

Answered by eisberg

Please try:

// Set the youtube URL
$youtube_url = "www.youtube.com/watch?v=vpfzjcCzdtCk";

if (preg_match("/^((http\:\/\/){0,}(www\.){0,}(youtube\.com){1}|(youtu\.be){1}(\/watch\?v\=[^\s]){1})$/", $youtube_url) == 1)
{
    echo "Valid";
}
else
{
    echo "Invalid";
}

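You had ||, which in a regular expression creates an empty alternative. Without the ^ and $ anchors, that empty alternative matches in any case, so every input was reported as valid.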
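
The effect of || can be seen with toy patterns. This is a minimal sketch using made-up patterns and subjects rather than the expression from the question:

```php
<?php
// "a||b" has three alternatives: "a", the empty string, and "b".
// The empty alternative succeeds at position 0 of any subject.
$unanchored = preg_match('/a||b/', 'xyz');

// With anchors around the whole group, the empty alternative
// can only succeed when the subject itself is empty.
$anchored      = preg_match('/^(?:a||b)$/', 'xyz');
$anchoredEmpty = preg_match('/^(?:a||b)$/', '');

var_dump($unanchored);    // int(1)
var_dump($anchored);      // int(0)
var_dump($anchoredEmpty); // int(1)
```

In a regular expression a single | is the alternation operator; || is not a logical OR as it is in PHP code.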

Answer 4

Answered by Steven Moseley

This should do it:

$valid = preg_match("/^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/watch\?v\=\w+$/", $youtube_url);
if ($valid) {
    echo "Valid";
} else {
    echo "Invalid";
}

Answer 5

Answered by Glenn Slayden

I defer to the other answers on this page for parsing the URL syntax, but for the YouTube ID values themselves, you can be a little bit more specific, as I describe in the following answer on StackExchange/WebApps:

Format for ID of YouTube video? https://webapps.stackexchange.com/a/101153/141734
Video Id
For the videoId, it is an 8-byte (64-bit) integer. Applying Base64-encoding to 8 bytes of data requires 11 characters. However, since each Base64 character conveys exactly 6 bits, this allocation could actually hold up to 11 × 6 = 66 bits, a surplus of 2 bits over what our payload needs. The excess bits are set to zero, which has the effect of excluding certain characters from ever appearing in the last position of the encoded string. In particular, the videoId will always end with one of the following:
{ A, E, I, M, Q, U, Y, c, g, k, o, s, w, 0, 4, 8 }
Thus, a regular expression (RegEx) for the videoId would be as follows:
[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]
Channel or Playlist Id
The channelId and playlistId strings are produced by Base64-encoding a 128-bit (16-byte) binary integer. Again here, calculation per Base64 correctly predicts the observed string length of 22 characters. In this case, the output is capable of encoding 22 × 6 = 132 bits, a surplus of 4 bits; those zeros end up restricting most of the 64 alphabet symbols from appearing in the last position, and only 4 remain eligible. All channelId strings end in one of the following:
{ A, Q, g, w }
This gives us the regular expression for a channelId:
[-_A-Za-z0-9]{21}[AQgw]
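
The arithmetic above can be checked in PHP. The sketch below assumes the IDs use the URL-safe Base64 alphabet without padding, which is what the character class [-_A-Za-z0-9] implies; the helper base64url and the constant names are hypothetical, and the random bytes only imitate the shape of an ID rather than producing real ones.

```php
<?php
// Patterns quoted above, anchored here so the whole string must match.
const VIDEO_ID_RX   = '~^[-_A-Za-z0-9]{10}[AEIMQUYcgkosw048]$~D';
const CHANNEL_ID_RX = '~^[-_A-Za-z0-9]{21}[AQgw]$~D';

// Hypothetical helper: Base64 with "-" and "_" instead of "+" and "/", padding removed.
function base64url(string $bytes): string
{
    return rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
}

$videoLike   = base64url(random_bytes(8));  // 8 bytes  -> 11 characters
$channelLike = base64url(random_bytes(16)); // 16 bytes -> 22 characters

var_dump(strlen($videoLike), preg_match(VIDEO_ID_RX, $videoLike));
var_dump(strlen($channelLike), preg_match(CHANNEL_ID_RX, $channelLike));

// The sample string from the question is 12 characters long, so it fails the check.
var_dump(preg_match(VIDEO_ID_RX, 'vpfzjcCzdtCk'));
```

Any 8 random bytes encode to a string that satisfies the videoId pattern, and any 16 random bytes satisfy the channelId pattern, which is consistent with the restricted final character described above.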
