How do I find all YouTube video ids in a string using a regex? (PHP)
Source: http://stackoverflow.com/questions/5830387/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by n00b
I have a text field where users can write anything.
For example:
Lorem Ipsum is simply dummy text. http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Now I would like to parse it and find all YouTube video URLs and their ids.
Any idea how to do that?
Answered by ridgerunner
A YouTube video URL may be encountered in a variety of formats:
- latest short format: http://youtu.be/NLqAF9hrVbY
- iframe: http://www.youtube.com/embed/NLqAF9hrVbY
- iframe (secure): https://www.youtube.com/embed/NLqAF9hrVbY
- object param: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
- object embed: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
- watch: http://www.youtube.com/watch?v=NLqAF9hrVbY
- users: http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
- ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
- any/thing/goes!: http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
- any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
- more params: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
- query may have dot: http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
- nocookie domain: http://www.youtube-nocookie.com
Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):
    // Linkify youtube URLs which are not already links.
    function linkifyYouTubeURLs($text) {
        $text = preg_replace('~(?#!js YouTubeId Rev:20160125_1800)
            # Match non-linked youtube URL in the wild. (Rev:20130823)
            https?://          # Required scheme. Either http or https.
            (?:[0-9A-Z-]+\.)?  # Optional subdomain.
            (?:                # Group host alternatives.
              youtu\.be/       # Either youtu.be,
            | youtube          # or youtube.com or
              (?:-nocookie)?   # youtube-nocookie.com
              \.com            # followed by
              \S*?             # Allow anything up to VIDEO_ID,
              [^\w\s-]         # but char before ID is non-ID char.
            )                  # End host alternatives.
            ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
            (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
            (?!                # Assert URL is not pre-linked.
              [?=&+%\w.-]*     # Allow URL (query) remainder.
              (?:              # Group pre-linked alternatives.
                [\'"][^<>]*>   # Either inside a start tag,
              | </a>           # or inside <a> element text contents.
              )                # End recognized pre-linked alts.
            )                  # End negative lookahead assertion.
            [?=&+%\w.-]*       # Consume any URL (query) remainder.
            ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
            $text);
        return $text;
    }
And here is a JavaScript version with the exact same regex (with comments removed):
    // Linkify youtube URLs which are not already links.
    function linkifyYouTubeURLs(text) {
        var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
        return text.replace(re,
            '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
    }
Notes:
- The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
- If you know that your text does not contain any pre-linked URLs, you can safely remove the negative lookahead assertion which tests for this condition (the assertion beginning with the comment: "Assert URL is not pre-linked."). This will speed up the regex somewhat.
- The replace string can be modified to suit. The one provided above simply creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".
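To collect the IDs instead of rewriting the text, the same pattern can be handed to preg_match_all() and the IDs read from capture group 1. The following is only a minimal sketch, not part of the original answer: the function name extractYouTubeIds() is made up for illustration, and the negative lookahead is left out on the assumption that the input contains no pre-linked URLs.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// YouTube video ID found in $text, in order of appearance.
// Assumption: $text holds no pre-linked URLs, so the negative lookahead is omitted.
function extractYouTubeIds($text)
{
    $re = '~https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)[?=&+%\w.-]*~i';
    preg_match_all($re, $text, $matches);
    return $matches[1]; // capture group 1 holds the VIDEO_ID of each match
}

$text = 'Intro http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing '
      . 'http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was '
      . 'and http://youtu.be/NLqAF9hrVbY';
print_r(extractYouTubeIds($text));
```

With the sample text this should list DUQi_R4SgWo, A_6gNZCkajU and NLqAF9hrVbY. If the text may contain existing <a> tags, keep the lookahead from the full regex.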
Edit 2011-07-05: Added - hyphen to ID char class.
Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of URL following YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.
Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.
Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.
Edit 2011-08-25: Several modifications:
- Added a JavaScript version of the linkifyYouTubeURLs() function.
- Previous version had the scheme (HTTP protocol) part optional and thus would match invalid URLs. Made the scheme part required.
- Previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
- Changed the VIDEO_ID expression so that it must be exactly 11 characters long.
- The previous version failed to exclude pre-linked URLs if they had a query string following the VIDEO_ID. Improved the negative lookahead assertion to fix this.
- Added + and % to character class matching query string.
- Changed PHP version regex delimiter from % to ~.
- Added a "Notes" section with some handy notes.
Edit 2011-10-12: YouTube URL host part may now have any subdomain (not just www.).
Edit 2012-05-01: The consume URL section may now allow for '-'.
Edit 2013-08-23: Added additional format provided by @Mei. (The query part may have a . dot.)
Edit 2013-11-30: Added additional format provided by @CRONUS: youtube-nocookie.com.
Edit 2016-01-25: Fixed regex to handle error case provided by CRONUS.
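The 2011-08-25 remark about \b can be checked with a small experiment. This is an illustration only: both patterns are simplified stand-ins written for this sketch (not the regex from the answer), and the ID -abcdefghi- is invented.

```php
<?php
// Invented 11-character ID that starts and ends with a dash.
$url = 'http://youtu.be/-abcdefghi- more text';

// Simplified stand-in that wraps the ID in \b word boundaries.
$withWordBoundary = '~youtu\.be/\b([\w-]{11})\b~';
// Simplified stand-in that uses the "next char is non-ID or end" lookahead instead.
$withLookahead = '~youtu\.be/([\w-]{11})(?=[^\w-]|$)~';

// "/" and "-" are both non-word characters, so there is no word boundary between them.
var_dump(preg_match($withWordBoundary, $url));      // int(0)
var_dump(preg_match($withLookahead, $url, $m));     // int(1)
echo $m[1], "\n";                                   // -abcdefghi-

// An ID made only of word characters still matches with \b.
var_dump(preg_match($withWordBoundary, 'http://youtu.be/NLqAF9hrVbY')); // int(1)
```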
Answered by Christof
Here's a method I once wrote for a project that extracts YouTube and Vimeo video keys:
    /**
     * Strip important information out of any video link.
     *
     * @param  string $vid_link Link to a video on the hoster's page
     * @return mixed  FALSE on failure, array on success
     */
    function getHostInfo($vid_link)
    {
        // YouTube: get video id (strict comparison, because strpos() may return 0)
        if (strpos($vid_link, 'youtu') !== false)
        {
            // Regular links. \w already covers digits and "_"; the hyphen goes last
            // so that newer PCRE versions do not reject the class as an invalid range.
            if (preg_match('/(?<=v=)([\w-]+)/', $vid_link, $matches))
                return array('host_name' => 'youtube', 'original_key' => $matches[0]);
            // Ajax hash tag links (ASCII "~" delimiter instead of the multi-byte section sign)
            else if (preg_match('~([\w-]+)$~i', $vid_link, $matches))
                return array('host_name' => 'youtube', 'original_key' => $matches[0]);
            else
                return FALSE;
        }
        // Vimeo: get video id
        elseif (strpos($vid_link, 'vimeo') !== false)
        {
            if (preg_match('~(?<=/)(\d+)~', $vid_link, $matches))
                return array('host_name' => 'vimeo', 'original_key' => $matches[0]);
            else
                return FALSE;
        }
        else
            return FALSE;
    }
- Find a regex that will extract all links from a text. Google will help you there.
- Loop over all the links and call getHostInfo() for each.
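The two steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not Christof's code: findVideoKeys() is a hypothetical name, the link regex is a deliberately naive "http(s):// plus non-whitespace" pattern, and getHostInfo() is repeated in condensed form only so that the block runs on its own.

```php
<?php
// Condensed copy of getHostInfo() (YouTube and Vimeo branches),
// repeated here only so that the sketch is self-contained.
function getHostInfo($vid_link)
{
    if (strpos($vid_link, 'youtu') !== false) {
        if (preg_match('/(?<=v=)[\w-]+/', $vid_link, $m)
            || preg_match('~[\w-]+$~', $vid_link, $m)) {
            return array('host_name' => 'youtube', 'original_key' => $m[0]);
        }
        return false;
    }
    if (strpos($vid_link, 'vimeo') !== false) {
        if (preg_match('~(?<=/)\d+~', $vid_link, $m)) {
            return array('host_name' => 'vimeo', 'original_key' => $m[0]);
        }
    }
    return false;
}

// Hypothetical helper: collect host info for every video link found in $text.
function findVideoKeys($text)
{
    // Assumption: a link is "http://" or "https://" followed by non-whitespace characters.
    preg_match_all('~https?://\S+~i', $text, $links);
    $found = array();
    foreach ($links[0] as $link) {
        $info = getHostInfo($link);
        if ($info !== false) {
            $found[] = $info; // keep only links that were recognized
        }
    }
    return $found;
}

$text = 'See http://www.youtube.com/watch?v=NLqAF9hrVbY and '
      . 'http://vimeo.com/12345678 or http://example.com/page';
print_r(findVideoKeys($text));
```

The naive link pattern will also swallow trailing punctuation such as a closing parenthesis, so a stricter URL regex may be needed for real user input.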
Answered by ezwrighter
While ridgerunner's answer is the basis for my answer, his does NOT solve for all URLs, and I don't believe it is capable of it, due to multiple possible matches of VIDEO_ID in a YouTube URL. My regex includes his aggressive approach as a last resort, but attempts all common matchings first, vastly reducing the possibility of a wrong match later in the URL.
This regex:
/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[?=&+%\w-]*/ig;
handles all of the cases originally referenced in ridgerunner's examples, plus any URL that might happen to have an 11-character sequence later in the URL, e.g.:
http://www.youtube.com/watch?v=GUEZCxBcM78&feature=pyv&feature=pyv&ad=10059374899&kw=%2Bwingsuit
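As a rough check, the regex can be run from PHP over a few of the sample URLs. This is a sketch rather than the author's own test page: the function name extractIds() is hypothetical, and the JavaScript literal has been rewritten with ~ delimiters so that the slashes need no escaping.

```php
<?php
// Hypothetical helper: applies ezwrighter's regex (rewritten with "~" delimiters)
// and returns the captured 11-character IDs in order of appearance.
function extractIds($text)
{
    $re = '~https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube\.com'
        . '(?:/embed/|/v/|/watch\?v=|/ytscreeningroom\?v=|/feeds/api/videos/'
        . '|/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[?=&+%\w-]*~i';
    preg_match_all($re, $text, $matches);
    return $matches[1];
}

// One URL per line, taken from the examples in this thread.
$urls = implode("\n", array(
    'http://youtu.be/NLqAF9hrVbY',
    'http://www.youtube.com/embed/NLqAF9hrVbY',
    'http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo',
    'http://www.youtube.com/watch?v=GUEZCxBcM78&feature=pyv&feature=pyv&ad=10059374899&kw=%2Bwingsuit',
));
print_r(extractIds($urls));
```

For the last URL the explicit /watch?v= alternative is tried before the catch-all one, so GUEZCxBcM78 is captured rather than the 11-digit ad value.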
Answered by Noor Khan
Use:
    <?php
    // The YouTube URL string
    $youtube_url = 'http://www.youtube.com/watch?v=8VtUYvwktFQ';
    // Use regex to get the video ID; the matched ID ends up in $id[0]
    $regex = '#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#';
    preg_match($regex, $youtube_url, $id);
    // Plug that into our HTML
    ?>
Answered by n00b
Okay, I made a function of my own. But I believe it's pretty inefficient. Any improvements are welcome:
    function get_youtube_videos($string) {
        $ids = array();
        // Find all URLs
        preg_match_all('/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $string, $links);
        foreach ($links[0] as $link) {
            if (preg_match('~youtube\.com~', $link)) {
                if (preg_match('/[^=]+=([^?]+)/', $link, $id)) {
                    $ids[] = $id[1];
                }
            }
        }
        return $ids;
    }
Answered by stema
Try
[^\s]*youtube\.com[^\s]*?v=([-\w]+)[^\s]*
You will find the video IDs in the first capturing group. What I don't know is what makes a valid video ID. At the moment I check for v= and capture all -A-Za-z0-9_ characters.
I checked it online on Rubular with your sample string.
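In PHP the pattern could be applied along these lines. This is a minimal sketch with assumed names: findWatchIds() is hypothetical, and the sample text is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper: runs stema's pattern over $text and returns
// the contents of the first capturing group for every match.
function findWatchIds($text)
{
    preg_match_all('~[^\s]*youtube\.com[^\s]*?v=([-\w]+)[^\s]*~', $text, $matches);
    return $matches[1];
}

$sample = 'Lorem Ipsum http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing '
        . 'http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised';
print_r(findWatchIds($sample));
```

Because the pattern keys on v=, URLs without that parameter (for example youtu.be short links) are not picked up.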
Answered by B L Praveen
I tried a simple expression to get only the video ID:
[?&]v=([^&#]*)
You can check it working online at phpliveregex.
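A small sketch of how this expression behaves on single URLs; the helper name getVideoIdFromQuery() is an assumption made for this example, not something from the answer.

```php
<?php
// Hypothetical helper: returns the value of the "v" query parameter,
// or null when the URL has no such parameter.
function getVideoIdFromQuery($url)
{
    if (preg_match('~[?&]v=([^&#]*)~', $url, $m)) {
        return $m[1]; // everything after "v=" up to the next "&" or "#"
    }
    return null;
}

// "v" as the first parameter, followed by a fragment.
var_dump(getVideoIdFromQuery('http://www.youtube.com/watch?v=0zM4nApSvMg#t=0m10s'));
// "v" as a later parameter.
var_dump(getVideoIdFromQuery('http://www.youtube.com/watch?feature=relmfu&v=A_6gNZCkajU'));
// Short link without a query string: nothing to capture.
var_dump(getVideoIdFromQuery('http://youtu.be/NLqAF9hrVbY'));
```

The expression does not check the host or the ID length, so it is best applied to strings already known to be YouTube watch URLs.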
Answered by Lee Woodman
The original poster asked "I would like to parse it and find all YouTube video URLs and their ids." I switched the most popular answer above to a preg_match and returned the video id and URL.
Get YouTube URL and ID from post:
$match[0] = Full URL
$match[1] = video ID
    function get_youtube_id($input) {
        // \S*? is lazy, as in the answer this is based on, so the first 11-char candidate wins
        preg_match('~https?://(?:[0-9A-Z-]+\.)?(?:youtu\.be/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:[\'"][^<>]*>|</a>))[?=&+%\w.-]*~ix',
            $input, $match);
        return $match;
    }
Answered by Sravya Singh
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String urlid = "";
    String url = "http://www.youtube.com/watch?v=0zM4nApSvMg#t=0m10s";
    // Backslashes are doubled so that the pattern is a valid Java string literal.
    Pattern pattern = Pattern.compile("(?:http|https|)(?:://|)(?:www\\.|)(?:youtu\\.be/|youtube\\.com(?:/embed/|/v/|/watch\\?v=|/ytscreeningroom\\?v=|/feeds/api/videos/|/user\\S*[^\\w\\-\\s]|\\S*[^\\w\\-\\s]))([\\w\\-]{11})[a-z0-9;:@#?&%=+/$_.-]*");
    Matcher result = pattern.matcher(url);
    if (result.find())
    {
        urlid = result.group(1); // "0zM4nApSvMg" for the URL above
    }
This Java code worked for all the YouTube URL formats I tried at the time of writing.
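Answered by Mukesh Kumar Bijarniya
Find a YouTube link easily from a string:
    function my_url_search($se_action_data)
    {
        // Match every http/https URL up to the next double quote or space
        $regex = '/https?\:\/\/[^\" ]+/i';
        preg_match_all($regex, $se_action_data, $matches);
        // Reverse the order, then drop duplicates
        $get_url = array_reverse($matches[0]);
        return array_unique($get_url);
    }
    // The function returns an array, so print it with print_r() rather than echo
    print_r(my_url_search($se_action_data));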