Original URL: http://stackoverflow.com/questions/399332/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Fastest way to retrieve a <title> in PHP
Asked by Ed Carrel
I'm doing a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.
It would be nice to have something like $title = page_title($url)
Answered by Ed Carrel
<?php
function page_title($url) {
    $fp = file_get_contents($url);
    if (!$fp)
        return null;

    $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
    if (!$res)
        return null;

    // Clean up title: remove EOL's and excessive whitespace.
    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
    $title = trim($title);
    return $title;
}
?>
Gave 'er a whirl on the following input:
print page_title("http://www.google.com/");
Output: Google
Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.
EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.
Answered by Lukas Liesis
You can get it without regular expressions:
// $urlpage holds the URL (or file path) of the page to load.
$title = '';
$dom = new DOMDocument();
if ($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}
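The same DOM approach also works on markup that has already been downloaded. The following is only a minimal sketch: the helper name `title_from_html` is hypothetical, and it assumes the HTML is passed in as a string. It keeps libxml's warnings about malformed markup out of the output, since real-world pages tend to trigger them.

```php
<?php
// Hypothetical helper (not from the original answer): extract the title
// from an HTML string with DOMDocument instead of a regex.
function title_from_html(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $list = $dom->getElementsByTagName('title');
    if ($list->length === 0) {
        return null;
    }
    // Collapse runs of whitespace, as the first answer does.
    return trim(preg_replace('/\s+/', ' ', $list->item(0)->textContent));
}
```

Because the HTML parser normalizes tag names, this also finds titles written as `<TITLE>` without any extra flags.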
Answered by Alexei Tenitski
Or make this simple function slightly more bulletproof:
function page_title($url) {
    $page = file_get_contents($url);
    if (!$page) return null;

    $matches = array();
    // i: match <TITLE> as well; s: let the title span several lines.
    if (preg_match('/<title>(.*?)<\/title>/is', $page, $matches)) {
        return $matches[1];
    } else {
        return null;
    }
}

echo page_title('http://google.com');
Answered by alex
Regex?
Use cURL to get the $htmlSource variable's contents.
preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);
print_r($titleMatches);
See what you have in that array.
Most people say, though, that for traversing HTML you should use a parser, as regexes can be unreliable.
The other answers provide more detail :)
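The answer does not show the cURL step itself. One way to fill `$htmlSource` might look like the sketch below; the function name `fetch_html_source`, the redirect limit, and the timeout values are all assumptions chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): download a page with
// cURL and return its body, or null on any failure.
function fetch_html_source(string $url): ?string
{
    if (!function_exists('curl_init')) {
        return null; // the cURL extension is not installed
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed limit
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds, assumed value
        CURLOPT_TIMEOUT        => 10,   // seconds, assumed value
    ]);
    $body = curl_exec($ch);
    // curl_exec() returns false on failure when RETURNTRANSFER is set.
    return is_string($body) ? $body : null;
}
```

A call such as `$htmlSource = fetch_html_source($url);` would then supply the variable used in the `preg_match` line, provided the result is checked for null first.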
Answered by wilks
I'm also doing a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only up to the closing title tag (instead of loading the whole file), then get rid of what's before the opening title tag with explode (instead of a regex).
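function page_title($url) {
    $title = false;
    if ($handle = fopen($url, "r")) {
        // Per the PHP manual, a length of 0 means the default chunk size
        // (8192 bytes), so a title further into the page is not found.
        $string = stream_get_line($handle, 0, "</title>");
        fclose($handle);
        // Guard against pages that contain no "<title" at all.
        $parts = explode("<title", (string) $string);
        if (isset($parts[1])) {
            $title = trim((explode(">", $parts[1], 2))[1] ?? '');
        }
    }
    return $title;
}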
The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.
Answered by PlugTrade.com
A function to handle title tags that have attributes added to them:
function get_title($html)
{
    preg_match("/<title(.+)<\/title>/siU", $html, $matches);
    if( !empty( $matches[1] ) )
    {
        $title = $matches[1];
        if( strstr($title, '>') )
        {
            $title = explode( '>', $title, 2 );
            $title = $title[1];
            return trim($title);
        }
    }
}
$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title;
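The attribute handling can also be folded into the pattern itself, so no second split on '>' is needed. This is a sketch of that alternative rather than the answer's own method; the name `get_title_regex` is hypothetical, and like any regex approach it would still be fooled by an attribute value that contains '>'.

```php
<?php
// Hypothetical variant (not from the original answer): match the optional
// attributes inside the pattern instead of splitting on '>' afterwards.
function get_title_regex(string $html): ?string
{
    // \b stops "<title" from matching longer tag names; [^>]* skips attributes;
    // i = case-insensitive, s = let "." span newlines, .*? = non-greedy.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

echo get_title_regex('<tiTle class="aunt">jemima</tiTLE>'), "\n"; // jemima
```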
Answered by null
I like using SimpleXml with regexes; this is from a solution I use to grab multiple link headers from a page in an OpenID library I've created. I've adapted it to work with the title (even though there is usually only one).
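function getTitle($sFile)
{
    $sData = file_get_contents($sFile);
    if(preg_match('/<head.[^>]*>.*<\/head>/is', $sData, $aHead))
    {
        // Lowercase the contents of every tag via a callback; strtolower()
        // applied to the replacement string itself would never see the match.
        $sDataHtml = preg_replace_callback('/<(.[^>]*)>/i', function ($aTag) {
            return '<' . strtolower($aTag[1]) . '>';
        }, $aHead[0]);
        // Instantiate DOMDocument; newer PHP versions reject the static call.
        $oDom = new DOMDocument();
        @$oDom->loadHTML($sDataHtml); // @ hides warnings about imperfect markup
        $xTitle = simplexml_import_dom($oDom);
        return (string)$xTitle->head->title;
    }
    return null;
}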
echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');
Ironically, this page has a "title tag" in the title tag, which is what sometimes causes problems with the pure regex solutions.
This solution is not perfect: it lowercases the tags, which could cause a problem for the nested tag if formatting or case matters (as in XML), though there are somewhat more involved ways around that problem.
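One way a "title tag" inside the title can trip up a regex-only solution, assuming the page escapes the inner tag as HTML entities: the regex hands back the raw, still-escaped text, whereas a DOM parser's textContent is already decoded. The sample markup below is made up for illustration.

```php
<?php
// Sample markup made up for illustration: the title text itself mentions a
// tag, so it is written with entities in the page source.
$html = '<html><head><title>Retrieve a &lt;title&gt; in PHP</title></head></html>';

// A regex returns the raw, still-escaped text...
preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $matches);
$raw = $matches[1];

// ...so decode the entities to get what a browser would display.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $raw, "\n";     // Retrieve a &lt;title&gt; in PHP
echo $decoded, "\n"; // Retrieve a <title> in PHP
```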

