Fastest way to retrieve a <title> in PHP

Question

Asked by Ed Carrel

I'm doing a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.

It would be nice to have something like $title = page_title($url)

Answer 1

Answered by Ed Carrel

<?php
    function page_title($url) {
        $fp = file_get_contents($url);
        if (!$fp) 
            return null;

        $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
        if (!$res) 
            return null; 

        // Clean up title: remove EOL's and excessive whitespace.
        $title = preg_replace('/\s+/', ' ', $title_matches[1]);
        $title = trim($title);
        return $title;
    }
?>

Gave 'er a whirl on the following input:

print page_title("http://www.google.com/");

Outputted: Google

Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.

EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.

Answer 2

Answered by Lukas Liesis

You can get it without regular expressions:

$title = '';
$dom = new DOMDocument();

if($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}

Answer 3

Answered by Alexei Tenitski

Or, making this simple function slightly more bulletproof:

function page_title($url) {

    $page = file_get_contents($url);

    if (!$page) return null;

    $matches = array();

    // "i" matches <TITLE> as well; "s" lets "." span titles that wrap across lines.
    if (preg_match('/<title>(.*?)<\/title>/is', $page, $matches)) {
        return $matches[1];
    } else {
        return null;
    }
}


echo page_title('http://google.com');

Answer 4

Answered by alex

Regex?

Use cURL to fetch the page into the $htmlSource variable.
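
In case it helps, here is a minimal sketch of that fetch step. The helper name fetch_html and the particular options (follow redirects, 10 second timeout) are my own assumptions for illustration, not something from this answer; it only needs PHP's cURL extension.

```php
<?php
// Hypothetical helper (name and options are assumptions): fetch a URL's
// body with cURL and return it as a string, or null on failure.
function fetch_html($url, $timeoutSeconds = 10) {
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects such as http -> https
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ));
    $body = curl_exec($ch);
    // curl_exec() returns false on failure when RETURNTRANSFER is set.
    return $body === false ? null : $body;
}
```

With that in place, something like $htmlSource = fetch_html('http://www.google.com/'); would populate the variable used below, after checking for null.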

preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);

print_r($titleMatches);

See what you have in that array.

Most people say, though, that for traversing HTML you should use a parser, as regexes can be unreliable.
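
As a rough illustration of the parser route, here is a sketch that works on an HTML string you have already fetched. The function name title_from_html is hypothetical, and the whitespace cleanup is just one reasonable choice; it relies on PHP's bundled DOM extension.

```php
<?php
// Hypothetical helper (not from the answers): extract the <title> text
// from an HTML string with DOMDocument instead of a regex.
function title_from_html($html) {
    if (!is_string($html) || $html === '') {
        return null; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $list = $dom->getElementsByTagName('title');
    if ($list->length === 0) {
        return null;
    }
    // Collapse runs of whitespace and trim the result.
    return trim(preg_replace('/\s+/', ' ', $list->item(0)->textContent));
}

// Upper-case tag, an attribute, line breaks and an entity in one title.
$html = "<html><head><TITLE class=\"x\">\n  Tom &amp; Jerry\n</TITLE></head><body></body></html>";
echo title_from_html($html), "\n";
```

The parser takes care of tag case, attributes on the title tag and entity decoding, which are the cases where the simple regexes tend to trip.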

The other answers provide more detail :)

Answer 5

Answered by wilks

I'm also doing a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only up to the closing title tag (instead of loading the whole file), then get rid of what's before the opening title tag with explode (instead of a regex).

function page_title($url) {
  $title = false;
  if ($handle = fopen($url, "r"))  {
    $string = stream_get_line($handle, 0, "</title>");
    fclose($handle);
    $string = (explode("<title", $string))[1];
    if (!empty($string)) {
      $title = trim((explode(">", $string))[1]);
    }
  }
  return $title;
}

The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.

Answer 6

Answered by PlugTrade.com

A function to handle title tags that have attributes added to them:

function get_title($html)
{
    preg_match("/<title(.+)<\/title>/siU", $html, $matches);
    if( !empty( $matches[1] ) ) 
    {
        $title = $matches[1];

        if( strstr($title, '>') )
        {
            $title = explode( '>', $title, 2 );
            $title = $title[1];

            return trim($title);
        }   
    }
}

$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title;

Answer 7

Answered by null

I like using SimpleXml with regexes; this is from a solution I use to grab multiple link headers from a page in an OpenID library I've created. I've adapted it to work with the title (even though there is usually only one).

function getTitle($sFile)
{
    $sData = file_get_contents($sFile);

    if(preg_match('/<head.[^>]*>.*<\/head>/is', $sData, $aHead))
    {   
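        // Lowercase every tag so head->title can be addressed regardless of the original case.
        $sDataHtml = preg_replace_callback('/<(.[^>]*)>/', function ($m) { return strtolower($m[0]); }, $aHead[0]);
        $dom = new DOMDocument();
        @$dom->loadHTML($sDataHtml); // instance call; suppress warnings from malformed markup
        $xTitle = simplexml_import_dom($dom);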

        return (string)$xTitle->head->title;
    }
    return null;
}

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');

Ironically, this page has a "title tag" in the title tag, which is what sometimes causes problems with the pure regex solutions.

This solution is not perfect, as it lowercases the tags, which could cause a problem for the nested tag if formatting/case were important (such as in XML), but there are somewhat more involved ways around that problem.
