How to get the Open Graph Protocol data of a webpage with PHP?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/7454644/
Asked by Googlebot
PHP has a simple function for reading the meta tags of a webpage (get_meta_tags), but it only works for meta tags that have a name attribute. The Open Graph Protocol, which uses the property attribute instead, is becoming more and more popular these days. What is the easiest way to get the Open Graph (og:) values from a webpage? For example:
<meta property="og:url" content="">
<meta property="og:title" content="">
<meta property="og:description" content="">
<meta property="og:type" content="">
The basic approach I see is to fetch the page via cURL and parse it with a regex. Any ideas?
Answer by Guilherme Viebig
Really simple and well done:
Using https://github.com/scottmac/opengraph
$graph = OpenGraph::fetch('http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html');
print_r($graph);
Will return:
OpenGraph Object
(
[_values:OpenGraph:private] => Array
(
[type] => article
[video] => http://www.avessotv.com.br/player/flowplayer/flowplayer-3.2.7.swf?config=%7B%27clip%27%3A%7B%27url%27%3A%27http%3A%2F%2Fwww.avessotv.com.br%2Fmedia%2Fprogramas%2Fpantene.flv%27%7D%7D
[image] => /wp-content/thumbnails/9025.jpg
[site_name] => Programa Avesso - Bastidores
[title] => Bastidores ?¢???Pantene Institute Experience?¢?? P&G
[url] => http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html
[description] => Confira os bastidores do Pantene Institute Experience, da Procter & Gamble. www.pantene.com.br Mais imagens:
)
[_position:OpenGraph:private] => 0
)
Answer by Tom
When parsing data from HTML, you really shouldn't use regex. Take a look at the DOMXPath query function.
Now, the actual code could be:
[EDIT] A better XPath query was given by Stefan Gehrig, so the code can be shortened to:
libxml_use_internal_errors(true); // collect libxml parse warnings internally instead of silencing them with @
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
$property = $meta->getAttribute('property');
$content = $meta->getAttribute('content');
$rmetas[$property] = $content;
}
var_dump($rmetas);
Instead of:
$doc = new DomDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
$property = $meta->getAttribute('property');
$content = $meta->getAttribute('content');
if(!empty($property) && preg_match('#^og:#', $property)) {
$rmetas[$property] = $content;
}
}
var_dump($rmetas);
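Both snippets above assume that $html already holds the page source. As a minimal self-contained sketch of the same DOMXPath idea, the query can be wrapped in a function and run against an inline string. The function name extract_og_tags and the sample markup are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the DOMXPath approach from this answer.
function extract_og_tags(string $html): array
{
    // Real-world HTML is rarely valid, so keep libxml warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = array();
    // Select every <meta> whose property attribute starts with "og:".
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        $tags[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $tags;
}

// Sample markup invented for this sketch.
$sample = '<html><head>'
    . '<meta name="description" content="Plain description">'
    . '<meta property="og:title" content="Example title">'
    . '<meta name="twitter:card" property="og:type" content="website">'
    . '</head><body></body></html>';

print_r(extract_og_tags($sample));
```

Because the query looks only at the property attribute, the third tag is picked up even though it also carries a name attribute, while the plain description tag is skipped.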
Answer by zerkms
How about:
preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $str, $matches);
So, yes, grab the page any way you can and parse it with a regex.
Answer by Bhaskar Bhatt
With this method you get a key/value array of the Facebook Open Graph tags.
$url="http://fbcpictures.in";
$site_html= file_get_contents($url);
$matches=null;
preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $site_html,$matches);
$ogtags=array();
for($i=0;$i<count($matches[1]);$i++)
{
$ogtags[$matches[1][$i]]=$matches[2][$i];
}
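The pattern used here only matches when property comes first and content follows it directly, with double quotes and nothing in between. If you want to stay with regex, one possible variation is to grab each meta tag first and then read the two attributes separately, so their order no longer matters. This is only a sketch: the function name og_tags_regex and the sample string are invented, and it will still misfire on markup where an attribute value contains a ">" character.

```php
<?php
// Hypothetical order-independent variant of the regex approach.
function og_tags_regex(string $html): array
{
    $tags = array();
    // Step 1: collect every <meta ...> tag as a whole.
    if (!preg_match_all('~<meta\b[^>]*>~i', $html, $metaTags)) {
        return $tags;
    }
    foreach ($metaTags[0] as $tag) {
        // Step 2: pull property and content out independently of their order.
        if (preg_match('~\bproperty\s*=\s*(["\'])(og:[^"\']+)\1~i', $tag, $p)
            && preg_match('~\bcontent\s*=\s*(["\'])(.*?)\1~is', $tag, $c)) {
            $tags[$p[2]] = $c[2];
        }
    }
    return $tags;
}

// Sample markup invented for this sketch.
$sample = '<meta content="https://example.com/a.png" property="og:image">'
    . '<meta property="og:title" itemprop="name" content="Hello">'
    . '<meta name="description" content="not an og tag">';

print_r(og_tags_regex($sample));
```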
Answer by MSS
This function does the job without any dependencies or DOM parsing:
function getOgTags($html)
{
$pattern='/<\s*meta\s+property="og:([^"]+)"\s+content="([^"]*)/i';
if(preg_match_all($pattern, $html, $out))
return array_combine($out[1], $out[2]);
return array();
}
Test code:
// A nowdoc is used so that the single quotes inside the markup do not end the string
$x = <<<'HTML'
<title>php - Using domDocument, and parsing info, I would like to get the 'href' contents of an 'a' tag - Stack Overflow</title>
<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
<link rel="apple-touch-icon image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
<meta name="referrer" content="origin" />
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://stackoverflow.com/questions/5278418/using-domdocument-and-parsing-info-i-would-like-to-get-the-href-contents-of"/>
<meta property="og:image" itemprop="image primaryImageOfPage" content="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:domain" content="stackoverflow.com"/>
<meta name="twitter:title" property="og:title" itemprop="title name" content="Using domDocument, and parsing info, I would like to get the 'href' contents of an 'a' tag" />
<meta name="twitter:description" property="og:description" itemprop="description" content="Possible Duplicate:
Regular expression for grabbing the href attribute of an A element
This displays the what is between the a tag, but I would like a way to get the href contents as well.
Is..." />
HTML;
echo '<pre>';
var_dump(getOgTags($x));
And you get:
array(3) {
["type"]=>
string(7) "website"
["url"]=>
string(119) "https://stackoverflow.com/questions/5278418/using-domdocument-and-parsing-info-i-would-like-to-get-the-href-contents-of"
["image"]=>
string(85) "https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded"
}
Answer by Muhammad Tahir
Here is what I am using to extract OG tags.
function get_og_tags($get_url = "", $ret = 0)
{
if ($get_url != "") {
$title = "";
$description = "";
$keywords = "";
$og_title = "";
$og_image = "";
$og_url = "";
$og_description = "";
$full_link = "";
$image_urls = array();
$og_video_name = "";
$youtube_video_url="";
// file_get_contents_curl() and addhttp() are helper functions that are not shown in this answer
$ret_data = file_get_contents_curl($get_url);
//$html = file_get_contents($get_url);
$html = $ret_data['curlData'];
$full_link = $ret_data['full_link'];
$full_link = addhttp($full_link);
//parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
if ($nodes->length == 0) {
$title = $get_url;
} else {
$title = $nodes->item(0)->nodeValue;
}
//get and display what you need:
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++) {
$meta = $metas->item($i);
if ($meta->getAttribute('name') == 'description')
$description = $meta->getAttribute('content');
if ($meta->getAttribute('name') == 'keywords')
$keywords = $meta->getAttribute('content');
}
// Open Graph tags are <meta property="og:..."> elements, so the $metas list is scanned again
for ($i = 0; $i < $metas->length; $i++) {
$meta = $metas->item($i);
if ($meta->getAttribute('property') == 'og:title')
$og_title = $meta->getAttribute('content');
if ($meta->getAttribute('property') == 'og:url')
$og_url = $meta->getAttribute('content');
if ($meta->getAttribute('property') == 'og:image')
$og_image = $meta->getAttribute('content');
if ($meta->getAttribute('property') == 'og:description')
$og_description = $meta->getAttribute('content');
// for sociotube video share
if ($meta->getAttribute('property') == 'og:video_name')
$og_video_name = $meta->getAttribute('content');
// for sociotube youtube video share
if ($meta->getAttribute('property') == 'og:youtube_video_url')
$youtube_video_url = $meta->getAttribute('content');
}
//if no image found grab images from body
if ($og_image != "") {
$image_urls[] = $og_image;
} else {
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$imgCount = 0;
for ($i = 0; $i < $nodelist->length; $i++) {
$node = $nodelist->item($i); // current <img> node
if (isset($node->attributes->getNamedItem('src')->nodeValue)) {
$src = $node->attributes->getNamedItem('src')->nodeValue;
}
if (isset($node->attributes->getNamedItem('src')->value)) {
$src = $node->attributes->getNamedItem('src')->value;
}
if (isset($src)) {
if (!preg_match('/blank.(.*)/i', $src) && filter_var($src, FILTER_VALIDATE_URL)) {
$image_urls[] = $src;
if ($imgCount == 10) break;
$imgCount++;
}
}
}
}
$page_title = ($og_title == "") ? $title : $og_title;
if(!empty($og_video_name)){
// for sociotube video share
$page_body = $og_video_name;
}else{
// for post share
$page_body = ($og_description == "") ? $description : $og_description;
}
$output = array('title' => $page_title, 'images' => $image_urls, 'content' => $page_body, 'link' => $full_link,'video_name'=>$og_video_name,'youtube_video_url'=>$youtube_video_url);
if ($ret == 1) {
return $output; // return the array to the caller
}
echo json_encode($output); //output JSON data
die;
} else {
$data = array('error' => "Url not found");
if ($ret == 1) {
return $data; // return the error array to the caller
}
echo json_encode($data);
die;
}
}
Usage of the function:
$url = "https://www.alectronics.com";
$tagsArray = get_og_tags($url, 1); // pass 1 to get the array back; the default echoes JSON and exits
print_r($tagsArray);
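The function above calls file_get_contents_curl() and addhttp(), which the answer does not include. The following is only a guess at what such helpers might look like, written to return the 'curlData' and 'full_link' keys that get_og_tags() reads. The timeouts and the scheme check are assumptions, not the author's code.

```php
<?php
// Hypothetical stand-ins for the two helpers that get_og_tags() relies on.

// Prefix "http://" when the URL has no http(s)/ftp(s) scheme.
function addhttp($url)
{
    if (!preg_match('~^(?:f|ht)tps?://~i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

// Fetch a page with cURL and report the final URL after redirects.
function file_get_contents_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // assumed timeout values
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $body = curl_exec($ch);
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    return array(
        // Empty string on failure; callers should check this before parsing.
        'curlData'  => ($body === false) ? '' : $body,
        'full_link' => $finalUrl ? $finalUrl : $url,
    );
}
```

With helpers like these in place, it is worth checking that 'curlData' is not empty before handing it to DOMDocument::loadHTML(), since recent PHP versions reject an empty string there.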
Answer by J. Doe
With the native PHP function get_meta_tags().
Answer by Stefan Gehrig
The more XML-ish way would be to use XPath:
$xml = simplexml_load_file('http://ogp.me/');
$xml->registerXPathNamespace('h', 'http://www.w3.org/1999/xhtml');
$result = array();
foreach ($xml->xpath('//h:meta[starts-with(@property, \'og:\')]') as $meta) {
$result[(string)$meta['property']] = (string)$meta['content'];
}
print_r($result);
Unfortunately the namespace registration is needed if the HTML document uses a namespace declaration in the <html> tag.
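Note that simplexml_load_file() expects well-formed XML, which many real pages are not. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to parse the markup with DOMDocument's HTML parser first and then hand the tree to SimpleXML with simplexml_import_dom(). The function name og_tags_via_simplexml and the sample markup are hypothetical.

```php
<?php
// Hypothetical variant: tolerant HTML parsing first, SimpleXML XPath second.
function og_tags_via_simplexml(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings internal
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Convert the DOM tree into a SimpleXMLElement rooted at <html>.
    $xml = simplexml_import_dom($doc);

    $result = array();
    foreach ($xml->xpath("//meta[starts-with(@property, 'og:')]") as $meta) {
        $result[(string) $meta['property']] = (string) $meta['content'];
    }
    return $result;
}

// Sample markup invented for this sketch; the unclosed <meta> tags are not valid XML.
$sample = '<html><head>'
    . '<meta property="og:title" content="The Open Graph protocol">'
    . '<meta property="og:type" content="website">'
    . '<meta name="viewport" content="width=device-width">'
    . '</head><body><p>Hello<br>world</p></body></html>';

print_r(og_tags_via_simplexml($sample));
```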