php - Accessing main picture of Wikipedia page by API
Original URL: http://stackoverflow.com/questions/8363531/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by insomiac
Is there any way I can access the thumbnail picture of any Wikipedia page by using an API? I mean the image in the box at the top right. Are there any APIs for that?
Accepted answer by varatis
http://en.wikipedia.org/w/api.php
Look at prop=images.
It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
or to calculate the URL via the filename's hash.
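The hash-based option is not spelled out above. A minimal sketch, assuming MediaWiki's default hashed upload layout (the directory path is the first one and first two hex digits of the md5 of the underscore-normalized file name) and assuming the file is hosted on Wikimedia Commons; files uploaded locally to English Wikipedia, like the HansUlrichRudel.jpeg example further down this page, live under /wikipedia/en/ instead. The function name uploadUrlFromFilename is hypothetical, and this only covers the original file, not thumbnails.

```php
<?php
// Hypothetical helper: derive the upload.wikimedia.org URL of an original file
// from its name, assuming MediaWiki's hashed upload directory layout.
function uploadUrlFromFilename(string $filename, string $wiki = 'commons'): string
{
    // Strip an optional "File:" or "Image:" namespace prefix.
    $name = preg_replace('/^(File|Image):/i', '', $filename);
    // Stored names use underscores instead of spaces and a capitalized first letter.
    // Note: ucfirst() only capitalizes ASCII letters; use mb_* functions otherwise.
    $name = ucfirst(str_replace(' ', '_', trim($name)));
    $hash = md5($name);
    // Layout: /<first hex digit>/<first two hex digits>/<file name>
    return sprintf(
        'https://upload.wikimedia.org/wikipedia/%s/%s/%s/%s',
        $wiki,
        $hash[0],
        substr($hash, 0, 2),
        rawurlencode($name) // encodes "(" as %28, "," as %2C, etc.
    );
}
```

Pass 'en' as the second argument for files that are not on Commons. If the computed URL returns 404, fall back to the imageinfo query, which is authoritative.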
Unfortunately, while the array of images returned by prop=images is in the order the images appear on the page, the first one cannot be guaranteed to be the image in the infobox, because sometimes a page includes an image before the infobox (most often an icon for metadata about the page, e.g. "this article is locked").
Searching the array for the first image whose file name includes the page title is probably the best guess for the infobox image.
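The title-matching heuristic above can be sketched as follows, assuming a prop=images response in the default JSON format (query.pages keyed by page ID) and a simple case-insensitive substring match that treats spaces and underscores alike. The helper names guessInfoboxImage and imageInfoQuery are hypothetical; the second one builds the follow-up imageinfo request from this answer.

```php
<?php
// Hypothetical helper: return the first image whose file title contains the
// page title, falling back to the first image listed, or null if there are none.
function guessInfoboxImage(string $json, string $pageTitle): ?string
{
    $data = json_decode($json, true);
    $pages = $data['query']['pages'] ?? [];
    $page = reset($pages);                 // pages are keyed by page ID
    $images = $page['images'] ?? [];
    if (!$images) {
        return null;
    }
    $needle = strtolower(str_replace('_', ' ', $pageTitle));
    foreach ($images as $image) {
        $haystack = strtolower(str_replace('_', ' ', $image['title']));
        if (strpos($haystack, $needle) !== false) {
            return $image['title'];
        }
    }
    return $images[0]['title'];            // no title match: best remaining guess
}

// Hypothetical helper: build the imageinfo query that resolves a file title to its URL.
function imageInfoQuery(string $fileTitle): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action' => 'query',
        'titles' => $fileTitle,
        'prop'   => 'imageinfo',
        'iiprop' => 'url',
        'format' => 'json',
    ]);
}
```

The fallback keeps the original "first image" behaviour when no file name mentions the title, so it is no worse than taking element 0 directly.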
Answer by Assaf Shemesh
You can get the thumbnail of any Wikipedia page using prop=pageimages. For example:
http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100
The response contains the full URL of the thumbnail.
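A minimal sketch of building this query and reading the thumbnail URL out of the response, assuming the default JSON format shown in the example URL (query.pages is an object keyed by page ID, and pages without a page image have no thumbnail key). The function names pageImagesQuery and thumbnailFromResponse are hypothetical.

```php
<?php
// Hypothetical helper: the prop=pageimages query from the example above.
function pageImagesQuery(string $title, int $thumbSize = 100): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'      => 'query',
        'titles'      => $title,
        'prop'        => 'pageimages',
        'format'      => 'json',
        'pithumbsize' => $thumbSize,
    ]);
}

// Hypothetical helper: return the first thumbnail URL found, or null.
function thumbnailFromResponse(string $json): ?string
{
    $data = json_decode($json, true);
    foreach ($data['query']['pages'] ?? [] as $page) {
        // Pages without a page image (or missing pages) have no "thumbnail" key.
        if (isset($page['thumbnail']['source'])) {
            return $page['thumbnail']['source'];
        }
    }
    return null;
}
```

In practice you would fetch pageImagesQuery('Al-Farabi') with an HTTP client such as cURL and pass the response body to thumbnailFromResponse().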
Answer by Anuraj
This is a good way to get the main image of a Wikipedia page.
Answer by kimbaudi
Check out the MediaWiki API example for getting the main picture of a wikipedia page: https://www.mediawiki.org/wiki/API:Page_info_in_search_results.
As others have mentioned, you would use prop=pageimages in your API query.
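If you also want the page's short description, use prop=pageimages|pageterms instead in your API query (pageterms returns the page's Wikidata terms, such as its description, not a description of the image).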
You can get the original image using piprop=original. Or you can get a thumbnail of a specified size: for a thumbnail whose largest dimension is 600px, use piprop=thumbnail&pithumbsize=600. If you don't specify pithumbsize, the thumbnail returned in the API response defaults to a maximum dimension of 50px.
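The two query variants described above can be built like this. A minimal sketch: mainImageQuery is a hypothetical name, it uses prop=pageimages|pageterms together with format=json&formatversion=2 as this answer recommends, and it switches between piprop=original and piprop=thumbnail depending on whether a size is given.

```php
<?php
// Hypothetical helper: original-size query when $thumbSize is null,
// otherwise a thumbnail whose largest dimension is $thumbSize pixels.
function mainImageQuery(string $title, ?int $thumbSize = null): string
{
    $params = [
        'action'        => 'query',
        'format'        => 'json',
        'formatversion' => 2,
        'prop'          => 'pageimages|pageterms',
        'titles'        => $title,
    ];
    if ($thumbSize === null) {
        $params['piprop'] = 'original';
    } else {
        $params['piprop'] = 'thumbnail';
        $params['pithumbsize'] = $thumbSize;
    }
    // http_build_query percent-encodes the "|" separator and the space in the title.
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}
```

Letting http_build_query do the encoding avoids the literal space that makes hand-written URLs like the ones below fragile.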
If you are requesting results in JSON format, you should always use formatversion=2 in your API query (i.e., format=json&formatversion=2), because query.pages is then returned as a list instead of an object keyed by page ID, which makes retrieving the image from the response easier.
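With formatversion=2 the single requested page is simply element 0 of query.pages. A minimal sketch of reading the image URL (original if present, otherwise thumbnail) and the pageterms description from such a response; parseMainImage is a hypothetical name, and it assumes pageterms returns each term type as a list of strings.

```php
<?php
// Hypothetical helper: extract the image URL and short description
// from a format=json&formatversion=2 response.
function parseMainImage(string $json): array
{
    $data = json_decode($json, true);
    $page = $data['query']['pages'][0] ?? [];   // pages is a list in formatversion=2
    return [
        // piprop=original yields "original", piprop=thumbnail yields "thumbnail".
        'image'       => $page['original']['source'] ?? $page['thumbnail']['source'] ?? null,
        // pageterms values are lists of strings; take the first description.
        'description' => $page['terms']['description'][0] ?? null,
    ];
}
```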
Original Size Image:
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Albert_Einstein
Thumbnail Size (600px width/height) Image:
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=thumbnail&pithumbsize=600&titles=Albert_Einstein
Answer by Giberno
Way 1: You can try a query like this:
http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0
In the response, you can see the Image tag:
<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>
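A minimal sketch of pulling the title and image URL out of such a response with PHP's built-in DOM extension. The XML shape is taken from the sample above; getElementsByTagName matches on the tag name, so it also finds the items if the response declares a default XML namespace. The function name opensearchImages is hypothetical, and items without an Image tag are skipped.

```php
<?php
// Hypothetical helper: map each result's Text to its Image source URL.
function opensearchImages(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return [];
    }
    $results = [];
    foreach ($doc->getElementsByTagName('Item') as $item) {
        $text  = $item->getElementsByTagName('Text')->item(0);
        $image = $item->getElementsByTagName('Image')->item(0);
        if ($text !== null && $image !== null) {
            $results[trim($text->textContent)] = $image->getAttribute('source');
        }
    }
    return $results;
}
```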
Way 2: use the query http://en.wikipedia.org/w/index.php?action=render&title=italy
This returns the raw HTML of the article, from which you can extract the image with something like PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net).
I don't have time to write the code for you; this is just some advice. Thanks.
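A minimal sketch of Way 2 that swaps PHP Simple HTML DOM Parser for PHP's built-in DOM extension (DOMDocument plus DOMXPath). It assumes the article's infobox is a table whose class list contains "infobox", which holds for most but not all templates, and returns the first img inside it. The function name infoboxImageFromHtml is hypothetical.

```php
<?php
// Hypothetical helper: first image URL inside an "infobox" table, or null.
function infoboxImageFromHtml(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // rendered wiki HTML is not strict XML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // hint the encoding to libxml
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        "//table[contains(concat(' ', normalize-space(@class), ' '), ' infobox ')]//img"
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $src = $nodes->item(0)->getAttribute('src');
    // Image sources are often protocol-relative ("//upload.wikimedia.org/...").
    return strpos($src, '//') === 0 ? 'https:' . $src : $src;
}
```

The returned src is usually a sized thumbnail; strip the /thumb/ segment and size prefix, or use the API approaches above, if you need the original file.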
Answer by óscar Palacios
Sorry for not specifically answering your question about the main image. But here's some code to get a list of all images:
function makeCall($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Wikimedia asks API clients to identify themselves; replace the contact placeholder
    curl_setopt($curl, CURLOPT_USERAGENT, 'ImageListExample/1.0 (you@example.com)');
    return curl_exec($curl);
}
function wikipediaImageUrls($url) {
    $imageUrls = array();
    $pathComponents = explode('/', parse_url($url, PHP_URL_PATH));
    // The last path segment is the (already URL-encoded) page title
    $pageTitle = array_pop($pathComponents);
    // Use https: Wikipedia redirects plain http requests
    $imagesQuery = "https://en.wikipedia.org/w/api.php?action=query&titles={$pageTitle}&prop=images&format=json";
    $response = json_decode(makeCall($imagesQuery), true);
    $imagesKey = key($response['query']['pages']);
    // Pages without images have no 'images' key
    $images = $response['query']['pages'][$imagesKey]['images'] ?? array();
    foreach ($images as $imageArray) {
        // Skip two common non-content icons
        if ($imageArray['title'] != 'File:Commons-logo.svg' && $imageArray['title'] != 'File:P vip.svg') {
            $title = str_replace('File:', '', $imageArray['title']);
            $title = str_replace(' ', '_', $title);
            // urlencode() protects file names containing characters such as & or +
            $imageUrlQuery = "https://en.wikipedia.org/w/api.php?action=query&titles=Image:" . urlencode($title) . "&prop=imageinfo&iiprop=url&format=json";
            $urlResponse = json_decode(makeCall($imageUrlQuery), true);
            $imageKey = key($urlResponse['query']['pages']);
            $imageUrls[] = $urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'];
        }
    }
    return $imageUrls;
}
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Saturn_%28mythology%29'));
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel'));
I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/1/10/Arch_of_SeptimiusSeverus.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/8/81/Ivan_Akimov_Saturn_.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/d/d7/Lucius_Appuleius_Saturninus.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Polidoro_da_Caravaggio_-_Saturnus-thumb.jpg
[4] => http://upload.wikimedia.org/wikipedia/commons/b/bd/Porta_Maggiore_Alatri.jpg
[5] => http://upload.wikimedia.org/wikipedia/commons/6/6a/She-wolf_suckles_Romulus_and_Remus.jpg
[6] => http://upload.wikimedia.org/wikipedia/commons/4/45/Throne_of_Saturn_Louvre_Ma1662.jpg
)
And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/e/e9/BmRKEL.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/3/3f/BmRKELS.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Bundesarchiv_Bild_101I-655-5976-04%2C_Russland%2C_Sturzkampfbomber_Junkers_Ju_87_G.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/6/62/Bundeswehr_Kreuz_Black.svg
[4] => http://upload.wikimedia.org/wikipedia/commons/9/99/Flag_of_German_Reich_%281935%E2%80%931945%29.svg
[5] => http://upload.wikimedia.org/wikipedia/en/6/64/HansUlrichRudel.jpeg
[6] => http://upload.wikimedia.org/wikipedia/commons/8/82/Heinkel_He_111_during_the_Battle_of_Britain.jpg
[7] => http://upload.wikimedia.org/wikipedia/commons/6/66/Regulation_WW_II_Underwing_Balkenkreuz.png
)
Note that the 6th element of the second array has a different URL path (/wikipedia/en/ instead of /wikipedia/commons/), because that file is hosted on English Wikipedia rather than on Wikimedia Commons. This is what @JosephJaber was warning about in his comment above.
Hope this helps someone.
Answer by vanwinter
I have written some code that gets the main image (full URL) from a Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.
The challenge was that when queried for a specific title, Wikipedia returns multiple image filenames (without path). Furthermore, the secondary search (I used the code varatis posted in this thread - thanks!) returns URLs of all images found based on the image filename that was searched, regardless of the original article title. After all this, we may end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match... a bit complicated, but it works :)
Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list - if there is any interest let me know.
Prerequisite (class property):
protected static $baseurl = "https://en.wikipedia.org/w/api.php";
Main function - get image URL from title:
public static function getImageURL($title)
{
    $images = self::getImageFilenameObj($title); // returns JSON object
    if (!$images) return '';
    foreach ($images as $image)
    {
        // get object of image URL for given filename
        $imgjson = self::getFileURLObj($image->title);
        if (!$imgjson) continue;
        // return first image match
        foreach ($imgjson as $img)
        {
            // get URL for image (missing files have no imageinfo)
            $url = $img->imageinfo[0]->url ?? null;
            // no image found
            if (!$url) continue;
            // filter generic images
            if (self::isGeneric($url)) continue;
            // match found
            return $url;
        }
    }
    // match not found
    return '';
}
== The following functions are called by the main function above ==
Get JSON object (filenames) by title:
public static function getImageFilenameObj($title)
{
    // get image file names used on the page;
    // retrieveInfo() (not shown) fetches a URL and returns the response body
    $response = json_decode(
        self::retrieveInfo(
            self::$baseurl . '?action=query&titles=' .
            urlencode($title) . '&prop=images&format=json'
        ));
    // json_decode() returns NULL instead of throwing, so check explicitly
    if (!isset($response->query->pages)) return NULL;
    /** The foreach is only to get around
     * the fact that we don't have the id.
     */
    foreach ($response->query->pages as $page) {
        return $page->images ?? NULL; // NULL if the page has no images
    }
    return NULL;
}
Get JSON object (URLs) by filename:
public static function getFileURLObj($filename)
{
    // resolve URL from filename
    $response = json_decode(
        self::retrieveInfo(
            self::$baseurl . '?action=query&titles=' .
            urlencode($filename) . '&prop=imageinfo&iiprop=url&format=json'
        ));
    // NULL if the request failed or returned no pages
    return $response->query->pages ?? NULL;
}
Filter out generic images:
public static function isGeneric($url)
{
$generic_strings = array(
'_gray.svg',
'icon',
'Commons-logo.svg',
'Ambox',
'Text_document_with_red_question_mark.svg',
'Question_book-new.svg',
'Canadese_kano',
'Wiki_letter_',
'Edit-clear.svg',
'WPanthroponymy',
'Compass_rose_pale',
'Us-actor.svg',
'voting_box',
'Crystal_',
'transportation_inv',
'arrow.svg',
'Quill_and_ink-US.svg',
'Decrease2.svg',
'Rating-',
'template',
'Nuvola_apps_',
'Mergefrom.svg',
'Portal-',
'Translation_to_',
'/School.svg',
'arrow',
'Symbol_',
'stub',
'Unbalanced_scales.svg',
'-logo.',
'P_vip.svg',
'Books-aj.svg_aj_ashton_01.svg',
'Film',
'/Gnome-',
'cap.svg',
'Missing',
'silhouette',
'Star_empty.svg',
'Music_film_clapperboard.svg',
'IPA_Unicode',
'symbol',
'_highlighting_',
'pictogram',
'Red_pog.svg',
'_medal_with_cup',
'_balloon',
'Feature',
'Aiga_'
);
foreach ($generic_strings as $str)
{
if (stripos($url, $str) !== false) return true;
}
return false;
}
Comments welcome.
Answer by Paul Weber
There is a way to reliably get the main image for a Wikipedia page: the extension called PageImages.
The PageImages extension collects information about images used on a page.
Its aim is to return the single most appropriate thumbnail associated with an article, attempting to return only meaningful images, e.g. not those from maintenance templates, stubs or flag icons. Currently it uses the first non-meaningless image used in the page.
https://www.mediawiki.org/wiki/Extension:PageImages
Just add prop=pageimages to your API query:
/w/api.php?action=query&prop=pageimages&titles=Somepage&format=xml
This reliably filters out annoying default images and saves you from having to filter them yourself! The extension is installed on all the main Wikipedia sites.
Answer by netfed
As Anuraj mentioned, pageimages is the parameter to use. The following URL returns some nifty extra information as well:
https://en.wikipedia.org/w/api.php?action=query&prop=info|extracts|pageimages|images&inprop=url&exsentences=1&titles=india
Here are some interesting parameters:
- The two parameters extracts and exsentences give you a short description you can use (exsentences is the number of sentences you want to include in the excerpt).
- The info and inprop=url parameters give you the URL of the page.
- The prop parameter takes multiple values separated by a pipe (|) symbol.
- And if you add format=json, it is even better.
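The query and parameters above can be combined into a small sketch, assuming format=json&formatversion=2 is added (so query.pages is a list) and prop=images is dropped because only the page image is needed. summaryQuery and parseSummary are hypothetical names; fullurl, extract and thumbnail are the response fields produced by inprop=url, prop=extracts and prop=pageimages respectively.

```php
<?php
// Hypothetical helper: the URL above, minus prop=images, as JSON formatversion 2.
function summaryQuery(string $title): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'        => 'query',
        'prop'          => 'info|extracts|pageimages',
        'inprop'        => 'url',
        'exsentences'   => 1,
        'titles'        => $title,
        'format'        => 'json',
        'formatversion' => 2,
    ]);
}

// Hypothetical helper: pick out page URL, one-sentence extract and thumbnail URL.
function parseSummary(string $json): ?array
{
    $data = json_decode($json, true);
    $page = $data['query']['pages'][0] ?? null;
    if ($page === null) {
        return null;
    }
    return [
        'url'     => $page['fullurl'] ?? null,             // prop=info&inprop=url
        'extract' => $page['extract'] ?? null,             // prop=extracts&exsentences=1
        'image'   => $page['thumbnail']['source'] ?? null, // prop=pageimages
    ];
}
```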
Answer by marika.daboja
In Swift, you can also use the CocoaPod called SDWebImage to load the image.
Code sample (remember to also add import SDWebImage, as well as Alamofire and SwiftyJSON, which the sample uses for AF.request and JSON):
func requestInfo(flowerName: String) {
let parameters : [String:String] = [
"format" : "json",
"action" : "query",
"prop" : "extracts|pageimages",//pageimages allows fetch imagePath
"exintro" : "",
"explaintext" : "",
"titles" : flowerName,
"indexpageids" : "",
"redirects" : "1",
"pithumbsize" : "500"//specify image size in px
]
// wikipediaURL is a property defined elsewhere, e.g. "https://en.wikipedia.org/w/api.php"
AF.request(wikipediaURL, method: .get, parameters: parameters).responseJSON { (response) in
switch response.result {
case .success(let value):
print("Got the wikipedia info.")
print(response)
let flowerJSON : JSON = JSON(value) // use the success value instead of force-unwrapping response.value
let pageid = flowerJSON["query"]["pageids"][0].stringValue
let flowerDescription = flowerJSON["query"]["pages"][pageid]["extract"].stringValue
let flowerImageURL = flowerJSON["query"]["pages"][pageid]["thumbnail"]["source"].stringValue //fetching Image URL
self.wikiInfoLabel.text = flowerDescription
self.imageView.sd_setImage(with: URL(string : flowerImageURL))//imageView updated with Wiki Image
case .failure(let error):
print(error)
}
}
}