HTML: Grab content from another website daily

Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/14388709/

Time: 2020-08-29 05:01:24  Source: igfitidea

Grab content from another website daily

Tags: html, parsing, rss, sync

Asked by Brett Merrifield

Here is my problem. I am creating a website which has a "news" tab. What I want on the news tab is updated content from another news website.

Is there any way to grab plain text posted on another website, post it on a news tab on my website, and update it automatically when that website posts new content? Can anybody push me in the right direction so I can learn how to do this?

I know HTML very well, but lack skill in PHP and JavaScript. What do I have to learn in order to pull this off?

Accepted answer by Eric

This book has a section that demonstrates reading of data from another website and parsing it using PHP. Chapter 10, pg 328 "Accessing other websites".

http://www.amazon.com/PHP-Advanced-Object-Oriented-Programming-QuickPro/dp/0321832183/

Though, if you're new to PHP, an advanced book is no place to start. I would recommend either of the following to get you started down that road.

http://www.amazon.com/PHP-MySQL-Dynamic-Web-Sites/dp/0321784073/

or

http://www.amazon.com/PHP-Web-Visual-QuickStart-Guide/dp/0321733452/

You may be able to cobble together what you need using the Advanced book, but the best way to use advanced skills is to start learning as a beginner!

Answer by Brian Smith

Look up cURL; it is available in PHP. http://php.net/manual/en/book.curl.php

Here is a nice video on it that may be related to what you're trying to pull off. http://www.youtube.com/watch?v=PvEJz6du7R0

Here is also some code to get the source of a web page using cURL.

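<?php

// Fetch the raw HTML of a page with cURL.
$ch = curl_init("http://www.example-webpage.com/file.html");
// Return the response body as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// CURLOPT_BINARYTRANSFER is omitted: it has had no effect since PHP 5.1.3.
$content = curl_exec($ch);
if ($content === false) {
    // curl_exec() returns false on failure; report why before exiting.
    die("cURL error: " . curl_error($ch));
}
curl_close($ch);
echo $content;

?>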

One more way of doing what you want is to use an iframe within a div...

<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css">
<!--
#container{
    width:300px;
    height:100px;
    border:1px solid #000; 
    overflow:hidden;
    margin-left:50%;
    margin-top:5%;

}
#container iframe {
    width:1000px;
    height:750px;
    margin-left:-734px;
    margin-top:-181px;   
    border:0 solid;
 }
-->
</style>

</head>
<body>

<div id="container">
<iframe src="http://www.w3schools.com/" scrolling="no"></iframe>
</div>

</body>
</html>

Some websites don't allow you to iframe their site, so this might not work. For example, you can't iframe Google, YouTube, Yahoo, and others.
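
Sites typically opt out of framing by sending an X-Frame-Options response header, or a Content-Security-Policy header with a frame-ancestors directive. The sketch below checks a list of raw header lines for either one. The helper name blocksFraming and the sample headers are hypothetical, and the check is deliberately coarse: it treats any frame-ancestors directive as a block, which is stricter than what a browser does.

```php
<?php

/**
 * Return true if the response headers suggest the page refuses to be framed.
 * $headerLines is a list of raw "Name: value" strings, the shape that
 * get_headers($url) returns by default.
 */
function blocksFraming(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // e.g. the status line "HTTP/1.1 200 OK"
        }
        $name  = strtolower(trim($parts[0]));
        $value = strtolower(trim($parts[1]));

        if ($name === 'x-frame-options' && in_array($value, ['deny', 'sameorigin'], true)) {
            return true;
        }
        if ($name === 'content-security-policy' && strpos($value, 'frame-ancestors') !== false) {
            return true;
        }
    }
    return false;
}

// Sample headers standing in for a real response. In practice:
//   $headers = get_headers('http://www.example.com/');  // hypothetical URL; returns false on failure
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
    'X-Frame-Options: SAMEORIGIN',
];

echo blocksFraming($sample) ? "Framing is blocked\n" : "Framing looks allowed\n";
```

If the check reports a block, the iframe will most likely render empty, and fetching and parsing the page on the server is the remaining option.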

Hope this helped :D

Answer by Emery King

You'll need to use file_get_contents and parse the HTML for what you want. If you want it to update periodically, you'll want to run this script as a cron job.
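
A minimal sketch of that approach, assuming the news page marks each headline with an h2 element of class "headline". That markup is hypothetical, so inspect the real page and adjust the XPath query. The function name extractHeadlines and the URL in the comment are also illustrative. The sample HTML is inlined so the script runs without network access.

```php
<?php

/**
 * Extract headline text from an HTML string.
 * Assumes headlines are <h2 class="headline"> elements (hypothetical markup).
 */
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headlines = [];
    foreach ($xpath->query('//h2[@class="headline"]') as $node) {
        $headlines[] = trim($node->textContent);
    }
    return $headlines;
}

// Inline sample standing in for the fetched page. In practice:
//   $html = file_get_contents('http://www.example.com/news');  // hypothetical URL
$html = '<html><body><h2 class="headline">First story</h2><p>Body</p>'
      . '<h2 class="headline"> Second story </h2></body></html>';

foreach (extractHeadlines($html) as $headline) {
    echo $headline, PHP_EOL;
}
```

To refresh once a day, a crontab entry such as `0 6 * * * php /path/to/fetch_news.php` (the path is a placeholder) would run the script every morning at 06:00. The script could then write the headlines to a file or database that the news tab reads.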

If the news site has an RSS feed, you could parse that instead, more effectively, using SimpleXML.
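
As a rough illustration of the RSS route, the sketch below parses an RSS 2.0 document with SimpleXML and collects each item's title and link. The function name parseRssItems and the sample feed are made up for the example. A real feed would be fetched first, for instance with file_get_contents, and may use a different layout (Atom feeds, for one, have no channel element).

```php
<?php

/**
 * Parse an RSS 2.0 document and return its items as title/link pairs.
 * Returns an empty array if the XML cannot be parsed.
 */
function parseRssItems(string $xml): array
{
    // Suppress libxml warnings so malformed feeds fail quietly.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($feed === false) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}

// Hypothetical sample feed, inlined so the script runs without network access.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>http://www.example.com/1</link></item>
    <item><title>Second story</title><link>http://www.example.com/2</link></item>
  </channel>
</rss>
XML;

foreach (parseRssItems($xml) as $item) {
    echo $item['title'], ' - ', $item['link'], PHP_EOL;
}
```

Because a feed is structured XML, this avoids guessing at the page's HTML layout, which is likely why the answer calls it more effective.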
