How to use cURL to get specific data from a website and then save it in my database using PHP

Question

Asked by Eka

Can anyone tell me how to use cURL or file_get_contents to download specific data from a website and then save that data into my MySQL database? I want to get the latest film additions from http://www.traileraddict.com/ and save them in my database on a daily basis; the text and HTML link will be shown on my website. I just need the text and the HTML link (highlighted in the picture).

[Image: screenshot of the site with the wanted text and links highlighted]

I have searched everywhere but didn't find any useful tutorial. I have two main questions:

1) How can I get specific data using cURL or file_get_contents?

2) How can I save the specific content to my MySQL database table (text in one column and link in another column)?

Answer 1

Answered by SS44

Using cURL:


$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, 'http://www.something.com');
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true);

$content = curl_exec($ch);
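The snippet above does not check whether the request succeeded. A minimal sketch of the same fetch with error handling might look like the following; `fetchPage` is a hypothetical helper name, and the timeout and redirect settings are assumptions rather than part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): return the page body or throw.
function fetchPage(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up if no connection is made in time
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up if the whole transfer takes too long
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure; curl_error() says why
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }

    return $body;
}
```

Calling `fetchPage('http://www.something.com')` would then either return the HTML as a string or raise an exception you can log, instead of silently handing `false` to the parser.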

Then you can load the content into a DOM object and parse the DOM for the specific data. You could also try to parse the data using search strings, but using regex on HTML is highly frowned upon.

$dom = new DOMDocument();
$dom->loadHTML( $content );

// Parse the dom for your desired content

http://www.php.net/manual/en/class.domdocument.php
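As a rough illustration of the "parse the DOM" step, the sketch below uses DOMDocument together with DOMXPath on a made-up HTML sample. The markup shape (a `<div class="info">` containing a link to `/trailer/...`) is an assumption based on the other answer's description of the page, and `extractTrailers` is a hypothetical helper name; the real page may need a different query.

```php
<?php
// Hypothetical helper: extract title/link pairs from markup shaped like
// <div class="info"><h2><a href="/trailer/...">Title</a></h2></div>.
function extractTrailers(string $html, string $baseUrl): array
{
    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $trailers = [];

    // Every link to /trailer/... inside a div whose class attribute is exactly "info".
    $query = '//div[@class="info"]//a[starts-with(@href, "/trailer/")]';
    foreach ($xpath->query($query) as $a) {
        $trailers[] = [
            'title' => trim($a->textContent),
            'url'   => rtrim($baseUrl, '/') . $a->getAttribute('href'),
        ];
    }

    return $trailers;
}

// Made-up sample markup, for illustration only.
$sample = '<html><body>'
    . '<div class="info"><h2><a href="/trailer/overnight-2012/trailer">Overnight</a></h2></div>'
    . '<div class="info"><h2><a href="/trailer/some-film/teaser">Some Film</a></h2></div>'
    . '</body></html>';

$trailers = extractTrailers($sample, 'http://www.traileraddict.com/');
print_r($trailers);
```

Compared with splitting the page on fixed strings, an XPath query keeps working when attribute order or whitespace changes, although it will still need updating if the site restructures its markup.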

Answer 2

Answered by martincarlin87

This should work, but it's messy and will possibly break if the site you are scraping changes its markup:

$sites[0] = 'http://www.traileraddict.com/';

// use this if you want to retrieve more than one page:
// $sites[1] = 'http://www.traileraddict.com/trailers/2';

// connect once, before the loops; $host, $user, $password and $database
// are placeholders that you need to define with your own credentials
// (mysqli is used because the old mysql_* functions were removed in PHP 7)
$db_conn = mysqli_connect($host, $user, $password, $database) or die(mysqli_connect_error());

// a prepared statement avoids broken queries when a title contains a quote
$stmt = mysqli_prepare($db_conn, 'INSERT INTO trailers (url, title) VALUES (?, ?)') or die(mysqli_error($db_conn));

foreach ($sites as $site)
{
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);

    if ($html === false) {
        // the request failed, so skip this page
        continue;
    }

    // ok, you have the whole page in the $html variable
    // now you need to find the common div that contains all the review info
    // and that appears to be <div class="info"> (I think you could use abstract as well)
    $title_start = '<div class="info">';

    $parts = explode($title_start, $html);

    // the first element is everything before the first info div, so drop it
    array_shift($parts);

    // now you have an array of the info divs on the page

    foreach ($parts as $part) {

        // so now you just need to get your title and link from each part

        $link = explode('<a href="/trailer/', $part);
        $title = explode('<h2>', $part);

        // skip any part that doesn't contain both a trailer link and a title
        if (!isset($link[1], $title[1])) {
            continue;
        }

        // this means you now have part of the trailer url, you just need to cut off the end which you don't need:

        $link = explode('">', $link[1]);

        // this should give something of the form:
        // overnight-2012/trailer
        // so just make an absolute url out of it:

        $url = 'http://www.traileraddict.com/trailer/' . $link[0];

        // now for the title we need to follow a similar process:

        $title = explode('</h2>', $title[1]);

        $title = trim(strip_tags($title[0]));

        // save the link and title

        mysqli_stmt_bind_param($stmt, 'ss', $url, $title);
        mysqli_stmt_execute($stmt) or die(mysqli_stmt_error($stmt));
    }
}

That should be it: you now have variables for the link and the title that you can insert into your database.
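To make the "text in one column and link in another" part concrete, here is a minimal sketch of the insert step using PDO with a prepared statement. It runs against an in-memory SQLite database purely so that it works without a server; the table layout, the `saveTrailers` helper name, and the sample row are assumptions for illustration. For MySQL you would swap in a `mysql:` DSN with your own credentials.

```php
<?php
// Hypothetical helper: insert each title/link pair with a prepared statement
// and return how many rows were written.
function saveTrailers(PDO $pdo, array $trailers): int
{
    $stmt = $pdo->prepare('INSERT INTO trailers (title, url) VALUES (:title, :url)');
    $saved = 0;
    foreach ($trailers as $trailer) {
        // Values are bound, not concatenated, so quotes in a title cannot break the query.
        $stmt->execute([':title' => $trailer['title'], ':url' => $trailer['url']]);
        $saved += $stmt->rowCount();
    }
    return $saved;
}

// Assumption: in-memory SQLite stands in for the MySQL database of the question.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Assumed table layout: one column for the text, one for the link.
$pdo->exec('CREATE TABLE trailers (id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT NOT NULL)');

$count = saveTrailers($pdo, [
    ['title' => 'Overnight', 'url' => 'http://www.traileraddict.com/trailer/overnight-2012/trailer'],
]);
echo "Saved $count row(s)\n";
```

Preparing the statement once and executing it per row also keeps the database work out of the string-building code, which makes the scraper easier to adjust when the markup changes.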

DISCLAIMER


I wrote this off the top of my head at work, so I apologise if it doesn't work straight off the bat. Let me know if it doesn't and I will try to help further.

Also, I am aware this could be done more cleverly and in fewer steps, but that would involve more thinking on my part. The OP can do that if they wish once they have understood the code I have written; I assume it is much more important that they understand what I have done and are able to edit it themselves.

I would also advise scraping the site at night so as not to burden it with extra traffic, and I would suggest asking the site for permission as well, since if they catch you they will be able to put an end to your scraping :(

To answer your final point: to run this at a set interval, you would use a cron job.
