How to use cURL to fetch specific data from a website and then save it in my database using PHP

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/9515891/
Date: 2020-08-26 06:59:32  Source: igfitidea

php mysql

Asked by Eka

Can anyone tell me how to use cURL or file_get_contents to download specific data from a website and then save that data into my MySQL database? I want to get the latest film additions from this website, http://www.traileraddict.com/, and save them in my database (on a daily basis; the text and HTML link will be shown on my website). I just need the text and the HTML link (highlighted in the picture).

[Image: screenshot of the site with the wanted text and link highlighted]

I have searched everywhere but didn't find any useful tutorial. I have two main questions:

1) How can I get specific data using cURL or file_get_contents?

2) How can I save the specific content to my MySQL database table (text in one column and link in another)?

Answered by SS44

Using cURL:

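$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.something.com');
// Return the response body as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$content = curl_exec($ch);

// curl_exec() returns false on failure, so stop before trying to parse
if ($content === false) {
    die('cURL error: ' . curl_error($ch));
}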

Then you can load the content into a DOMDocument object and parse the DOM for the specific data. You could also try to parse the data using string searches, but using regex on HTML is highly frowned upon.

$dom = new DOMDocument();
$dom->loadHTML( $content );

// Parse the dom for your desired content
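
As a rough sketch of what "parse the DOM" could look like, the snippet below loads a small HTML string and pulls a title and link out of each block with DOMXPath. The sample markup (a div with class "info" wrapping an h2 and a link), the helper name extractTrailers, and the base URL argument are all assumptions made for illustration; the real page may be structured differently, in which case the XPath queries need adjusting.

```php
<?php
// Hypothetical helper: extracts title/link pairs from HTML shaped like the
// sample below. The class name "info" and the h2/a layout are assumptions.
function extractTrailers(string $html, string $baseUrl): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $results = [];

    // Select every <div> whose class attribute is exactly "info"
    foreach ($xpath->query('//div[@class="info"]') as $div) {
        $anchor  = $xpath->query('.//a[@href]', $div)->item(0);
        $heading = $xpath->query('.//h2', $div)->item(0);
        if ($anchor === null || $heading === null) {
            continue; // skip blocks that lack a link or a title
        }
        $results[] = [
            'title' => trim($heading->textContent),
            'url'   => $baseUrl . $anchor->getAttribute('href'),
        ];
    }

    return $results;
}

// Sample input standing in for the downloaded page ($content in the answer above)
$content = '<html><body>'
    . '<div class="info"><h2><a href="/trailer/overnight-2012/trailer">Overnight</a></h2></div>'
    . '<div class="info"><h2><a href="/trailer/example-film/trailer">Example Film</a></h2></div>'
    . '</body></html>';

$trailers = extractTrailers($content, 'http://www.traileraddict.com');
foreach ($trailers as $trailer) {
    echo $trailer['title'] . ' => ' . $trailer['url'] . PHP_EOL;
}
```

Each entry in the returned array has a title and an absolute url, which map naturally onto the two database columns the question asks about.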

Answered by martincarlin87

This should work, but it's messy and it may break if the site you are scraping changes its markup:

// Placeholder connection settings (assumed values), replace with your own
$host     = 'localhost';
$user     = 'user';
$password = 'password';
$database = 'database';

// Connect once, before the loops. The mysql_* functions were removed in PHP 7,
// so this uses mysqli with a prepared statement instead.
$db_conn = mysqli_connect($host, $user, $password, $database) or die('error');
$stmt = mysqli_prepare($db_conn, 'INSERT INTO trailers (url, title) VALUES (?, ?)')
    or die(mysqli_error($db_conn));

$sites[0] = 'http://www.traileraddict.com/';

// use this if you want to retrieve more than one page:
// $sites[1] = 'http://www.traileraddict.com/trailers/2';

foreach ($sites as $site)
{
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);

    if ($html === false) {
        continue; // the request failed, skip this page
    }

    // ok, you have the whole page in the $html variable
    // now you need to find the common div that contains all the review info
    // and that appears to be <div class="info"> (I think you could use abstract as well)
    $title_start = '<div class="info">';

    $parts = explode($title_start, $html);

    // the first element is everything before the first info div, so drop it
    array_shift($parts);

    // now you have an array of the info divs on the page

    foreach ($parts as $part)
    {
        // so now you just need to get your title and link from each part

        $link = explode('<a href="/trailer/', $part);

        if (!isset($link[1])) {
            continue; // no trailer link in this part
        }

        // this means you now have part of the trailer url, you just need to cut off the end which you don't need:

        $link = explode('">', $link[1]);

        // this should give something of the form:
        // overnight-2012/trailer
        // so just make an absolute url out of it:

        $url = 'http://www.traileraddict.com/trailer/' . $link[0];

        // now for the title we need to follow a similar process:

        $title = explode('<h2>', $part);

        if (!isset($title[1])) {
            continue; // no title in this part
        }

        $title = explode('</h2>', $title[1]);

        $title = strip_tags($title[0]);

        // insert the row; bound parameters keep quotes in a title from breaking the query
        mysqli_stmt_bind_param($stmt, 'ss', $url, $title);
        mysqli_stmt_execute($stmt) or die(mysqli_error($db_conn));
    }
}

That should be it. You now have variables for the link and title that you can insert into your database.

DISCLAIMER

I have written this off the top of my head at work, so I apologise if it doesn't work straight off the bat. Let me know if it doesn't and I will try to help further.

ALSO, I am aware this could be done in a smarter way with fewer steps, but that would involve more thinking on my part. The OP can do that if they wish once they have understood the code I have written; I assume it is far more important that they understand what I have done and are able to edit it themselves.

Also, I would advise scraping the site at night so as not to burden it with extra traffic, and I would suggest asking for that site's permission as well, since if they catch you they will be able to put an end to your scraping :(

To answer your final point: to run this on a schedule (for example, once a day) you would use a cron job.
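
For illustration, a crontab entry along these lines would run the scraper once a night at 03:00. The PHP binary location, the script path, and the log path are all placeholders I have assumed; substitute whatever applies on your server.

```
# minute hour day-of-month month day-of-week  command
0 3 * * * /usr/bin/php /path/to/scrape.php >> /path/to/scrape.log 2>&1
```

Running it in the small hours also fits the advice above about not burdening the site during busy periods.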