How do I get text from a website using PHP?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/6728453/

Time: 2020-08-26 01:05:39  Source: igfitidea

php web

Asked by Alper

So, I'm working on a PHP script, and part of it needs to be able to query a website, then get text from it.

First off, I need to be able to query a certain website URL, then get text from that website's response, and return that text from the function.

How would I query the website and get the text from it?

Answered by Brad

The easiest way:

file_get_contents()

That will get you the source of the web page.

You probably want something a bit more complete, though, so look into cURL for better error handling, setting the user agent, and so on.

From there, if you want the text only, you are going to have to parse the page. For that, see: How do you parse and process HTML/XML in PHP?
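
As a rough illustration of the parsing step, here is a minimal sketch that uses PHP's bundled DOM extension to reduce an HTML string to its text. The helper name `html_to_text` is hypothetical, and dropping `<script>` and `<style>` elements is an assumption about what "text only" means here.

```php
<?php
// Hypothetical helper: returns the visible text of an HTML string.
function html_to_text(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse errors instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop <script> and <style> elements, whose contents are code rather than text.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Read the remaining text and collapse runs of whitespace.
    $root = $doc->documentElement;
    $text = $root ? $root->textContent : '';
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo html_to_text('<html><style> h1 { font-style: italic }</style><h1>stuff in here</h1></html>');
```

Note that `loadHTML()` may misread non-ASCII pages that do not declare their character set, so real pages can need extra encoding handling.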

Answered by Erick Martinez

I would do a DOM search; take a look at http://www.php.net/manual/es/domdocument.load.php. DOMXPath might be very useful too: http://php.net/manual/en/class.domxpath.php

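// load() expects well-formed XML; loadHTMLFile() tolerates real-world HTML.
$doc = new DOMDocument;
@$doc->loadHTMLFile("http://mysite.com"); // @ silences warnings from malformed markup
$xpath = new DOMXPath($doc);
// "//" searches the whole document for the matching div.
$elements = $xpath->query("//div[@id='yourTagIdHere']");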
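
To show what can be done with the query result, here is a small self-contained sketch that runs the same kind of XPath query against an inline HTML string and collects the text of each match. The sample markup and the `$texts` variable are made up for illustration.

```php
<?php
// Made-up sample markup; in practice this would be the fetched page source.
$html = '<html><body><div id="yourTagIdHere">Hello from the div</div>'
      . '<div id="other">ignored</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the text content of every div whose id matches.
$texts = [];
foreach ($xpath->query("//div[@id='yourTagIdHere']") as $div) {
    $texts[] = trim($div->textContent);
}

print_r($texts);
```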

Answered by Michael d

Can this be done by getting all of the content from the web page using the methods already listed above, and then using regex to remove all characters between opening and closing angle brackets?

A page that looks like this:

<html><style> h1 { font-style:... }</style><h1>stuff in here</h1></html>

Would then become this after regex:

h1 { font-style:... }stuff in here

Because we also want to remove the code between certain tags, such as the [style] tag, we could first use regex to remove all characters between [style] and [/style], so that we are left with just:

stuff in here

Would this work then? Please reply if you think it would or if you foresee errors as I would like to create a tool with this parsing.
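
The idea described above could be sketched roughly as follows. The helper name `strip_markup` is hypothetical, and treating `<script>` the same way as `<style>` is an added assumption. Regular expressions are fragile on real-world HTML (comments, unusual closing tags, inline event handlers), so this is only an illustration and an HTML parser is usually the more robust choice.

```php
<?php
// Hypothetical helper sketching the regex approach.
function strip_markup(string $html): string
{
    // First remove <style> and <script> blocks together with their contents.
    $cleaned = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;
    // Then remove every remaining tag; strip_tags() is PHP's built-in for this.
    return trim(strip_tags($cleaned));
}

echo strip_markup('<html><style> h1 { font-style: italic }</style><h1>stuff in here</h1></html>');
```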

Answered by Francois Deschenes

You can use file_get_contents or, if you need a little more control (e.g. to submit POST requests or to set the user agent string), you may want to look at cURL.

file_get_contents example:

$content = file_get_contents('http://www.example.org');
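
Since file_get_contents returns false on failure, it can help to wrap the call in a function that checks for that. The following is a minimal sketch; the helper name `fetch_source`, the 10-second timeout, and the `ExampleFetcher/1.0` user-agent string are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: returns the page source, or null if the request failed.
function fetch_source(string $url): ?string
{
    // Assumed settings: a 10-second timeout and a made-up user-agent string.
    $context = stream_context_create([
        'http' => ['timeout' => 10, 'user_agent' => 'ExampleFetcher/1.0'],
    ]);

    // @ suppresses the warning; failure is reported through the return value instead.
    $source = @file_get_contents($url, false, $context);

    return $source === false ? null : $source;
}

// Usage (performs a real network request, so it is left commented out):
// $content = fetch_source('http://www.example.org');
```

Fetching a URL this way requires the allow_url_fopen setting to be enabled.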

Basic cURL Example:

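$ch = curl_init('http://www.example.org');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3');
// Without this option, curl_exec() prints the response and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$content = curl_exec($ch);

curl_close($ch);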
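
For the error handling that cURL makes possible, a sketch along these lines may be a useful starting point. The helper name `fetch_with_curl`, the timeouts, the user-agent string, and the choice to treat HTTP status 400 and above as failure are all assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the response body, or null on any failure.
function fetch_with_curl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // assumed limits, in seconds
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // made-up user agent
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false) {
        // Transport-level failure (DNS, refused connection, timeout, ...).
        error_log('cURL error: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return ($body === false || $status >= 400) ? null : $body;
}

// Usage (performs a real network request, so it is left commented out):
// $content = fetch_with_curl('http://www.example.org');
```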

Answered by Paul

If you have cURL installed, use it. Otherwise:

$website = file_get_contents('http://google.com');

Then you need to search through the string for the text you want. How you do that depends on the website, and the text you're trying to read.
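
As one example of such a search, the sketch below pulls out whatever sits between two known markers using plain string functions. The helper name `text_between` and the sample page source are hypothetical; which markers to use depends entirely on the site being read.

```php
<?php
// Hypothetical helper: returns the text between two markers, or null if either is missing.
function text_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // skip past the opening marker

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }

    return substr($haystack, $from, $to - $from);
}

// Made-up page source standing in for the result of file_get_contents().
$website = '<html><head><title>Example Domain</title></head><body>...</body></html>';

echo text_between($website, '<title>', '</title>');
```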

Answered by Hammad Khan

You need to use cURL. You can get some samples here

Answered by Mingle

If you want more control, use cURL. Otherwise, use file_get_contents:

$url  = "http://www.example.com/test.php";  // Site URL.
$site = file_get_contents($url);             // Gets site response.