Get the content (text) of a URL after JavaScript has run with PHP
Source: http://stackoverflow.com/questions/28505501/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Victor Ferreira
Is it possible to get the content of a URL with PHP (using some sort of function like file_get_contents or header) but only after the execution of some JavaScript code?
Example:
mysite.com has a script that calls loadUrlAfterJavascriptExec('http://exampletogetcontent.com/') and prints/echoes the content. Imagine that some jQuery runs on http://exampletogetcontent.com/ and changes the DOM; loadUrlAfterJavascriptExec would then get the resulting HTML.
Can we do that?
Just to be clear, what I want is to get the content of a page through a URL, but only after JavaScript runs on the target page (the one whose content PHP is fetching).
I am aware PHP runs before the page is sent to the client, and JS only after that, but I thought that maybe there was an expert workaround.
Answered by AndrewD
Update 2: Adds more details on how to use phantomjs from PHP.
Update 1: (after clarification that the JavaScript on the target page needs to run first)
Method 1: Use phantomjs (will execute JavaScript)
1. Download phantomjs and place the executable in a path that your PHP binary can reach.
2. Place the following 2 files in the same directory:
get-website.php
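<?php
// Path to the PhantomJS script that sits next to this file.
$phantom_script = dirname(__FILE__) . '/get-website.js';
// shell_exec() returns the full output; exec() would return only its last line.
$response = shell_exec('phantomjs ' . escapeshellarg($phantom_script));
// Escape the HTML so the browser shows it as text instead of rendering it.
echo htmlspecialchars((string) $response);
?>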
get-website.js
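// PhantomJS script: load the page, let its JavaScript run, then print the HTML.
var webPage = require('webpage');
var page = webPage.create();
page.open('http://google.com/', function (status) {
    // page.content is the page's HTML at the time this callback fires.
    console.log(page.content);
    phantom.exit();
});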
3. Browse to get-website.php and the contents of the target site, http://google.com, will be returned after its inline JavaScript has executed. You can also call this from the command line using php /path/to/get-website.php.
Method 2: Use Ajax with PHP (no phantomjs, so it won't run the target page's JavaScript)
/get-website.php
<?php
$html=file_get_contents('http://google.com');
echo $html;
?>
test.html
<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>on demo</title>
    <style>
        p {
            color: red;
        }
        span {
            color: blue;
        }
    </style>
    <script src="https://code.jquery.com/jquery-1.10.2.js"></script>
</head>
<body>
    <button id='click_me'>Click me</button>
    <span style="display:none;"></span>
    <script>
        $( "#click_me" ).click(function () {
            $.get("/get-website.php", function(data) {
                var json = {
                    html: JSON.stringify(data),
                    delay: 1
                };
                alert(json.html);
            });
        });
    </script>
</body>
</html>
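Note that the click handler above passes the fetched HTML through JSON.stringify before showing it, so the alert displays a quoted, escaped string rather than the raw markup. A small sketch of that effect, using a made-up HTML snippet in place of the real response (the sample string is an assumption, not something from the original answer):

```javascript
// Made-up sample standing in for the HTML returned by /get-website.php.
var data = '<p class="greeting">Hello</p>';

// JSON.stringify wraps a string in double quotes and escapes inner quotes.
var shown = JSON.stringify(data);
console.log(shown);

// JSON.parse reverses the step and gives back the original markup.
console.log(JSON.parse(shown) === data);
```

If the raw markup is what you want to inspect, using data directly instead of json.html would likely be the simpler choice.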
Answered by Adamantus
I found a fantastic page on this. It's an entire tutorial on how to process, in PHP, the DOM of a page that is created entirely with JavaScript.
https://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/
"PhantomJS development is suspended until further notice", so that option isn't a good one.
Answered by The E
All the PHP runs before the information is sent to the client. All the JavaScript runs after the information is sent to the client.
To do something with PHP after the page loads, the page will need to either
- reload, saving the JavaScript generated info in a cookie or as POST data (not ideal) OR
- make an Ajax call to another PHP file to get the data. (much better)
Since the data appears to be in a different file than your PHP anyway, this is a pretty good solution. Since you tagged it jQuery, I assume you're using it.
jQuery has a set of pages about how it implements Ajax.
But the easiest way to use jQuery for this is $.post
ex:
$.post( "http://example.com/myDataFile.txt", function( data ) {
//do more JavaScript stuff with the data you just retrieved
});
$.post(), as the name implies, can send data along with the request for the data file, so if that request is to, say, a PHP file, the PHP file can use that data.
ex:
$.post( "http://example.com/myDataFile.txt",
    { foo: "bar", yabba: "dabba" },
    function( data ) {
        //do more JavaScript stuff with the data you just retrieved
    });
The data should be given as key/value pairs, written as a plain JavaScript object in JSON-like notation.
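To make the key/value point concrete, here is a rough sketch of how such an object ends up in the request body. By default jQuery sends the data given to $.post as a URL-encoded form string. The sketch uses the standard URLSearchParams class rather than jQuery itself, so treat it as an approximation of that behaviour; toFormBody is a hypothetical helper name and not part of jQuery.

```javascript
// Hypothetical helper (not part of jQuery): turn a flat object of
// key/value pairs into a URL-encoded form body, similar to what
// $.post sends by default for a plain object.
function toFormBody(data) {
    // URLSearchParams is a standard class in browsers and Node.js.
    return new URLSearchParams(data).toString();
}

var body = toFormBody({ foo: "bar", yabba: "dabba" });
console.log(body); // foo=bar&yabba=dabba
```

On the PHP side, a request body in this form is what populates $_POST, so the receiving script could read $_POST['foo'] and $_POST['yabba'].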

