How to make a SPA SEO crawlable?

Note: this Q&A comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/18530258/



Tags: ajax, seo, phantomjs, single-page-application, durandal

Asked by beamish

I've been working on how to make a SPA crawlable by Google, based on Google's instructions. Even though there are quite a few general explanations, I couldn't find a more thorough step-by-step tutorial with actual examples anywhere. Having finished this, I would like to share my solution so that others may also make use of it and possibly improve it further.
I am using MVC with WebAPI controllers and PhantomJS on the server side, and Durandal on the client side with push-state enabled; I also use Breezejs for client-server data interaction, all of which I strongly recommend, but I'll try to give a general enough explanation that will also help people using other platforms.


Answered by beamish

Before starting, please make sure you understand what Google requires, particularly the use of pretty and ugly URLs. Now let's see the implementation:


Client Side


On the client side you only have a single HTML page which interacts with the server dynamically via AJAX calls; that's what a SPA is about. All the a tags on the client side are created dynamically in my application; we'll later see how to make these links visible to Google's bot on the server. Each such a tag needs to have a pretty URL in its href attribute so that Google's bot will crawl it. You don't want the href part to be used when the client clicks on it (even though you do want the server to be able to parse it, as we'll see later), because we may not want a new page to load, only to make an AJAX call that gets some data to be displayed in part of the page, and to change the URL via JavaScript (e.g. using HTML5 pushState, or with Durandal). So we have both an href attribute for Google and an onClick handler which does the job when the user clicks on the link. Now, since I use push-state I don't want any # in the URL, so a typical a tag may look like this:
<a href="http://www.xyz.com/#!/category/subCategory/product111" onClick="loadProduct('category','subCategory','product111')">see product111...</a>

'category' and 'subCategory' would probably be other phrases, such as 'communication' and 'phones', or 'computers' and 'laptops' for an electrical appliances store. Obviously there would be many different categories and sub-categories. As you can see, the link points directly to the category, sub-category and product, rather than passing them as extra parameters to a specific 'store' page such as http://www.xyz.com/store/category/subCategory/product111. This is because I prefer shorter and simpler links. It implies that there will not be a category with the same name as one of my 'pages', i.e. 'about'.
I will not go into how to load the data via AJAX (the onClick part); search for it on Google, there are many good explanations. The only important thing I do want to mention here is that when the user clicks on this link, I want the URL in the browser to look like this:
http://www.xyz.com/category/subCategory/product111. And this URL is not sent to the server! Remember, this is a SPA where all the interaction between the client and the server is done via AJAX, no links at all! All 'pages' are implemented on the client side, and the different URL does not make a call to the server (the server does need to know how to handle these URLs in case they are used as external links from another site to your site; we'll see that later in the server-side part). Now, this is handled wonderfully by Durandal. I strongly recommend it, but you can also skip this part if you prefer other technologies. If you do choose it, and you're also using MS Visual Studio Express 2012 for Web like me, you can install the Durandal Starter Kit and there, in shell.js, use something like this:


define(['plugins/router', 'durandal/app'], function (router, app) {
    return {
        router: router,
        activate: function () {
            router.map([
                { route: '', title: 'Store', moduleId: 'viewmodels/store', nav: true },
                { route: 'about', moduleId: 'viewmodels/about', nav: true }
            ])
                .buildNavigationModel()
                .mapUnknownRoutes(function (instruction) {
                    instruction.config.moduleId = 'viewmodels/store';
                    instruction.fragment = instruction.fragment.replace("!/", ""); // for pretty-URLs, '#' already removed because of push-state, only ! remains
                    return instruction;
                });
            return router.activate({ pushState: true });
        }
    };
});

There are a few important things to notice here:


  1. The first route (with route:'') is for the URL which has no extra data in it, i.e. http://www.xyz.com. On this page you load general data using AJAX. There may actually be no a tags at all on this page. You will want to add the following tag so that Google's bot will know what to do with it:
    <meta name="fragment" content="!">. This tag will make Google's bot transform the URL to www.xyz.com?_escaped_fragment_=, which we'll see later.
  2. The 'about' route is just an example of a link to other 'pages' you may want in your web application.
  3. Now, the tricky part is that there is no 'category' route, and there may be many different categories - none of which have a predefined route. This is where mapUnknownRoutes comes in. It maps these unknown routes to the 'store' route and also removes any '!' from the URL in case it's a pretty URL generated by Google's search engine. The 'store' route takes the info in the 'fragment' property and makes the AJAX call to get the data, display it, and change the URL locally (a hypothetical sketch of such a handler follows this list). In my application, I don't load a different page for every such call; I only change the part of the page where this data is relevant, and change the URL locally.
  4. Notice the pushState:true which instructs Durandal to use push-state URLs.
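
For illustration, here is a minimal sketch of what such a loadProduct handler might look like. This is hypothetical (the original answer deliberately leaves it out); the /api/products endpoint and the renderProduct function are made-up names:

// Hypothetical sketch: fetch the data via AJAX and change the URL locally
// with history.pushState - no request is sent to the server for the new URL
function loadProduct(category, subCategory, product) {
    $.getJSON('/api/products/' + category + '/' + subCategory + '/' + product)
        .done(function (data) {
            renderProduct(data); // update only the relevant part of the page
            history.pushState({}, '', '/' + category + '/' + subCategory + '/' + product);
        });
}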

This is all we need on the client side. It can also be implemented with hashed URLs (in Durandal you simply remove the pushState:true for that). The more complex part (at least for me...) was the server part:


Server Side


I'm using MVC 4.5 on the server side with WebAPI controllers. The server actually needs to handle 3 types of URLs: the ones generated by Google - both pretty and ugly - and also a 'simple' URL with the same format as the one that appears in the client's browser. Let's look at how to do this:


Pretty URLs and 'simple' ones are first interpreted by the server as attempts to reference a non-existent controller. The server sees something like http://www.xyz.com/category/subCategory/product111 and looks for a controller named 'category'. So in web.config (this goes inside the <system.web> section) I add the following lines to redirect these to a specific error-handling controller:


<customErrors mode="On" defaultRedirect="Error">
    <error statusCode="404" redirect="Error" />
</customErrors>

Now, this transforms the URL into something like http://www.xyz.com/Error?aspxerrorpath=/category/subCategory/product111. I want the URL to be sent to the client, which will load the data via AJAX, so the trick here is to call the default 'index' controller as if no controller were being referenced; I do that by adding a hash to the URL before all the 'category' and 'subCategory' parameters. The hashed URL does not require any special controller except the default 'index' controller; the data is sent to the client, which then removes the hash and uses the info after the hash to load the data via AJAX. Here is the error handler controller code:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Web.Http;

using System.Web.Routing;

namespace eShop.Controllers
{
    public class ErrorController : ApiController
    {
        [HttpGet, HttpPost, HttpPut, HttpDelete, HttpHead, HttpOptions, AcceptVerbs("PATCH"), AllowAnonymous]
        public HttpResponseMessage Handle404()
        {
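            // The request arrives as ".../Error?aspxerrorpath=/category/subCategory/product111";
            // extract the original path and redirect to ".../#/category/subCategory/product111"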
            string[] parts = Request.RequestUri.OriginalString.Split(new[] { '?' }, StringSplitOptions.RemoveEmptyEntries);
            string parameters = parts[1].Replace("aspxerrorpath=", "");
            var response = Request.CreateResponse(HttpStatusCode.Redirect);
            response.Headers.Location = new Uri(parts[0].Replace("Error","") + string.Format("#{0}", parameters));
            return response;
        }
    }
}
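
The client-side counterpart of this trick is not shown in the original answer: on startup, the SPA has to notice the hash that the error controller added, restore the pretty URL, and route accordingly. A minimal, hypothetical sketch of the idea:

// Hypothetical sketch: if the server redirected us to "/#/category/...",
// strip the hash, restore the pretty URL and let the client-side router
// load the data via AJAX as usual
(function () {
    var fragment = window.location.hash.replace(/^#\/?/, '');
    if (fragment) {
        history.replaceState({}, '', '/' + fragment);
        router.navigate(fragment, { replace: true, trigger: true }); // e.g. Durandal's router
    }
})();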


But what about the ugly URLs? These are created by Google's bot and should return plain HTML that contains all the data the user sees in the browser. For this I use PhantomJS. Phantom is a headless browser that does what the browser is doing on the client side - but on the server side. In other words, Phantom knows (among other things) how to get a web page via a URL, parse it - including running all the JavaScript code in it (as well as getting data via AJAX calls) - and give you back the HTML that reflects the DOM. If you're using MS Visual Studio Express you may want to install Phantom via this link.
But first, when an ugly URL is sent to the server, we must catch it; for this, I added the following file to the 'App_Start' folder:



using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

namespace eShop.App_Start
{
    public class AjaxCrawlableAttribute : ActionFilterAttribute
    {
        private const string Fragment = "_escaped_fragment_";

        public override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            var request = filterContext.RequestContext.HttpContext.Request;

            if (request.QueryString[Fragment] != null)
            {

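                // convert Google's ugly URL ("?_escaped_fragment_=") back into its hash equivalent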
                var url = request.Url.ToString().Replace("?_escaped_fragment_=", "#");

                filterContext.Result = new RedirectToRouteResult(
                    new RouteValueDictionary { { "controller", "HtmlSnapshot" }, { "action", "returnHTML" }, { "url", url } });
            }
            return;
        }
    }
}

This is registered in 'FilterConfig.cs', also in 'App_Start':


using System.Web.Mvc;
using eShop.App_Start;

namespace eShop
{
    public class FilterConfig
    {
        public static void RegisterGlobalFilters(GlobalFilterCollection filters)
        {
            filters.Add(new HandleErrorAttribute());
            filters.Add(new AjaxCrawlableAttribute());
        }
    }
}

As you can see, 'AjaxCrawlableAttribute' routes ugly URLs to a controller named 'HtmlSnapshot', and here is this controller:


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Web;
using System.Web.Mvc;

namespace eShop.Controllers
{
    public class HtmlSnapshotController : Controller
    {
        public ActionResult returnHTML(string url)
        {
            string appRoot = Path.GetDirectoryName(AppDomain.CurrentDomain.BaseDirectory);

            // launch PhantomJS with the snapshot script; the rendered HTML arrives on stdout
            var startInfo = new ProcessStartInfo
            {
                Arguments = String.Format("{0} {1}", Path.Combine(appRoot, @"seo\createSnapshot.js"), url),
                FileName = Path.Combine(appRoot, @"bin\phantomjs.exe"),
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true,
                StandardOutputEncoding = System.Text.Encoding.UTF8
            };
            var p = new Process();
            p.StartInfo = startInfo;
            p.Start();
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            ViewData["result"] = output;
            return View();
        }

    }
}

The associated view is very simple, just one line of code:
@Html.Raw( ViewBag.result )
As you can see in the controller, Phantom loads a JavaScript file named createSnapshot.js which sits in a folder I created called seo. Here is this JavaScript file:


var page = require('webpage').create();
var system = require('system');

var lastReceived = new Date().getTime();
var requestCount = 0;
var responseCount = 0;
var requestIds = [];
var startTime = new Date().getTime();

page.onResourceReceived = function (response) {
    if (requestIds.indexOf(response.id) !== -1) {
        lastReceived = new Date().getTime();
        responseCount++;
        requestIds[requestIds.indexOf(response.id)] = null;
    }
};
page.onResourceRequested = function (request) {
    if (requestIds.indexOf(request.id) === -1) {
        requestIds.push(request.id);
        requestCount++;
    }
};

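// The SPA signals that rendering is done by appending <span id='compositionComplete'>
// to the DOM (see the explanation below the script)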
function checkLoaded() {
    return page.evaluate(function () {
        return document.all["compositionComplete"];
    }) != null;
}
// Open the page
page.open(system.args[1], function () { });

var checkComplete = function () {
    // don't return until all requests have finished, but don't let the
    // whole thing take longer than 10 seconds either
    if ((new Date().getTime() - lastReceived > 300 && requestCount === responseCount) || new Date().getTime() - startTime > 10000 || checkLoaded()) {
        clearInterval(checkCompleteInterval);
        console.log(page.content); // print the rendered HTML to stdout
        phantom.exit();
    }
}
// Let us check to see if the page is finished rendering
var checkCompleteInterval = setInterval(checkComplete, 300);
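
Since the script reads the target URL from system.args[1], you can presumably test it by hand by running it with phantomjs from a console, passing the URL as the only argument, and inspecting the HTML it prints.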

I first want to thank Thomas Davis for the page where I got the basic code from :-).
You will notice something odd here: Phantom keeps checking the page until the checkLoaded() function returns true. Why is that? It's because my specific SPA makes several AJAX calls to get all the data and place it in the DOM on my page, and Phantom cannot know when all the calls have completed before returning the HTML reflection of the DOM to me. What I did here is add a <span id='compositionComplete'></span> after the final AJAX call, so that if this tag exists I know the DOM is complete. I do this in response to Durandal's compositionComplete event; see here for more. If this does not happen within 10 seconds I give up (it should take only a second or so at most). The HTML returned contains all the links that the user sees in the browser. The scripts will not work properly, because the <script> tags that do exist in the HTML snapshot do not reference the right URL. This could be changed in the JavaScript phantom file too, but I don't think it's necessary, because the HTML snapshot is only used by Google to get the a links and not to run JavaScript; these links do reference a pretty URL, and in fact, if you try to view the HTML snapshot in a browser, you will get JavaScript errors, but all the links will work properly and direct you to the server once again, this time with a pretty URL, getting a fully working page.
That's it. Now the server knows how to handle both pretty and ugly URLs, with push-state enabled on both server and client. All ugly URLs are treated the same way using Phantom, so there's no need to create a separate controller for each type of call.
One thing you might prefer to change is not to make a general 'category/subCategory/product' call but to add a 'store' prefix, so that the link will look something like: http://www.xyz.com/store/category/subCategory/product111. This will avoid the problem in my solution that all invalid URLs are treated as if they were actually calls to the 'index' controller, and I suppose these could then be handled within the 'store' controller without the addition to web.config I showed above.


Answered by Edward Olamisan

Google is now able to render SPA pages: Deprecating our AJAX crawling scheme


Answered by Joachim H. Skeie

Here is a link to a screencast recording from my Ember.js training class, which I hosted in London on August 14th. It outlines a strategy for both your client-side application and your server-side application, and gives a live demonstration of how implementing these features will provide your JavaScript single-page app with graceful degradation, even for users with JavaScript turned off.


It uses PhantomJS to aid in crawling your website.


In short, the steps required are:


  • Have a hosted version of the web application you want to crawl; this site needs to have ALL of the data you have in production
  • Write a JavaScript application (a PhantomJS script) to load your website
  • Add index.html (or "/") to the list of URLs to crawl
    • Pop the first URL added to the crawl list
    • Load the page and render its DOM
    • Find any links on the loaded page that link to your own site (URL filtering)
    • Add each such link to a list of 'crawlable' URLs, if it has not already been crawled
    • Store the rendered DOM to a file on the file system, but strip away ALL script tags first
    • At the end, create a Sitemap.xml file with the crawled URLs (a sketch of such a crawler follows this list)
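
A condensed, hypothetical sketch of such a PhantomJS crawler, following the steps above (the file layout, timing and start URL are illustrative only):

// Hypothetical PhantomJS crawler sketch following the steps above
var fs = require('fs');
var webpage = require('webpage');

var queue = ['http://localhost/'];  // start from index.html ( or "/" )
var crawled = {};

function crawlNext() {
    if (queue.length === 0) {
        // at the end, a Sitemap.xml could be built from Object.keys(crawled)
        phantom.exit();
        return;
    }
    var url = queue.shift();
    if (crawled[url]) { crawlNext(); return; }
    crawled[url] = true;

    var page = webpage.create();
    page.open(url, function () {
        // give the SPA a moment to render; real code would wait more carefully
        setTimeout(function () {
            // find links on the loaded page that link to your own site
            var links = page.evaluate(function () {
                return Array.prototype.map.call(
                    document.querySelectorAll('a[href]'),
                    function (a) { return a.href; });
            });
            links.forEach(function (link) {
                if (link.indexOf('http://localhost/') === 0 && !crawled[link]) {
                    queue.push(link);
                }
            });
            // store the rendered DOM, stripping ALL script tags first
            var html = page.content.replace(/<script[\s\S]*?<\/script>/gi, '');
            fs.write('snapshots/' + encodeURIComponent(url) + '.html', html, 'w');
            page.close();
            crawlNext();
        }, 1000);
    });
}

crawlNext();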

Once this step is done, it's up to your backend to serve the static version of your HTML as part of the noscript tag on that page. This will allow Google and other search engines to crawl every single page on your website, even though your app originally is a single-page app.


Link to the screencast with the full details:


http://www.devcasts.io/p/spas-phantomjs-and-seo/#


Answered by gabrielperales

You can use a service, or create your own, to prerender your SPA; one such service is called Prerender. You can check it out on its website, prerender.io, and in its GitHub project (it uses PhantomJS and renders your website for you).


It's very easy to get started: you only have to redirect crawler requests to the service, and they will receive the rendered HTML.
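
The redirect itself is just user-agent sniffing in front of your application; here is a minimal sketch of the idea (shown with Node/Express purely for brevity - Prerender also publishes ready-made middleware for common stacks, which you would normally use instead):

// Hypothetical sketch: send known crawlers to the prerender service,
// everyone else to the regular SPA. The bot list and URLs are illustrative.
var express = require('express');
var request = require('request');
var app = express();

var BOTS = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit/i;

app.use(function (req, res, next) {
    var ua = req.headers['user-agent'] || '';
    if (BOTS.test(ua) || req.query._escaped_fragment_ !== undefined) {
        // fetch the prerendered HTML and relay it to the crawler
        request('http://service.prerender.io/http://www.xyz.com' + req.url).pipe(res);
    } else {
        next(); // normal users get the regular SPA
    }
});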


Answered by ddtxra

You can use http://sparender.com/, which enables single-page applications to be crawled correctly.
