How do search engines deal with AngularJS applications?
Disclaimer: this page is adapted from a Chinese-English translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow.
Original URL: http://stackoverflow.com/questions/13499040/
Asked by luisfarzati
I see two issues with AngularJS applications regarding search engines and SEO:
1) What happens with custom tags? Do search engines ignore the whole content within those tags? For example, suppose I have
<custom>
<h1>Hey, this title is important</h1>
</custom>
Would the <h1> be indexed despite being inside custom tags?
2) Is there a way to prevent search engines from indexing {{}} bindings literally? For example:
<h2>{{title}}</h2>
I know I could do something like
<h2 ng-bind="title"></h2>
but what if I want to actually let the crawler "see" the title? Is server-side rendering the only solution?
Accepted answer by joakimbl
Update May 2014
Google's crawlers now execute JavaScript; you can use Google Webmaster Tools to better understand how Google renders your site.
Original answer
If you want to optimize your app for search engines, there is unfortunately no way around serving a pre-rendered version to the crawler. You can read more about Google's recommendations for AJAX- and JavaScript-heavy sites here.
If this is an option, I'd recommend reading this article about how to do SEO for Angular with server-side rendering.
I'm not sure what the crawler does when it encounters custom tags.
Answer by superluminary
Use PushState and Precomposition
The current (2015) way to do this is using the JavaScript pushState method.
PushState changes the URL in the top browser bar without reloading the page. Say you have a page containing tabs. The tabs hide and show content, and the content is inserted dynamically, either using AJAX or by simply setting display:none and display:block to hide and show the correct tab content.
When a tab is clicked, use pushState to update the URL in the address bar. When the page is rendered, use the value in the address bar to determine which tab to show. Angular routing will do this for you automatically.
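A minimal browser-side sketch of that tab pattern, written in plain JavaScript rather than Angular. The tab names (`overview`, `details`, `reviews`) and the functions `tabFromPath` and `selectTab` are hypothetical. `history` is passed in so the logic can run outside a browser; in a page you would pass `window.history`.

```javascript
// Hypothetical tab names; each one doubles as a URL path segment.
const TABS = ["overview", "details", "reviews"];

// Decide which tab to show from the path in the address bar,
// e.g. on a direct (first) hit of the URL. Unknown paths show the first tab.
function tabFromPath(pathname) {
  const segment = pathname.replace(/^\/+|\/+$/g, "");
  return TABS.includes(segment) ? segment : TABS[0];
}

// On click: record the tab's URL in the address bar without reloading.
// `history` is window.history in a browser (injected here for testability).
function selectTab(tab, history) {
  if (!TABS.includes(tab)) throw new Error("Unknown tab: " + tab);
  history.pushState({ tab }, "", "/" + tab);
  return tab;
}
```

In a real page you would also listen for the `popstate` event and call `tabFromPath(location.pathname)` again, so the Back button restores the previous tab.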
Precomposition
There are two ways to hit a PushState Single Page App (SPA):
- Via PushState, where the user clicks a PushState link and the content is AJAXed in.
- By hitting the URL directly.
The initial hit on the site will involve hitting the URL directly. Subsequent hits will simply AJAX in content as the PushState updates the URL.
Crawlers harvest links from a page, then add them to a queue for later processing. This means that for a crawler, every hit on the server is a direct hit; crawlers don't navigate via PushState.
Precomposition bundles the initial payload into the first response from the server, possibly as a JSON object. This allows the search engine to render the page without executing the AJAX call.
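A sketch of precomposition on a Node.js server. It assumes the Angular app looks for its first payload in a global called `__INITIAL_STATE__` (a hypothetical name) before falling back to an AJAX call. The function `precompose` and the shell markup are also illustrative. The `<` characters in the JSON are escaped so embedded data cannot close the `<script>` tag early.

```javascript
// Build the first HTML response with the initial data already embedded,
// so the page can render without making its first AJAX call.
// `__INITIAL_STATE__`, "myApp" and "/app.js" are hypothetical names.
function precompose(initialData) {
  // "<" only occurs inside JSON strings, so escaping it as \u003c is safe
  // and prevents a value like "</script>" from ending the script block.
  const json = JSON.stringify(initialData).replace(/</g, "\\u003c");
  return [
    "<!doctype html>",
    "<html><head><title>My App</title></head>",
    '<body ng-app="myApp">',
    "<div ng-view></div>",
    "<script>window.__INITIAL_STATE__ = " + json + ";</script>",
    '<script src="/app.js"></script>',
    "</body></html>",
  ].join("\n");
}
```

On the client, a service could then read `window.__INITIAL_STATE__` and only call `$http` when it is missing.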
There is some evidence to suggest that Google might not execute AJAX requests. More on this here:
Search Engines can read and execute JavaScript
Google has been able to parse JavaScript for some time now; it's why they originally developed Chrome, to act as a full-featured headless browser for the Google spider. If a link has a valid href attribute, the new URL can be indexed. There's nothing more to do.
If clicking a link also triggers a pushState call, users can navigate the site via PushState.
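These two paragraphs describe progressive enhancement: a real `href` for the crawler, plus a click handler for the user. Below is a sketch of such a handler. `handleLinkClick` and the `render` callback are hypothetical. `location` and `history` are injected so the handler can be tested outside a browser. Angular's `$location` service does similar link interception for you in HTML5 mode.

```javascript
// Progressive enhancement: every link keeps a real href, so a crawler can
// follow it as an ordinary URL. For users, this handler turns same-origin
// clicks into pushState + client-side rendering.
// `render(path)` is a hypothetical function that swaps in the content.
function handleLinkClick(event, location, history, render) {
  // Leave modified clicks (new tab/window) and non-primary buttons alone.
  if (event.defaultPrevented || event.button !== 0 ||
      event.metaKey || event.ctrlKey || event.shiftKey || event.altKey) {
    return false;
  }
  const href = event.currentTarget.getAttribute("href");
  const target = new URL(href, location.href);
  if (target.origin !== location.origin) return false; // external link: normal navigation
  event.preventDefault();
  history.pushState({}, "", target.pathname + target.search + target.hash);
  render(target.pathname);
  return true;
}
```

In a page you might attach it to each `a` element with `addEventListener("click", ...)`, passing `window.location` and `window.history`.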
Search Engine Support for PushState URLs
PushState is currently supported by Google and Bing.
Google
Here's Matt Cutts responding to Paul Irish's question about PushState for SEO:
Here is Google announcing full JavaScript support for the spider:
http://googlewebmastercentral.blogspot.de/2014/05/understanding-web-pages-better.html
The upshot is that Google supports PushState and will index PushState URLs.
See also the Fetch as Googlebot tool in Google Webmaster Tools. You will see that your JavaScript (including Angular) is executed.
Bing
Here is Bing's announcement of support for pretty PushState URLs dated March 2013:
http://blogs.bing.com/webmaster/2013/03/21/search-engine-optimization-best-practices-for-ajax-urls/
Don't use HashBangs #!
Hashbang urls were an ugly stopgap requiring the developer to provide a pre-rendered version of the site at a special location. They still work, but you don't need to use them.
Hashbang URLs look like this:
domain.com/#!path/to/resource
This would be paired with a meta tag like this:
<meta name="fragment" content="!">
Google will not index them in this form, but will instead pull a static version of the site from the _escaped_fragment_ URL and index that.
PushState URLs look like any ordinary URL:
domain.com/path/to/resource
The difference is that Angular handles them for you by intercepting changes to document.location and dealing with them in JavaScript.
If you want to use PushState URLs (and you probably do), take out all the old hash-style URLs and meta tags and simply enable HTML5 mode in your config block.
Testing your site
Google Webmaster Tools now contains a tool that lets you fetch a URL as Google, rendering the JavaScript as Google renders it.
https://www.google.com/webmasters/tools/googlebot-fetch
Generating PushState URLs in Angular
To generate real URLs in Angular, rather than # prefixed ones, set HTML5 mode on your $locationProvider object.
$locationProvider.html5Mode(true);
Server Side
Since you are using real URLs, you will need to ensure the same template (plus some precomposed content) gets shipped by your server for all valid URLs. How you do this will vary depending on your server architecture.
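One way this could look in Node.js, sketched as Express-style middleware (`(req, res, next)`), although it needs no Express to run. The route list, the precomposed snippets and `serveShell` are all hypothetical. The point is that every valid app URL gets the same template, while anything else falls through to the next handler (static files, a 404 page, and so on).

```javascript
// Hypothetical client-side routes the Angular app knows how to render,
// each with some precomposed HTML ("" means none yet).
const APP_ROUTES = [
  { pattern: /^\/$/, precomposed: "<h1>Home</h1>" },
  { pattern: /^\/london$/, precomposed: "<h1>London</h1>" },
  { pattern: /^\/products\/[\w-]+$/, precomposed: "" },
];

// The same template is shipped for every valid URL. With html5Mode on,
// AngularJS 1.3+ expects a <base href> tag in the document head.
function shell(precomposed) {
  return '<!doctype html><html><head><base href="/"></head>' +
    '<body ng-app="myApp"><div ng-view>' + precomposed + "</div>" +
    '<script src="/app.js"></script></body></html>';
}

// Express-style middleware: non-app paths fall through to `next`.
function serveShell(req, res, next) {
  const { pathname } = new URL(req.url, "http://localhost");
  const route = APP_ROUTES.find((r) => r.pattern.test(pathname));
  if (!route) return next();
  res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
  res.end(shell(route.precomposed));
}
```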
Sitemap
Your app may use unusual forms of navigation, for example hover or scroll. To ensure Google is able to drive your app, I would suggest creating a sitemap: a simple list of all the URLs your app responds to. You can place this at the default location (/sitemap or /sitemap.xml), or tell Google about it using Webmaster Tools.
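A sitemap can be generated from the same list of routes the app serves. The sketch below emits the XML format from the sitemaps.org protocol. The functions `buildSitemap` and `xmlEscape` are illustrative, and the origin and paths are made up.

```javascript
// Escape the five XML special characters.
function xmlEscape(s) {
  return s.replace(/[<>&'"]/g, (c) => ({
    "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;",
  })[c]);
}

// Build a sitemaps.org-style sitemap.xml from the paths the app responds to.
function buildSitemap(origin, paths) {
  const urls = paths
    .map((p) => "  <url><loc>" + xmlEscape(new URL(p, origin).href) + "</loc></url>")
    .join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls + "\n</urlset>\n";
}
```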
It's a good idea to have a sitemap anyway.
Browser support
PushState works in IE10 and later. In older browsers, Angular will automatically fall back to hash-style URLs.
A demo page
The following content is rendered using a PushState URL with precomposition:
http://html5.gingerhost.com/london
As can be verified, at this link, the content is indexed and is appearing in Google.
Serving 404 and 301 HTTP status codes
Because the search engine will always hit your server for every request, you can serve header status codes from your server and expect Google to see them.
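A sketch of deciding the status code on the server, before any Angular code runs. The `MOVED` and `PAGES` tables and the `statusFor` function are hypothetical stand-ins for wherever your redirects and valid URLs actually live.

```javascript
// Hypothetical data: pages that moved permanently, and pages that exist.
const MOVED = new Map([["/old-london", "/london"]]);
const PAGES = new Set(["/", "/london"]);

// Decide the HTTP status (and headers) for a path on the server,
// so the crawler sees the real status code in the response.
function statusFor(pathname) {
  if (MOVED.has(pathname)) {
    return { status: 301, headers: { Location: MOVED.get(pathname) } };
  }
  if (PAGES.has(pathname)) {
    return { status: 200, headers: { "Content-Type": "text/html; charset=utf-8" } };
  }
  // The body can still be the app shell, but the status says "not found".
  return { status: 404, headers: { "Content-Type": "text/html; charset=utf-8" } };
}
```

A Node.js handler would then call `res.writeHead(status, headers)` with the result before sending the body.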
Answer by auser
Let's get definitive about AngularJS and SEO
Google, Yahoo, Bing, and other search engines crawl the web in traditional ways using traditional crawlers. They run robots that crawl the HTML on web pages, collecting information along the way. They keep interesting words and look for links to other pages (these links, and how many there are, come into play with SEO).
So why don't search engines deal with JavaScript sites?
The answer has to do with the fact that search engine robots work through headless browsers, and they most often do not have a JavaScript engine to render the JavaScript on a page. This works for most pages, because most static pages don't rely on JavaScript to render their content; it is already available in the HTML.
What can be done about it?
Luckily, the crawlers of the larger search engines have started to implement a mechanism that allows us to make our JavaScript sites crawlable, but it requires us to make a change to our site.
If we change our hashPrefix to be #! instead of simply #, then modern search engines will change the request to use _escaped_fragment_ instead of #!. (With HTML5 mode, i.e. where we have links without the hash prefix, we can implement this same feature by looking at the User-Agent header in our backend.)
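For the HTML5-mode case, the backend check on the User-Agent header might look like this. The function `isCrawler` is illustrative, and the pattern list is deliberately short. Bots change their User-Agent strings over time, so treat the list as an assumption to maintain rather than a complete one.

```javascript
// Substrings found in the User-Agent headers of some common crawlers.
// Illustrative, not exhaustive: extend it for the bots you care about.
const CRAWLER_PATTERNS = [
  /googlebot/i, /bingbot/i, /yahoo! slurp/i, /baiduspider/i,
  /yandex/i, /facebookexternalhit/i, /twitterbot/i,
];

// In HTML5 mode there is no #! for the crawler to rewrite,
// so the backend decides from the User-Agent header instead.
function isCrawler(userAgent) {
  return typeof userAgent === "string" &&
    CRAWLER_PATTERNS.some((re) => re.test(userAgent));
}
```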
That is to say, instead of a request from a normal browser that looks like:
http://www.ng-newsletter.com/#!/signup/page
a search engine will request the page with:
http://www.ng-newsletter.com/?_escaped_fragment_=/signup/page
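The rewrite from the first URL to the second can be sketched as a small function. It follows our reading of Google's AJAX crawling scheme: everything after `#!` moves into an `_escaped_fragment_` query parameter. Only a few characters are percent-encoded (control characters and space, `#`, `%`, `&`, `+`, and non-ASCII), which is why `/signup/page` keeps its slashes. `toEscapedFragmentUrl` is a hypothetical helper, not part of Angular.

```javascript
// Characters escaped in the fragment value (our reading of the scheme):
// control characters and space, "#", "%", "&", "+", and anything non-ASCII.
const MUST_ESCAPE = /[\x00-\x20#%&+]|[^\x00-\x7E]/gu;

// Turn a "pretty" #! URL into the "ugly" URL a crawler requests instead.
function toEscapedFragmentUrl(prettyUrl) {
  const i = prettyUrl.indexOf("#!");
  if (i === -1) return prettyUrl; // not an AJAX-crawlable URL: unchanged
  const base = prettyUrl.slice(0, i);
  const fragment = prettyUrl.slice(i + 2).replace(MUST_ESCAPE, (c) => encodeURIComponent(c));
  const separator = base.includes("?") ? "&" : "?"; // keep any existing query string
  return base + separator + "_escaped_fragment_=" + fragment;
}
```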
We can set the hash prefix of our Angular apps using the built-in $locationProvider:
angular.module('myApp', [])
  .config(['$locationProvider', function($locationProvider) {
    $locationProvider.hashPrefix('!');
  }]);
And, if we're using html5Mode, we will need to implement this using the meta tag:
<meta name="fragment" content="!">
As a reminder, we can enable html5Mode() with the $locationProvider:
angular.module('myApp', [])
  .config(['$locationProvider',
    function($locationProvider) {
      $locationProvider.html5Mode(true);
    }]);
Handling the search engine
We have several options for how to actually deliver content to search engines as static HTML. We can host a backend ourselves, we can use a service to host a backend for us, we can use a proxy to deliver the content, etc. Let's look at a few options:
Self-hosted
We can write a service that crawls our own site using a headless browser, like PhantomJS or Zombie.js, taking a snapshot of each page with rendered data and storing it as HTML. Whenever we see the query string ?_escaped_fragment_ in a search request, we can deliver the static HTML snapshot we took of the page instead of the page that only renders through JS. This requires us to have a backend that delivers our pages with conditional logic in the middle. We can use something like prerender.io's backend as a starting point to run this ourselves. Of course, we still need to handle the proxying and the snippet handling, but it's a good start.
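That conditional logic might look like the following Express-style middleware. It assumes snapshots were captured ahead of time (for example with PhantomJS) and stored in a `Map` keyed by app path. `snapshots` and `serveSnapshots` are hypothetical names, and a real setup would more likely read from disk or a cache. An empty `_escaped_fragment_` value, which is what a page opted in with `<meta name="fragment" content="!">` receives, is treated as a request for the page's own path.

```javascript
// Hypothetical snapshot store: app path -> HTML captured with a headless browser.
const snapshots = new Map([
  ["/signup/page", "<html><body><h1>Sign up</h1></body></html>"],
]);

// Express-style middleware: crawlers asking for ?_escaped_fragment_=...
// get the stored snapshot; everything else falls through to the JS app.
function serveSnapshots(req, res, next) {
  const url = new URL(req.url, "http://localhost");
  if (!url.searchParams.has("_escaped_fragment_")) return next();
  // Empty value (meta-fragment pages) means "snapshot of this very path".
  const path = url.searchParams.get("_escaped_fragment_") || url.pathname;
  const html = snapshots.get(path);
  if (html === undefined) return next(); // no snapshot yet: fall back to the app
  res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
  res.end(html);
}
```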
With a paid service
The easiest and fastest way to get content into search engines is to use a service. Brombone, seo.js, seo4ajax, and prerender.io are good examples of services that will host the above content rendering for you. This is a good option for when we don't want to deal with running a server/proxy. Also, it's usually super quick.
For more information about Angular and SEO, we wrote an extensive tutorial on it at http://www.ng-newsletter.com/posts/serious-angular-seo.html, and we detailed it even more in our book ng-book: The Complete Book on AngularJS. Check it out at ng-book.com.
Answer by Brad Green
You should really check out the tutorial on building an SEO-friendly AngularJS site on the Year of Moo blog. He walks you through all the steps outlined in Angular's documentation: http://www.yearofmoo.com/2012/11/angularjs-and-seo.html
Using this technique, the search engine sees the expanded HTML instead of the custom tags.
Answer by user3330270
This has drastically changed.
If you use $locationProvider.html5Mode(true), you are set.
No more pre-rendering pages for crawlers.
Answer by Ketan
Things have changed quite a bit since this question was asked. There are now options to let Google index your AngularJS site. The easiest option I found was to use http://prerender.io, a free service that will generate the crawlable pages for you and serve them to the search engines. It is supported on almost all server-side web platforms. I have recently started using them, and the support is excellent too.
I do not have any affiliation with them, this is coming from a happy user.
Answer by Kevin C.
Angular's own website serves simplified content to search engines: http://docs.angularjs.org/?_escaped_fragment_=/tutorial/step_09
Say your Angular app is consuming a Node.js/Express-driven JSON API, like /api/path/to/resource. Perhaps you could redirect any requests with ?_escaped_fragment_ to /api/path/to/resource.html, and use content negotiation to render an HTML template of the content, rather than return the JSON data.
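A sketch of the content-negotiation idea, with the Express wiring left out. The resource table and the `negotiate` and `escapeHtml` functions are hypothetical. The `Accept` check is deliberately crude and does not parse quality values. In Express itself, `res.format()` performs this kind of negotiation for you.

```javascript
// Hypothetical data standing in for what /api/path/to/resource returns.
const resources = new Map([["/api/articles/42", { title: "Hello", body: "First post" }]]);

function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, (c) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  })[c]);
}

// Same data, two representations: JSON for the Angular app, HTML for robots.
// A trailing ".html" or an Accept header asking for text/html (and not JSON)
// selects HTML.
function negotiate(pathname, acceptHeader = "") {
  const wantsHtml = pathname.endsWith(".html") ||
    (acceptHeader.includes("text/html") && !acceptHeader.includes("application/json"));
  const data = resources.get(pathname.replace(/\.html$/, ""));
  if (!data) return { status: 404, type: "text/plain", body: "Not found" };
  if (wantsHtml) {
    return { status: 200, type: "text/html",
      body: "<article><h1>" + escapeHtml(data.title) + "</h1><p>" +
        escapeHtml(data.body) + "</p></article>" };
  }
  return { status: 200, type: "application/json", body: JSON.stringify(data) };
}
```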
The only thing is, your Angular routes would need to match 1:1 with your REST API.
EDIT: I'm realizing that this has the potential to really muddy up your REST API, and I don't recommend doing it outside of very simple use cases where it might be a natural fit.
Instead, you can use an entirely different set of routes and controllers for your robot-friendly content. But then you're duplicating all of your AngularJS routes and controllers in Node/Express.
I've settled on generating snapshots with a headless browser, even though I feel that's a little less-than-ideal.
Answer by pixparker
A good practice can be found here:
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io?_escaped_fragment_=tag
Answer by Thor
As of now, Google has changed its AJAX crawling proposal:
"Times have changed. Today, as long as you're not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers."
tl;dr: [Google] are no longer recommending the AJAX crawling proposal [Google] made back in 2009.
Answer by Robert AJS
Google's Crawlable Ajax Spec, as referenced in the other answers here, is basically the answer.
If you're interested in how other search engines and social bots deal with the same issues, I wrote up the state of the art here: http://blog.ajaxsnapshots.com/2013/11/googles-crawlable-ajax-specification.html
I work for https://ajaxsnapshots.com, a company that implements the Crawlable Ajax Spec as a service; the information in that report is based on observations from our logs.