Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7549306/

Time: 2020-08-24 02:40:47  Source: igfitidea

"Single-page" JS websites and SEO

javascript seo backbone.js pushstate singlepage

Asked by user544941

There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:

  1. When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly, the server would render the same thing as the client would if I went to /my_path through pushState.
  2. Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path, the server should give him a JavaScript-heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.

The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.

I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.

So what I'm really wondering is the following:

  • Can you think of any better solution?
  • What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?

Answered by Derick Bailey

While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out you're serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).

SEO and accessibility (not just for disabled people, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet-enabled platforms) have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled should all be able to use/index/understand your site's core functionality without issue.

pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and an "if we have time" item to the forefront of web development.

What you describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked into the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.

I've been working with pushState and SEO recently for a couple of different applications, and I found what I think is a good approach. It basically follows your item #1, but avoids duplicating html / templates.

Most of the info can be found in these two blog posts:

http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/

and

http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/

The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.

From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.

For example, I'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.

Here's an example:

<form id="foo">
  Name: <input id="name"><button id="say">Say My Name!</button>
</form>

After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example):

FooView = Backbone.View.extend({
  events: {
    "change #name": "setName",
    "click #say": "sayName"
  },

  setName: function(e){
    var name = $(e.currentTarget).val();
    this.model.set({name: name});
  },

  sayName: function(e){
    e.preventDefault();
    var name = this.model.get("name");
    alert("Hello " + name);
  },

  render: function(){
    // do some rendering here, for when this is just running JavaScript
  }
});

$(function(){
  var model = new MyModel();
  var view = new FooView({
    model: model,
    el: $("#foo")
  });
});

This is a very simple example, but I think it gets the point across.

When I instantiate the view after the page loads, I'm providing the existing content of the form that was rendered by the server to the view instance as the el for the view. I am not calling render or having the view generate an el for me when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.

Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.

Edit

Consider a more complex example, where you have a list that needs to be attached (from the comments below this).

Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:

<ul id="user-list">
  <li data-id="1">Bob
  <li data-id="2">Mary
  <li data-id="3">Frank
  <li data-id="4">Jane
</ul>

Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can easily find the model that each tag comes from. You'll then need a collection view and an item view that are smart enough to attach themselves to this html.

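UserListView = Backbone.View.extend({
  attach: function(){
    // Keep a reference to the view: inside each(), `this` is the <li> element.
    var self = this;
    // setElement updates both el and $el, so this.$() searches inside the list.
    this.setElement($("#user-list"));
    this.$("li").each(function(index){
      var userEl = $(this);
      var id = userEl.attr("data-id");
      var user = self.collection.get(id);
      new UserView({
        model: user,
        el: userEl
      });
    });
  }
});

UserView = Backbone.View.extend({
  initialize: function(){
    this.model.bind("change:name", this.updateName, this);
  },

  updateName: function(model, val){
    // $el is the jQuery-wrapped element this view is attached to.
    this.$el.text(val);
  }
});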

var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();

In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. Each UserView sets up an event handler for its model's name change event and updates the displayed text of the element when a change occurs.

This kind of process, taking the html that the server rendered and having my JavaScript take over and run it, is a great way to get things rolling for SEO, accessibility, and pushState support.

Hope that helps.

Answered by Ariel

I think you need this: http://code.google.com/web/ajaxcrawling/

You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.

Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)
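
As a rough illustration of the crawling scheme linked above: the crawler requests a URL that carries an _escaped_fragment_ query parameter in place of the #! fragment, and the server has to map it back to the hash-bang URL before rendering. The sketch below shows only that mapping. The function name prettyUrlFromEscapedFragment is made up for this example, and it assumes the standard URL class found in browsers and Node.js.

```javascript
// Map a crawler request URL back to the "#!" URL it stands for.
// Returns null when the request carries no _escaped_fragment_ parameter.
function prettyUrlFromEscapedFragment(requestUrl) {
  const url = new URL(requestUrl);
  // searchParams.get() returns the value already percent-decoded.
  const fragment = url.searchParams.get("_escaped_fragment_");
  if (fragment === null) {
    return null;
  }
  // Drop the crawler parameter but keep any other query parameters.
  url.searchParams.delete("_escaped_fragment_");
  return url.origin + url.pathname + url.search + "#!" + fragment;
}

console.log(prettyUrlFromEscapedFragment(
  "http://example.com/?_escaped_fragment_=%2Fimages%2F1"
)); // http://example.com/#!/images/1
```

The resulting URL is what the server-side renderer would load to produce the HTML handed back to the crawler.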

Answered by Leonidaz

So, it seems that the main concern is being DRY:

  • If you're using pushState, have your server send the exact same code for all URLs that don't contain a file extension (those serve images, etc.): "/mydir/myfile", "/myotherdir/myotherfile" or root "/" all receive the exact same code. You need some kind of URL rewrite engine. You can also serve a tiny bit of HTML and let the rest come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
  • (Test the link's validity by converting the link to your URL scheme and checking that content exists by querying a static or a dynamic source. If it's not valid, send a 404 response.)
  • When the request is not from a Google bot, you just process it normally.
  • If the request is from a Google bot, you use phantom.js, a headless WebKit browser ("A headless browser is simply a full-featured web browser with no visual interface."), to render the HTML and JavaScript on the server and send the Google bot the resulting HTML. As the bot parses the HTML it can hit your other "pushState" links on the server, such as <a href="/someotherpage">mylink</a>; the server rewrites the URL to your application file, loads it in phantom.js, sends the resulting HTML to the bot, and so on.
  • For your HTML I'm assuming you're using normal links with some kind of hijacking (e.g. with backbone.js: https://stackoverflow.com/a/9331734/1595913).
  • To avoid confusion with any links, move your API code that serves JSON to a separate subdomain, e.g. api.mysite.com.
  • To improve performance you can pre-process your site pages for search engines ahead of time, during off hours, by creating static versions of the pages using the same phantom.js mechanism and then serving the static pages to Google bots. Preprocessing can be done with a simple app that can parse <a> tags. In this case handling 404s is easier, since you can simply check for the existence of a static file whose name contains the URL path.
  • If you use the #! hash-bang syntax for your site links, a similar scenario applies, except that the server's URL rewrite engine would look for _escaped_fragment_ in the URL and convert it to your URL scheme.
  • There are a couple of integrations of node.js with phantom.js on GitHub, and you can use node.js as the web server to produce the HTML output.
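
The request handling described in the list above can be sketched as a small decision function. This is only an illustration under stated assumptions: the bot pattern, the snapshots directory, the index.html application file, and all function names are hypothetical, and the phantom.js rendering step that would produce the snapshot files is left out.

```javascript
const path = require("path");

// Assumed list of crawler markers; a real deployment would keep this configurable.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider/i;

function isSearchBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Requests such as /img/logo.png are static assets, not application routes.
function hasFileExtension(urlPath) {
  return path.posix.extname(urlPath) !== "";
}

// Where a pre-rendered snapshot of an application route would live on disk.
// Assumes urlPath was already validated (for example, no ".." segments).
function snapshotPathFor(urlPath, snapshotDir) {
  const name = urlPath === "/" ? "index" : urlPath.replace(/^\/+|\/+$/g, "");
  return path.posix.join(snapshotDir, name + ".html");
}

// Decide what the server should send back for a request.
function chooseResponse(urlPath, userAgent) {
  if (hasFileExtension(urlPath)) {
    return { kind: "static", file: urlPath };
  }
  if (isSearchBot(userAgent)) {
    return { kind: "snapshot", file: snapshotPathFor(urlPath, "snapshots") };
  }
  // Every other route gets the same application shell.
  return { kind: "app", file: "index.html" };
}

console.log(chooseResponse("/mydir/myfile", "Googlebot/2.1"));
```

With pre-generated snapshots, a missing snapshot file for a bot request is the natural place to return the 404 mentioned in the list.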

Here are a couple of examples using phantom.js for seo:

http://backbonetutorials.com/seo-for-single-page-apps/

http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering

Answered by Tim Scott

If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.

Create a file in your views like _some_thingy.html.mustache.

Render server side:

<%= render :partial => 'some_thingy', object: my_model %>

Put the template in your head for client side use:

<%= template_include_tag 'some_thingy' %>

Render client side:

html = poirot.someThingy(my_model)
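
The reason a shared template removes the duplication is that the template is just a string, so the same source can be rendered by any environment that has a renderer for it. The toy renderer below illustrates that idea only. It is not how the gem or mustache is implemented; it supports nothing but simple {{key}} substitution, and renderTemplate and escapeHtml are names invented for this sketch.

```javascript
// Escape the characters that are unsafe inside HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace each {{key}} with the escaped value from data; unknown keys render as "".
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(data, key)
      ? escapeHtml(data[key])
      : "";
  });
}

// The same template string could be shipped to the browser or used on the server.
const someThingy = '<li data-id="{{id}}">{{name}}</li>';

console.log(renderTemplate(someThingy, { id: 1, name: "Bob & Mary" }));
// <li data-id="1">Bob &amp; Mary</li>
```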

Answered by Clive

To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).

This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.

Answered by Michael van Rooijen

Interesting. I have been searching around for viable solutions but it seems to be quite problematic.

I was actually leaning more towards your 2nd approach:

Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path, the server should give him a JavaScript-heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.

Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or ideas for other developers.

Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.

Let's say you have your routes set up like this:

resources :articles
# Catch-all: any other path renders the main template for the client-side app.
get "*path", to: "main#index"

Ensure that every server-side controller renders the same template that your client-side framework requires to run (html/css/javascript/etc). If none of the controllers are matched in the request (in this example we only have a RESTful set of actions for the ArticlesController), then just match anything else and just render the template and let the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher would be the ability to render content based on the URL that was requested to JavaScript-disabled devices.
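
To make the fall-through behaviour concrete, here is a small JavaScript model of that route table. It is not Rails code and is heavily simplified: only the index and show routes of the articles resource are modelled, and the target strings and the resolveRoute name are illustrative assumptions.

```javascript
// Ordered route table: specific resource routes first, catch-all last.
const routes = [
  { pattern: /^\/articles\/?$/, target: "articles#index" },
  { pattern: /^\/articles\/([^\/]+)\/?$/, target: "articles#show" },
  // Wildcard: render the shared template and let the client-side router decide.
  { pattern: /^\/.*$/, target: "main#index" }
];

// Return the first matching route, plus the captured id when there is one.
function resolveRoute(urlPath) {
  for (const route of routes) {
    const match = route.pattern.exec(urlPath);
    if (match) {
      return { target: route.target, id: match[1] || null };
    }
  }
  return null;
}

console.log(resolveRoute("/articles/42")); // { target: 'articles#show', id: '42' }
console.log(resolveRoute("/about/team"));  // { target: 'main#index', id: null }
```

Only the first two targets can render article content into the page for JavaScript-disabled clients; the wildcard target has nothing URL-specific to render.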

From what I understand, it is a bad idea to render content that isn't visible to browsers. If Google indexes it, and people go through Google to visit a given page and there isn't any content, then you're probably going to be penalised. What comes to mind is content rendered in a div node that you set to display: none in CSS.

However, I'm pretty sure it doesn't matter if you simply do this:

<div id="no-js">
  <h1><%= @article.title %></h1>
  <p><%= @article.description %></p>
  <p><%= @article.content %></p>
</div>

And then using JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:

$("#no-js").remove(); // jQuery

This way, Google and anyone with a JavaScript-disabled device would see the raw/static content. So the content is physically there and is visible to anyone with JavaScript-disabled devices.

But when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through its router and display what a user should see when JavaScript is enabled.

I think this might be a valid and fairly easy technique to use, although that might depend on the complexity of your website/application.

Though, please correct me if it isn't. Just thought I'd share my thoughts.

Answered by Phrearch

Use NodeJS on the server side, browserify your client-side code, and route each HTTP request's URI (except for static HTTP resources) through a server-side client to provide the first 'bootsnap' (a snapshot of the page's state). Use something like jsdom to handle jQuery DOM operations on the server. After the bootsnap is returned, set up the websocket connection. It is probably best to differentiate between a websocket client and a server-side client by making some kind of wrapper connection on the client side (the server-side client can communicate with the server directly). I've been working on something like this: https://github.com/jvanveen/rnet/

Answered by Aleš Kotnik

Use Google Closure Templates to render pages. They compile to JavaScript or Java, so it is easy to render the page on either the client or the server side. On the first encounter with every client, render the HTML and add the JavaScript as a link in the header. A crawler will read only the HTML, but the browser will execute your script. All subsequent requests from the browser can be made against the API to minimize the traffic.
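
A minimal sketch of that first-visit versus follow-up split, in plain JavaScript. It does not use Closure Templates: articleTemplate stands in for a compiled template function, and the /app.js script path, the Accept-header convention, and all function names are assumptions made for this example.

```javascript
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Stand-in for a compiled template that both client and server can call.
function articleTemplate(article) {
  return "<article><h1>" + escapeHtml(article.title) + "</h1><p>" +
    escapeHtml(article.body) + "</p></article>";
}

// First visit (or a crawler): a full HTML page with the script linked in the head.
function renderFirstVisit(article) {
  return "<!DOCTYPE html><html><head><title>" + escapeHtml(article.title) +
    "</title><script src=\"/app.js\"></script></head><body>" +
    articleTemplate(article) + "</body></html>";
}

// Assumption: the running client marks its API calls with an Accept header asking for JSON.
function respond(acceptHeader, article) {
  const wantsJson = /application\/json/.test(acceptHeader || "");
  return wantsJson
    ? { contentType: "application/json", body: JSON.stringify(article) }
    : { contentType: "text/html", body: renderFirstVisit(article) };
}

const article = { title: "Hi", body: "First post" };
console.log(respond("application/json", article).body); // {"title":"Hi","body":"First post"}
console.log(respond("text/html", article).contentType); // text/html
```

On the client, the JSON response would be passed to the same template function, so the markup is defined only once.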