Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/9747857/

Time: 2020-09-03 03:03:34. Source: igfitidea

Why couldn't Twitter scale by adding servers the way sites like Facebook have?

Tags: ruby-on-rails, ruby, scala, twitter

Asked by Jason

I have been looking for an explanation of why Twitter had to migrate part of its middleware from Rails to Scala. What prevented them from scaling the way Facebook has, by adding servers as the user base expanded? More specifically, what about the Ruby/Rails technology prevented the Twitter team from taking this approach?

Answered by virtualeyes

It's not that Rails doesn't scale; rather, requests for "live" data in Ruby (or any interpreted language) do not scale, as they are comparatively far more expensive in both CPU and memory utilization than their compiled-language counterparts.

Now, were Twitter a different type of service, one that had the same enormous user base but served data that changed less frequently, Rails could be a viable option via caching, i.e. avoiding live requests to the Rails stack entirely and offloading them to a front-end server and/or an in-memory DB cache. An excellent article on this topic:

How Basecamp Next got to be so damn fast
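
To make the caching idea concrete, here is a minimal sketch in Python (used only as a stand-in for an interpreted stack; the answer itself is about Rails). The names `TTLCache` and `render_page` are hypothetical and do not come from Rails or from the linked article. The point is only that, when data changes rarely, repeated requests never reach the expensive rendering code.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def fetch(self, key: str, compute: Callable[[], str]) -> str:
        """Return the cached value for key, calling compute() only on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: the application stack is not touched
        value = compute()  # cache miss: pay for one "live" request
        self._store[key] = (now, value)
        return value


render_calls = 0


def render_page(path: str) -> str:
    """Hypothetical stand-in for an expensive request to the application stack."""
    global render_calls
    render_calls += 1
    return f"<html>content for {path}</html>"


cache = TTLCache(ttl_seconds=60.0)
for _ in range(1000):
    body = cache.fetch("/projects/1", lambda: render_page("/projects/1"))

print(render_calls)  # 1: only the first of the 1000 requests was rendered
```

This only helps when a stale page is acceptable for the length of the TTL, which is exactly the property a constantly changing timeline lacks.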

However, Twitter did not ditch Rails for scaling issues alone; they made the switch because Scala, as a language, provides certain built-in guarantees about the state of your application that interpreted languages cannot provide: if it compiles, time-wasting bugs such as fat-fingered typos, incorrect method calls, incorrect type declarations, etc. simply cannot exist.
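
As a small illustration of the kind of bug meant here, the sketch below uses Python as a stand-in for a dynamically typed language such as Ruby; the `Timeline` class is hypothetical. The misspelled attribute is accepted when the class is defined and only fails when that line actually runs, whereas a Scala compiler would reject a reference to a member that does not exist.

```python
from typing import List


class Timeline:
    def __init__(self) -> None:
        self.tweets: List[str] = []

    def add(self, text: str) -> None:
        self.tweets.append(text)

    def latest(self) -> str:
        # Typo: "tweeets" instead of "tweets". The interpreter accepts this
        # definition and only complains if this line is executed.
        return self.tweeets[-1]


timeline = Timeline()
timeline.add("hello")  # works fine; the bug is still hidden

try:
    timeline.latest()
    error = None
except AttributeError as exc:
    error = exc  # the typo surfaces only now, at runtime

print(type(error).__name__)  # AttributeError
```

A test that never calls `latest()` would pass, which is why tests alone leave this class of error undetected until the code path is hit.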

For Twitter, TDD was not enough. A quote from Dijkstra in Programming in Scala illustrates this point: "testing can only prove the presence of errors, never their absence". As their application grew, they ran into more and more hard-to-track-down bugs. The magical mystery tour was becoming a hindrance beyond performance, so they made the switch. By all accounts it was an overwhelming success; Twitter is to Scala what Facebook is to PHP (although Facebook uses their own ultra-fast C++ preprocessor, so that's cheating a bit ;-))

To sum up, Twitter made the switch for both performance and reliability. Of course, Rails tends to be at the forefront of innovation, and the 99% of the world's applications that do not see Twitter-level traffic can get by just fine with an interpreted language (although I'm now solidly on the compiled-language side of the fence; Scala is just too good!)

Answered by Jason

No platform can scale out infinitely whilst still dealing with complex sets of data that change from moment to moment. Language and infrastructure matter, but how you build your site and its data access patterns matter more.

If you've ever played games like Transport Tycoon or Settlers where you have to transport resources around, you'll know how you need to stay on top of upgrading infrastructure as usage increases.

Scaling platforms like Facebook and Twitter is a never-ending task. You have an ever increasing number of users, and you're being pushed to add more features and functionality. It's a continual process of upgrading one bit, which causes more stress on another bit.

Throwing servers at the problem isn't always the answer, and sometimes can cause more problems.

Answered by Daniel Pittman

http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster links to a set of posts about the changes, including a decent history of the steps taken over time.

The short version is that Ruby and Rails didn't deliver the performance and reliability they required for the service. Given the scale, this isn't surprising; most commercial off-the-shelf (COTS) solutions are not satisfactory at the super-large end of the scale.

High Scalability covers a lot of questions about architecture at that top end for other sites as well, so it helps answer broader questions in the area too.

Answered by sosborn

They could have thrown more hardware at the problem, but that is a good deal more expensive than simply writing more efficient code. Like many high-level frameworks, Ruby on Rails is great at many things, but high performance isn't one of them. Compiled languages will always be faster than interpreted languages.

Answered by Blankman

Facebook (and Google) scale by adding more servers, but at the same time they break their application out into various services. Those services communicate via an agreed-upon interface and type, so each one can be built in whatever technology its owners see fit. Just because you read that Facebook uses PHP doesn't mean that all their backend services are served by PHP (and it wouldn't make sense either, since in SOA you can choose any tech stack).
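
A rough sketch of what "an agreed-upon interface and type" can look like, written in Python for brevity. Every name here (`Tweet`, `TimelineService`, `InMemoryTimelineService`, `render_timeline`) is hypothetical and not taken from Facebook's or Twitter's code. The caller depends only on the interface, so the implementation behind it could be replaced by a client for a service written in any other language.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Tweet:
    """The agreed-upon type exchanged between services."""
    author: str
    text: str


class TimelineService(Protocol):
    """The agreed-upon interface; callers know nothing beyond this."""

    def recent(self, user: str, limit: int) -> List[Tweet]: ...


class InMemoryTimelineService:
    """One possible backend. A network client for a service written in
    another language could satisfy the same interface."""

    def __init__(self, tweets: List[Tweet]) -> None:
        self._tweets = tweets  # assumed to be ordered oldest to newest

    def recent(self, user: str, limit: int) -> List[Tweet]:
        mine = [t for t in self._tweets if t.author == user]
        return mine[::-1][:limit]  # newest first, at most `limit` items


def render_timeline(service: TimelineService, user: str) -> str:
    """Front-end code that depends only on the interface."""
    tweets = service.recent(user, limit=2)
    return "\n".join(f"{t.author}: {t.text}" for t in tweets)


service = InMemoryTimelineService([
    Tweet("alice", "first"),
    Tweet("bob", "hi"),
    Tweet("alice", "second"),
    Tweet("alice", "third"),
])
print(render_timeline(service, "alice"))
```

In a real service-oriented setup the interface would usually be described in a language-neutral format and called over the network; the in-process version above only shows the separation of concerns.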

I think this video is the best answer to your question:

"From Ruby to the JVM" https://www.youtube.com/watch?v=ohHdZXnsNi8

Answered by AndreasScheinert

I think one important bit missing here is the platform. Yes, we had the compiled vs. interpreted argument and a couple of others, but one very important aspect was indeed the platform. There are different Ruby VMs, but none of them satisfied Twitter, although they tuned theirs quite a bit. Scala, on the other hand, runs on the JVM, and Twitter's engineers had pretty good experience with that. Why didn't they try or choose JRuby? Well, I guess the reasons mentioned above come into play here.

Answered by Daniel C. Sobral

Linear gains from parallelism (which is what multiple servers are) are exceedingly rare, and very application dependent. Yes, they exist -- that's how GPUs do most of their work. If you are serving static pages, with no session state, that would also be the case.

For the most part, however, adding servers does not increase performance linearly (i.e., 10 servers are not 10 times faster than 1 server), and that means that any gains you can make on a single server will have much more impact than just adding servers. It's not like Twitter doesn't have a bunch of servers, now is it?
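
One common way to put numbers on this is Amdahl's law, which the answer does not name; it is brought in here only as an illustration. The sketch assumes, hypothetically, that 90% of the work of serving a request can be spread across servers while 10% stays serialized (shared state, coordination, a central database, and so on).

```python
def amdahl_speedup(parallel_fraction: float, servers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


# Assumed workload: 90% parallelizable, 10% serialized.
for n in (1, 10, 100):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 1 1.0
# 10 5.26
# 100 9.17
```

Under that assumption, 10 servers give roughly a 5x gain and even 100 servers stay below 10x, because the serialized 10% caps the speedup at 1 / 0.1 = 10. Making the code on each server faster is not subject to that cap, which is the point about single-server gains.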
