Ruby on Rails: What's the best way to implement a social activity stream?
Original source: http://stackoverflow.com/questions/202198/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What's the best manner of implementing a social activity stream?
Asked by
I'm interested in hearing your opinions on the best way to implement a social activity stream (Facebook is the most famous example). The problems/challenges involved are:
- Different types of activities (posting, commenting ..)
- Different types of objects (post, comment, photo ..)
- 1-n users involved in different roles ("User x replied to User y's comment on User z's post")
- Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)
... and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").
Any thoughts or pointers on patterns, papers, etc on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.
Although most of the issues are platform-agnostic, chances are I'll end up implementing such a system in Ruby on Rails.
Answer by heyman
I have created such a system and took this approach:
Database table with the following columns: id, userId, type, data, time.
- userId is the user who generated the activity
- type is the type of the activity (e.g. wrote blog post, added photo, commented on user's photo)
- data is a serialized object with metadata for the activity, where you can put in whatever you want
This limits the searches/lookups you can do in the feeds to users, time and activity types, but in a Facebook-type activity feed this isn't really limiting. And with correct indices on the table, the lookups are fast.
With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:
{"id": 1, "userId": 1, "type": "PHOTO", "time": "2008-10-15 12:00:00", "data": {"photoId": 2089, "photoName": "A trip to the beach"}}
You can see that, although the name of the photo is almost certainly stored in some other table containing the photos, and I could retrieve the name from there, I duplicate the name in the metadata field, because you don't want to do any joins on other database tables if you want speed. And in order to display, say, 200 different events from 50 different users, you need speed.
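The denormalized row described above could be built roughly as follows. This is only a sketch: the helper name `build_activity_row`, the snake_case keys, and the choice of JSON as the serialization format are assumptions, not something the answer specifies.

```ruby
require "json"
require "time"

# Hypothetical helper: builds one row for the activity table
# (the id column is assumed to be assigned by the database).
def build_activity_row(user_id:, type:, data:, time: Time.now.utc)
  {
    user_id: user_id,
    type: type,
    time: time.iso8601,
    # Everything needed to render the entry is copied in here, so that
    # reading the feed never has to join against the photos table.
    data: JSON.generate(data)
  }
end

row = build_activity_row(
  user_id: 1,
  type: "PHOTO",
  data: { photoId: 2089, photoName: "A trip to the beach" },
  time: Time.utc(2008, 10, 15, 12, 0, 0)
)

# Rendering only needs the row itself.
meta = JSON.parse(row[:data])
puts "User #{row[:user_id]} added the photo \"#{meta["photoName"]}\""
```

The cost of this design is that a renamed photo keeps its old name in existing feed rows unless you rewrite them.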
Then I have classes that extend a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built into the rendering code as well, to keep complexity out of the database.
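A minimal sketch of what such rendering classes might look like. Only the name FeedActivity comes from the answer; the subclasses, the `group_key` idea, and the row keys (`:user_name`, `:data`) are illustrative assumptions.

```ruby
# Base class: wraps one stored row (a hash; the key names are assumptions).
class FeedActivity
  attr_reader :user_name, :data

  def initialize(row)
    @user_name = row[:user_name]
    @data = row[:data]
  end

  # Key used to merge similar entries; nil means "never group".
  def group_key
    nil
  end
end

class PhotoActivity < FeedActivity
  def render
    "#{user_name} added the photo \"#{data[:photo_name]}\""
  end
end

class PhotoCommentActivity < FeedActivity
  # Comments on the same photo are merged into one entry.
  def group_key
    [:photo_comment, data[:photo_id]]
  end

  def render
    "#{user_name} commented on the photo \"#{data[:photo_name]}\""
  end

  def self.render_group(activities)
    names = activities.map(&:user_name).uniq
    photo = activities.first.data[:photo_name]
    "#{join_names(names)} commented on the photo \"#{photo}\""
  end

  def self.join_names(names)
    return names.first if names.size == 1
    "#{names[0..-2].join(", ")} and #{names.last}"
  end
end

ACTIVITY_CLASSES = {
  "PHOTO" => PhotoActivity,
  "PHOTO_COMMENT" => PhotoCommentActivity
}.freeze

# Grouping happens here, in the rendering code, not in the database.
def render_feed(rows)
  activities = rows.map { |row| ACTIVITY_CLASSES.fetch(row[:type]).new(row) }
  # Ungrouped entries use themselves as a unique key, so they stay separate.
  groups = activities.group_by { |activity| activity.group_key || activity }
  groups.values.map do |group|
    group.size > 1 ? group.first.class.render_group(group) : group.first.render
  end
end
```

Note that this sketch merges all comments on the same photo regardless of how far apart they are in the feed; a real feed would probably only group entries within some time window.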
Answer by Mark Kennedy
This is a very good presentation outlining how Etsy.com architected their activity streams. It's the best example I've found on the topic, though it's not Rails-specific.
http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
Answer by Thierry
We've open sourced our approach: https://github.com/tschellenbach/Stream-Framework. It's currently the largest open source library aimed at solving this problem.
The same team which built Stream Framework also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients available for Node, Python, Rails and PHP.
In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
- Yahoo Research Paper
- Twitter 2013 Redis based, with fallback
- Cassandra at Instagram
- Etsy feed scaling
- Facebook history
- Django project, with good naming conventions. (But database only)
- http://activitystrea.ms/specs/atom/1.0/ (actor, verb, object, target)
- Quora post on best practises
- Quora scaling a social network feed
- Redis ruby example
- FriendFeed approach
- Thoonk setup
- Twitter's Approach
Though Stream Framework is Python based, it wouldn't be too hard to use from a Ruby app. You could simply run it as a service and stick a small HTTP API in front of it. We are considering adding an API to access Feedly from other languages. At the moment you'll have to roll your own though.
Answer by Tim Howland
The biggest issues with event streams are visibility and performance; you need to restrict the events displayed to be only the interesting ones for that particular user, and you need to keep the amount of time it takes to sort through and identify those events manageable. I've built a smallish social network; I found that at small scales, keeping an "events" table in a database works, but that it gets to be a performance problem under moderate load.
With a larger stream of messages and users, it's probably best to go with a messaging system, where events are sent as messages to individual profiles. This means that you can't easily subscribe to people's event streams and see their previous events, but you only have to render a small group of messages when you render the stream for a particular user.
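The messaging approach described above is often called fan-out on write. The sketch below uses an in-memory hash as a stand-in for whatever queue or store delivers the messages; the class name `InboxStore` and its methods are hypothetical.

```ruby
# In-memory stand-in for per-user inboxes. A real system would back this
# with a message queue or a key-value store; all names are hypothetical.
class InboxStore
  def initialize(max_size: 100)
    @max_size = max_size
    @inboxes = Hash.new { |hash, user_id| hash[user_id] = [] }
  end

  # Fan-out on write: copy the event into every follower's inbox.
  def deliver(event, follower_ids)
    follower_ids.each do |user_id|
      inbox = @inboxes[user_id]
      inbox.unshift(event)                   # newest first
      inbox.pop while inbox.size > @max_size # forget the oldest entries
    end
  end

  # Rendering a stream is a cheap read of one short list.
  def stream_for(user_id, limit: 20)
    @inboxes[user_id].first(limit)
  end
end

store = InboxStore.new
store.deliver({ actor: "x", verb: "posted" }, [10, 11])
p store.stream_for(10)
# A user who starts following "x" later has an empty inbox: events that
# were delivered before they subscribed are not visible to them.
p store.stream_for(12)
```

This makes the trade-off from the paragraph concrete: reads are trivial, but history for new subscribers has to be backfilled separately if you want it.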
I believe this was Twitter's original design flaw. I remember reading that they were hitting the database to pull in and filter their events. This had everything to do with architecture and nothing to do with Rails, but it (unfortunately) gave birth to the "Ruby doesn't scale" meme. I recently saw a presentation where the developer used Amazon's Simple Queue Service as the messaging backend for a Twitter-like application that would have far higher scaling capabilities. It may be worth looking into SQS as part of your system if your loads are high enough.
Answer by Rene Pickhardt
If you are willing to use separate software, I suggest the Graphity server, which solves exactly this problem for activity streams (building on top of the Neo4j graph database).
The algorithms have been implemented as a standalone REST server so that you can host your own server to deliver activity streams: http://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/
In the paper and benchmark I showed that retrieving news streams depends only linearly on the number of items you want to retrieve, without any of the redundancy you would get from denormalizing the data.
At the link above you will find screencasts and a benchmark of this approach (showing that Graphity is able to retrieve more than 10k streams per second).
Answer by jammus
I started to implement a system like this yesterday; here's where I've got to...
I created a StreamEvent class with the properties Id, ActorId, TypeId, Date, ObjectId and a hashtable of additional Details key/value pairs. This is represented in the database by a StreamEvent table (Id, ActorId, TypeId, Date, ObjectId) and a StreamEventDetails table (StreamEventId, DetailKey, DetailValue).
The ActorId, TypeId and ObjectId allow for a Subject-Verb-Object event to be captured (and later queried). Each action may result in several StreamEvent instances being created.
I've then created a subclass of StreamEvent for each type of event, e.g. LoginEvent, PictureCommentEvent. Each of these subclasses has more context-specific properties such as PictureId, ThumbNail, CommenText, etc. (whatever is required for the event), which are actually stored as key/value pairs in the hashtable/StreamEventDetails table.
When pulling these events back from the database I use a factory method (based on the TypeId) to create the correct StreamEvent class.
Each subclass of StreamEvent has a Render(context As StreamContext) method which outputs the event to screen based on the passed StreamContext class. The StreamContext class allows options to be set based on the context of the view. If you look at Facebook, for example, your news feed on the homepage lists the full names (and links to their profiles) of everyone involved in each action, whereas when looking at a friend's feed you only see their first name (but the full names of other actors).
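The factory method and context-dependent rendering might be sketched in Ruby as below. The answer's own code is not shown, so this is an illustration only: the registry mechanism, the `viewer_id` and `full_names` options, the `"Text"` detail key, and the assumption that each row already carries the actor's name are all mine.

```ruby
# View options, loosely mirroring the StreamContext idea from the answer.
StreamContext = Struct.new(:viewer_id, :full_names, keyword_init: true)

class StreamEvent
  REGISTRY = {} # TypeId => subclass

  def self.register(type_id)
    REGISTRY[type_id] = self
  end

  # Factory method: picks the subclass from the row's TypeId.
  def self.build(row, details)
    REGISTRY.fetch(row[:type_id]).new(row, details)
  end

  attr_reader :actor, :details

  def initialize(row, details)
    @actor = row[:actor] # assumed shape: { id:, first_name:, last_name: }
    @details = details   # key/value pairs from the details table
  end

  # The same event reads differently depending on who is looking at it.
  def actor_name(context)
    return "You" if actor[:id] == context.viewer_id
    context.full_names ? "#{actor[:first_name]} #{actor[:last_name]}" : actor[:first_name]
  end
end

class PictureCommentEvent < StreamEvent
  register 2 # hypothetical TypeId

  def render(context)
    "#{actor_name(context)} commented: #{details["Text"]}"
  end
end

row = { type_id: 2, actor: { id: 7, first_name: "Ann", last_name: "Lee" } }
event = StreamEvent.build(row, { "Text" => "Nice!" })
puts event.render(StreamContext.new(viewer_id: 1, full_names: true))
puts event.render(StreamContext.new(viewer_id: 7, full_names: true))
```

Keeping the view options in one context object means a new kind of view (say, an email digest) only needs a new context, not new event classes.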
I haven't implemented an aggregate feed (Facebook home) yet, but I imagine I'll create an AggregateFeed table with the fields UserId and StreamEventId, which is populated based on some kind of 'Hmmm, you might find this interesting' algorithm.
Any comments would be massively appreciated.
Answer by jedediah
// one entry per actual event
events {
id, timestamp, type, data
}
// one entry per event, per feed containing that event
events_feeds {
event_id, feed_id
}
When the event is created, decide which feeds it appears in and add those to events_feeds. To get a feed, select from events_feeds, join in events, order by timestamp. Filtering and aggregation can then be done on the results of that query. With this model, you can change the event properties after creation with no extra work.
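The two-table model above can be sketched in memory as follows. In a real application these would be database tables and the lookup would be a SQL join; the class name `FeedIndex`, its methods, and the integer timestamps are assumptions made for illustration.

```ruby
# One entry per actual event.
Event = Struct.new(:id, :timestamp, :type, :data)

class FeedIndex
  def initialize
    @events = {}       # id => Event
    @events_feeds = [] # [event_id, feed_id] pairs, one per feed per event
  end

  # On creation, decide which feeds the event appears in and record them.
  def publish(event, feed_ids)
    @events[event.id] = event
    feed_ids.each { |feed_id| @events_feeds << [event.id, feed_id] }
  end

  # Select from events_feeds, join in events, order by timestamp.
  def feed(feed_id, limit: 20)
    @events_feeds
      .select { |_event_id, fid| fid == feed_id }
      .map { |event_id, _fid| @events.fetch(event_id) }
      .sort_by { |event| -event.timestamp } # newest first
      .first(limit)
  end
end

index = FeedIndex.new
post = Event.new(1, 100, "post", "hello")
index.publish(post, [:alice, :bob])
index.publish(Event.new(2, 200, "photo", "beach"), [:alice])

# Because feeds only hold references, editing the event once updates
# every feed that contains it.
post.data = "hello (edited)"
p index.feed(:bob).map(&:data)
```

This is the opposite trade-off from a fully denormalized feed row: every read pays for the join, but events stay editable in one place.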
Answer by Alderete
If you do decide that you're going to implement in Rails, perhaps you will find the following plugin useful:
ActivityStreams: http://github.com/face/activity_streams/tree/master
If nothing else, you'll get to look at an implementation, in terms of both the data model and the API provided for pushing and pulling activities.
Answer by Alderete
I had a similar approach to that of heyman - a denormalized table containing all of the data that would be displayed in a given activity stream. It works fine for a small site with limited activity.
As mentioned above, it is likely to face scalability issues as the site grows. Personally, I am not worried about the scaling issues right now. I'll worry about that at a later time.
Facebook has obviously done a great job of scaling so I would recommend that you read their engineering blog, as it has a ton of great content -> http://www.facebook.com/notes.php?id=9445547199
I have been looking into better solutions than the denormalized table I mentioned above. Another way I have found of accomplishing this is to condense all the content that would be in a given activity stream into a single row. It could be stored in XML, JSON, or some serialized format that could be read by your application. The update process would be simple too. When an activity occurs, place it into a queue (perhaps using Amazon SQS or something else) and then continually poll the queue for the next item. Grab that item, parse it, and place its contents in the appropriate feed object stored in the database.
The good thing about this method is that you only need to read a single database table whenever that particular feed is requested, rather than grabbing a series of tables. Also, it allows you to maintain a finite list of activities as you may pop off the oldest activity item whenever you update the list.
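A rough sketch of this single-row approach, with Ruby's built-in Queue standing in for Amazon SQS and a plain hash standing in for the feed table. The message shape (`feed_id`, `activity`), the function names, and the cap of 50 items are assumptions.

```ruby
require "json"

MAX_ITEMS = 50 # assumed cap on the length of each stored feed

# feeds maps feed_id => JSON string, standing in for a table that holds
# one serialized row per feed.
def append_to_feed(feeds, feed_id, activity, max_items: MAX_ITEMS)
  items = JSON.parse(feeds.fetch(feed_id, "[]"))
  items.unshift(activity)        # newest first
  items = items.first(max_items) # pop off the oldest entries
  feeds[feed_id] = JSON.generate(items)
end

# Worker step: take each queued message, parse it, and fold it into the
# matching feed row.
def drain(queue, feeds, max_items: MAX_ITEMS)
  until queue.empty?
    message = JSON.parse(queue.pop)
    append_to_feed(feeds, message["feed_id"], message["activity"], max_items: max_items)
  end
end

queue = Queue.new
feeds = {}
queue << JSON.generate({ "feed_id" => "user:1", "activity" => { "verb" => "posted" } })
drain(queue, feeds)

# Serving the feed is a single-row read.
p JSON.parse(feeds["user:1"])
```

With a real queue service the worker would block or poll on receive rather than checking `empty?`, and updates to the same feed row would need to be serialized to avoid lost writes.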
Hope this helps! :)
Answer by Benjamin Crouzier
There are two RailsCasts episodes about such an activity stream:
- http://railscasts.com/episodes/406-public-activity (an activity feed with the gem public_activity)
- http://railscasts.com/episodes/407-activity-feed-from-scratch (the same thing from scratch)
Those solutions don't cover all your requirements, but they should give you some ideas.

