SQL: Facebook database design?

Question

Asked by Marin

I have always wondered how Facebook designed the friend <-> user relation.

I figure the user table is something like this:

user_email PK
user_id PK
password

I figure there is a separate table with the user's data (sex, age, etc.), connected via the user's email, I would assume.

How does it connect all the friends to this user?

Something like this?

user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N

Probably not, because the number of users is unknown and will keep growing.

Answer 1

Answered by TheTXI

Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.

Somewhat useful example:

Table Name: User
Columns:
    UserID PK
    EmailAddress
    Password
    Gender
    DOB
    Location

Table Name: Friends
Columns:
    UserID PK FK
    FriendID PK FK
    (This table features a composite primary key made up of the two foreign 
     keys, both pointing back to the user table. One ID will point to the
     logged in user, the other ID will point to the individual friend
     of that user)

Example Usage:

Table User
--------------
UserID EmailAddress Password Gender DOB      Location
------------------------------------------------------
1      [email protected]  bobbie   M      1/1/2009 New York City
2      [email protected]  jonathan M      2/2/2008 Los Angeles
3      [email protected]  joseph   M      1/2/2007 Pittsburgh

Table Friends
---------------
UserID FriendID
----------------
1      2
1      3
2      3

This shows that Bob is friends with both Jon and Joe, and that Jon is also friends with Joe. In this example we assume that friendship is always mutual, so you do not need rows such as (2,1) or (3,2) in the table, because they are already represented in the other direction. In cases where friendship or other relations are not implicitly two-way, you would also need those rows to indicate a two-way relationship.
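
The two-table layout in this answer can be tried out directly. What follows is a minimal sketch, not Facebook's actual schema: it uses Python's built-in sqlite3 module as a stand-in for whatever RDBMS you run, the Name column replaces the columns from the example table, and the helper friends_of is a made-up name for illustration. Because each friendship is stored only once, the lookup has to read the pair in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (
        UserID INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL             -- illustrative column only
    );
    CREATE TABLE Friends (
        UserID   INTEGER NOT NULL REFERENCES User(UserID),
        FriendID INTEGER NOT NULL REFERENCES User(UserID),
        PRIMARY KEY (UserID, FriendID)   -- composite key of the two FKs
    );
    INSERT INTO User VALUES (1, 'Bob'), (2, 'Jon'), (3, 'Joe');
    INSERT INTO Friends VALUES (1, 2), (1, 3), (2, 3);
""")


def friends_of(user_id):
    """Return friend IDs for one user, checking both columns of the pair."""
    rows = conn.execute(
        """
        SELECT FriendID FROM Friends WHERE UserID = ?
        UNION
        SELECT UserID FROM Friends WHERE FriendID = ?
        ORDER BY 1
        """,
        (user_id, user_id),
    )
    return [row[0] for row in rows]
```

If you instead store both (1,2) and (2,1), the second half of the UNION is no longer needed, at the cost of twice as many rows.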

Answer 2

Answered by TheTXI

Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:

Facebook Schema

Answer 3

Answered by burzum

TL;DR:

They use a stack architecture with cached graphs for everything above the MySQL bottom of their stack.

Long Answer:

I did some research on this myself because I was curious how they handle their huge amount of data and search it quickly. I've seen people complain about custom-made social network scripts becoming slow as the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections - not even bothering with group permissions, likes, and wall posts - it quickly turned out that this approach is flawed. So I spent some time searching the web for how to do it better and came across this official Facebook article:

I really recommend watching the presentation at the first link above before you continue reading. It is probably the best explanation you can find of how FB works behind the scenes.

The video and article tell you a few things:

They're using MySQL at the very bottom of their stack.
Above the SQL DB there is the TAO layer, which contains at least two levels of caching and uses graphs to describe the connections.
I could not find anything on what software / DB they actually use for their cached graphs.
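
To make the layering idea concrete (a cache sitting above a SQL store), here is a minimal read-through cache sketch. It is not TAO and makes no claim about how Facebook implements it: it has a single cache level instead of two, it uses Python's built-in sqlite3 module in place of MySQL, and the class FriendCache and its methods are hypothetical names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
    INSERT INTO friends VALUES (1, 2), (1, 3), (2, 3);
""")


class FriendCache:
    """Hypothetical read-through cache in front of the SQL store."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}      # user_id -> tuple of friend ids
        self.db_reads = 0    # how often we had to fall through to SQL

    def friends(self, user_id):
        if user_id not in self.cache:
            self.db_reads += 1
            rows = self.conn.execute(
                "SELECT friend_id FROM friends"
                " WHERE user_id = ? ORDER BY friend_id",
                (user_id,),
            )
            self.cache[user_id] = tuple(row[0] for row in rows)
        return self.cache[user_id]

    def add_friend(self, user_id, friend_id):
        # Write to the database first, then drop the stale cache entry.
        self.conn.execute(
            "INSERT INTO friends VALUES (?, ?)", (user_id, friend_id)
        )
        self.cache.pop(user_id, None)


cache = FriendCache(db)
first = cache.friends(1)     # cache miss: read from SQL
second = cache.friends(1)    # cache hit: served from memory
```

The point of the design is that repeated reads of the same user's connections never reach the database, while the database remains the source of truth for writes.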

Let's take a look at this; friend connections are at the top left:

enter image description here

Well, this is a graph. :) It doesn't tell you how to build it in SQL. There are several ways to do it, and this site has a good number of different approaches. Attention: keep in mind what a relational DB is: it is designed to store normalised data, not a graph structure, so it won't perform as well as a specialised graph database.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.

Here is my disappointing test for just finding friends of friends:

DB Schema:

CREATE TABLE IF NOT EXISTS `friends` (
  `id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `friend_id` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Friends of Friends Query:

(
    -- direct friends of user 1
    select friend_id
    from friends
    where user_id = 1
) union (
    -- friends of those friends
    select distinct ff.friend_id
    from
        friends f
        join friends ff on ff.user_id = f.friend_id
    where f.user_id = 1
)

I really recommend that you create some sample data with at least 10k user records, each of them having at least 250 friend connections, and then run this query. On my machine (i7 4770k, SSD, 16 GB RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized; I'm not a DB genius (suggestions are welcome). However, if this scales linearly, you're already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.
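
If you want to reproduce this kind of test without setting up MySQL, the following is a rough sketch using Python's built-in sqlite3 module. Several things here are assumptions rather than part of the original benchmark: the data sizes are far smaller than the 10k users described above, the index idx_friends_user is added by this sketch (the schema shown above defines none), and SQLite timings are not comparable to MySQL timings. The function names are made up for illustration.

```python
import random
import sqlite3
import time


def build_db(num_users=1000, friends_per_user=25, seed=0):
    """Create an in-memory friends table filled with random connections."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friends ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " friend_id INTEGER NOT NULL)"
    )
    rows = []
    for user_id in range(1, num_users + 1):
        # Draw one extra candidate so we can discard the user itself.
        candidates = rng.sample(range(1, num_users + 1), friends_per_user + 1)
        picks = [u for u in candidates if u != user_id][:friends_per_user]
        rows.extend((user_id, friend_id) for friend_id in picks)
    conn.executemany(
        "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)", rows
    )
    # Assumption: an index on user_id; the schema in the answer shows none.
    conn.execute("CREATE INDEX idx_friends_user ON friends (user_id)")
    return conn


# Same shape as the query in the answer; UNION already removes duplicates.
FOF_QUERY = """
    SELECT friend_id FROM friends WHERE user_id = :uid
    UNION
    SELECT ff.friend_id
    FROM friends f
    JOIN friends ff ON ff.user_id = f.friend_id
    WHERE f.user_id = :uid
"""


def friends_of_friends(conn, user_id):
    return {row[0] for row in conn.execute(FOF_QUERY, {"uid": user_id})}


conn = build_db()
start = time.perf_counter()
result = friends_of_friends(conn, 1)
elapsed = time.perf_counter() - start
print(f"{len(result)} ids in {elapsed:.4f} s")
```

Raising num_users and friends_per_user step by step and plotting the elapsed time is one way to check whether the linear-scaling worry holds for your own setup.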

This might still sound OK-ish for ~100k users, but consider that you have only fetched friends of friends and have not run any more complex query such as "show me only posts from friends of friends + do the permission check on whether I'm allowed to see each of them + do a subquery to check whether I liked any of them". You want the DB to check whether you already liked a post, or you'll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.

I think my answer covers the question of how Facebook designed their friends relationship very well, but I'm sorry that I can't tell you how to implement it in a way that works fast. Implementing a social network is easy, but making sure it performs well is clearly not - IMHO.

I've started experimenting with OrientDB to do the graph queries and to map my edges to the underlying SQL DB. If I ever get it done, I'll write an article about it.

Answer 4

Answered by belgariontheking

My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.

Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.
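
A small sketch of that users-plus-edges layout, with the extra data stored on the edge itself. The column names since and status, the 'pending' default, and the helper approved_friends are assumptions chosen for illustration; the sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE edges (
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        friend_id INTEGER NOT NULL REFERENCES users(user_id),
        since     TEXT,                              -- day they became friends
        status    TEXT NOT NULL DEFAULT 'pending',   -- approved status
        PRIMARY KEY (user_id, friend_id)
    );
    INSERT INTO users VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO edges VALUES (1, 2, '2009-01-01', 'approved');
    INSERT INTO edges (user_id, friend_id) VALUES (1, 3);  -- not yet accepted
""")


def approved_friends(user_id):
    """Return (friend_id, since) for edges that have been approved."""
    rows = conn.execute(
        "SELECT friend_id, since FROM edges"
        " WHERE user_id = ? AND status = 'approved'",
        (user_id,),
    )
    return rows.fetchall()
```

Keeping the status on the edge means a pending friend request and an accepted friendship live in the same table and differ only in one column.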

Answer 5

Answered by Nathan Koop

It's most likely a many-to-many relationship:

FriendList (table)

user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel

EDIT

The user table probably doesn't have user_email as a PK, though possibly as a unique key.

users (table)

user_id PK
user_email
password
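
One way to read the edit above: user_id is the only primary key, and user_email carries a UNIQUE constraint instead. The sketch below shows that combination with Python's built-in sqlite3 module; the sample addresses, the placeholder password values, and the helper try_insert are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY,"
    " user_email TEXT NOT NULL UNIQUE,"
    " password TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO users (user_email, password) VALUES (?, ?)",
    ("a@example.com", "hash-1"),
)


def try_insert(email):
    """Return True if the row was inserted, False if the email is taken."""
    try:
        conn.execute(
            "INSERT INTO users (user_email, password) VALUES (?, ?)",
            (email, "hash-2"),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Changing the email leaves user_id untouched, so rows in a friend table
# that reference user_id stay valid.
conn.execute(
    "UPDATE users SET user_email = ? WHERE user_id = 1", ("new@example.com",)
)
```

This is the practical argument for not using the email as the key: a user can change their address without any referencing row having to change.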

Answer 6

Answered by Adrian J. Moreno

Take a look at these articles describing how LinkedIn and Digg are built:

There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

Also, there's this article that talks about non-relational databases and how they're used by some companies:

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher-level concepts that most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links in the first two articles that should give you some more insight.

UPDATE 10/20/2014

Murat Demirbas wrote a summary on

TAO: Facebook's distributed data store for the social graph (ATC'13)
F4: Facebook's warm BLOB storage system (OSDI'14)

http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html

HTH

Answer 7

Answered by user362541

It is not possible to retrieve friends data from an RDBMS in constant time once the data set exceeds half a billion records, so Facebook implemented this using a hash database (NoSQL) and open-sourced that database as Cassandra.

So every user has their own key, with the friends' details in a queue. To see how Cassandra works, look at this:

http://prasath.posterous.com/cassandra-55

Answer 8

Answered by James Sherwin-Smith

This June 2013 post goes into some detail explaining the transition from relational databases to objects with associations for some data types.

https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920

There's a longer paper available at https://www.usenix.org/conference/atc13/tao-facebook's-distributed-data-store-social-graph

Answer 9

Answered by Malfist

You're looking for foreign keys. Basically, you can't have an array in a database unless it has its own table.

Example schema:

    Users Table
        userID PK
        other data
    Friends Table
        userID   -- FK to the Users table, representing the user that has a friend
        friendID -- FK to the Users table, representing the user ID of the friend
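
To see what the foreign keys buy you, here is a minimal sketch with Python's built-in sqlite3 module. One SQLite-specific detail: foreign keys are only enforced after PRAGMA foreign_keys = ON is issued on the connection. The helper add_friend is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Users (userID INTEGER PRIMARY KEY);
    CREATE TABLE Friends (
        userID   INTEGER NOT NULL REFERENCES Users(userID),
        friendID INTEGER NOT NULL REFERENCES Users(userID)
    );
    INSERT INTO Users VALUES (1), (2);
    INSERT INTO Friends VALUES (1, 2);
""")


def add_friend(user_id, friend_id):
    """Insert one friend row; return False if either ID is not a real user."""
    try:
        conn.execute(
            "INSERT INTO Friends VALUES (?, ?)", (user_id, friend_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Each friend is one row in the Friends table, which plays the role of the "array" from the question, and the database rejects rows that point at users who do not exist.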

Answer 10

Answered by zain

It's a type of graph database: http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html

It's not related to relational databases.

Google for graph databases.
