database 数据库分区 - 水平与垂直 - 规范化和行拆分之间的区别?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/20388923/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Database partitioning - Horizontal vs Vertical - Difference between Normalization and Row Splitting?
提问by dukable
I am trying to grasp the different concepts of Database Partitioningand this is what I understood of it:
我试图掌握数据库分区的不同概念,这就是我对它的理解:
Horizontal Partitioning/Sharding: Splitting a table into different table that will contain a subset of the rows that were in the initial table (an example that I have seen a lot if splitting a Users table by Continent, like a sub table for North America, another one for Europe, etc...). Each partition being in a different physical location(understand 'machine'). As I understood it, Horizontal Partitioning and Sharding are the exact same thing(?).
水平分区/分片:将表拆分为不同的表,其中将包含初始表中行的子集(如果按大陆拆分用户表,我见过很多示例,例如北美的子表,另一个用于欧洲等......)。每个分区位于不同的物理位置(理解“机器”)。据我了解,水平分区和分片是完全一样的(?)。
Vertical Partitioning: From what I understood (http://technet.microsoft.com/en-us/library/ms178148%28v=sql.105%29.aspx), there are 2 sorts of Vertical Partitioning:
垂直分区:据我了解(http://technet.microsoft.com/en-us/library/ms178148%28v=sql.105%29.aspx),有两种垂直分区:
Normalization(which consists of removing redundancies from a the database by splitting tables and linking them with a foreign key).
Row Splitting, here is what I don't understand, what is the difference between Normalizationand Row Splitting? In what those 2 techniques differ from each other?
规范化(包括通过拆分表并将它们与外键链接来从数据库中删除冗余)。
Row Splitting,这是我不明白的,Normalization和Row Splitting 有什么区别?这两种技术有何不同?
I have also read in this post (Difference between scaling horizontally and vertically for databases) that the difference between Horizontal Partitioning and Vertical Partitioning is that in the first you scale by adding more machines, while in the second one you scale by adding more power (CPU, RAM) to your existing machine, is that a correct definition? I thought that the coredifference between those 2 techniques resides in the way you split your tables.
我还在这篇文章(数据库水平和垂直扩展之间的差异)中读到,水平分区和垂直分区之间的区别在于,在第一个中,您通过添加更多机器进行扩展,而在第二个中,您通过添加更多功能进行扩展( CPU,RAM)到您现有的机器,这是一个正确的定义吗?我认为这两种技术之间的核心区别在于您拆分表格的方式。
I am sorry for the load of questions but I am a bit confused as a lot of different websites that I have came across say different things.
我很抱歉有这么多问题,但我有点困惑,因为我遇到的许多不同的网站都说了不同的事情。
Any help clarifying would be greatly appreciated. Any link to a clear and simple demonstration with a few tables would also be very helpful.
任何帮助澄清将不胜感激。任何指向带有几张表格的清晰简单演示的链接也将非常有帮助。
回答by Fabian
Partitioning is a rather general concept and can be applied in many contexts. When it considers the partitioning of relational data, it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).
分区是一个相当笼统的概念,可以在许多情况下应用。当它考虑关系数据的分区时,它通常是指按行(水平)或按列(垂直)分解表。
Vertical partitioning, aka row splitting, uses the same splitting techniques as database normalization, but ususally the term (vertical / horizontal) data partitioning refers to a physical optimizationwhereas normalization is an optimization on the conceptuallevel.
垂直分区,又名行拆分,使用与数据库规范化相同的拆分技术,但通常术语(垂直/水平)数据分区是指物理优化,而规范化是概念级别的优化。
Since you ask for a simple demonstration - assume you have a table like this:
由于您要求进行简单的演示 - 假设您有一张这样的表格:
create table data (
id integer primary key,
status char(1) not null,
data1 varchar2(10) not null,
data2 varchar2(10) not null);
One way to partition datavertically: Split it as follows:
一种data垂直分区的方法:拆分如下:
create table data_main (
id integer primary key,
status char(1) not null,
data1 varchar2(10) not null );
create table data_rarely_used (
id integer primary key,
data2 varchar2(10) not null,
foreign key (id) references data_main (id) );
This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. Partition data_main will take less space, hence full table scans will be faster and it is more likely that it fits into the DBMS' page cache. The downside: When you have to query all columns of data, you obivously have to join the tables, which will be more expensive that querying the original table.
例如,当您在查询中很少需要列 data2 时,可以应用这种分区。分区 data_main 将占用更少的空间,因此全表扫描会更快,并且更有可能适合 DBMS 的页面缓存。缺点:当您必须查询 的所有列时data,您显然必须连接表,这将比查询原始表更昂贵。
Notice you are splitting the columns in the same way as you would when you normalize tables. However, in this case datacould already be normalized to 3NF (and even BCNF and 4NF), but you decide to further split it for the reason of physical optimization.
请注意,您正在以与规范化表时相同的方式拆分列。但是,在这种情况下data,已经可以归一化为 3NF(甚至 BCNF 和 4NF),但是您出于物理优化的原因决定进一步拆分它。
One way to partition datahorizontally, using Oracle syntax:
使用 Oracle 语法进行data水平分区的一种方法:
create table data (
id integer primary key,
status char(1),
data1 varchar2(10),
data2 varchar2(10) )
partition by list (status) (
partition active_data values ( 'A' ),
partition other_data values(default)
);
This would tell the DBMS to internally store the table datain two segments (like two tables), depending on the value of the column status. This way of partitioning datacan be applied, for example, when you usually query only rows of one partition, e.g., the status 'A' rows (let's call them active rows). Like before, full scans will be faster (particularly if there are only few active rows), the active rows (and the other rows resp.) are stored contiguously (they won't be scattered around pages that they share with rows of a different status value, and it is more likely that the active rows will be in the page cache.
这将告诉 DBMS 将表内部存储data在两个段中(如两个表),具体取决于列的值status。data可以应用这种分区方式,例如,当您通常只查询一个分区的行时,例如状态为“A”的行(我们称之为活动行)。像以前一样,完整扫描会更快(特别是如果只有很少的活动行),活动行(以及其他行)是连续存储的(它们不会分散在与不同行共享的页面周围) status 值,并且活动行更有可能在页面缓存中。
回答by sameer sukumaran
Horizontal Partitioning in data base
数据库中的水平分区
Keeping all the fields EG:Table Employeeshas
保持所有的字段EG:表Employees有
- id,
- name,
- Geographical location ,
- email,
- designation,
- phone
- ID,
- 姓名,
- 地理位置 ,
- 电子邮件,
- 指定,
- 电话
EG:1.Keeping all the fields and distributing records in multiple machines.say id= 1-100000 or 100000-200000 records in one machine each and distributing over multiple machines.
EG:1.在多台机器上保留所有字段并分布记录。例如,id= 1-100000 或 100000-200000 条记录在一台机器上,并分布在多台机器上。
EG:2.Keeping separate databases for Regions EG: Asia Pacific,North America
EG:2. 为区域 EG 保留单独的数据库:亚太地区,北美
Key:Picking set of rows based on a criteria
键:根据条件选择行集
Vertical Partitioning in data base
数据库中的垂直分区
It is similar to Normalization where the same table is divided in to multiple tables and used with joins if required.
它类似于规范化,其中同一个表被分成多个表并在需要时与连接一起使用。
EG:
id, name, designationis put in one table andphone, emailwhich may not be frequently accessed are put in another.
EG:
id, name,designation放在一张表中phone,email可能不经常访问的放在另一张表中。
Key:Picking set of columns based on a criteria.
键:根据条件选择列集。
- Horizontal/Vertical Scaling is different from partitioning
- 水平/垂直缩放与分区不同
Horizontal Scaling:
水平缩放:
is about adding more machines toenable improved responsiveness and availability of any system including database.The idea is to distribute the work load to multiple machines.
是关于添加更多机器以提高包括数据库在内的任何系统的响应能力和可用性。这个想法是将工作负载分配给多台机器。
Vertical Scaling:
垂直缩放:
is about adding more capability in the form of CPU,Memory to existing machine or machines to enable improved responsiveness and availability of any system including database.In a virtual machine set up it can be configured virtually instead of adding real physical machines.
是关于以 CPU、内存的形式向现有机器或机器添加更多功能,以提高包括数据库在内的任何系统的响应能力和可用性。在虚拟机设置中,它可以进行虚拟配置,而不是添加真正的物理机器。
Sameer Sukumaran
萨米尔·苏库马兰
回答by Bhagwati Malav
The problems with single database arises when it starts getting huge. So it is required to partition it, to reduce search space, so that it can execute required actions faster.There are various partition strategies available eg: horizontal partitioning, vertical partitioning, hash based partitioning, lookup based partitioning. Horizontal, vertical scaling is different concept compare to these strategies.
当单个数据库开始变得庞大时,就会出现问题。因此需要对其进行分区,以减少搜索空间,以便更快地执行所需的操作。有多种分区策略可用,例如:水平分区、垂直分区、基于哈希的分区、基于查找的分区。与这些策略相比,水平、垂直扩展是不同的概念。
Horizontal partitioning: It splits given table/collection into multiple tables/collections based on some key information which can help in getting right table as horizontal partitioning will have multiple tables on different nodes/machines. eg: region wise users information.
Vertical partitioning: It divide columns into multiple parts as mentioned in one of the above answers eg: columns related to user info, likes, comments, friends etc in social networking application.
Hash based partitioning: It uses hash function to decide table/node, and take key elements as input in generating hash. If we change number of tables, it requires re arrangement of data which is costly. So there is a problem when you want to add more table/node.
Lookup based partitioning: It uses a lookup table which helps in redirecting to different tables/node base on given input fields. We can easily add new table/node in this approach.
水平分区:它根据一些关键信息将给定的表/集合拆分为多个表/集合,这有助于获得正确的表,因为水平分区将在不同的节点/机器上有多个表。例如:区域明智的用户信息。
垂直分区:将列分成多个部分,如上述答案之一所述,例如:社交网络应用程序中与用户信息、喜欢、评论、朋友等相关的列。
基于哈希的分区:它使用哈希函数来决定表/节点,并在生成哈希时将关键元素作为输入。如果我们更改表的数量,则需要重新安排数据,成本很高。所以当你想添加更多的表/节点时就会出现问题。
基于查找的分区:它使用一个查找表,有助于根据给定的输入字段重定向到不同的表/节点。我们可以在这种方法中轻松添加新表/节点。
Horizontal scaling vs vertical scaling: When we design any application, we need to think of scaling as well. How are we going to handle huge amount of traffic in future? We need to think in terms of memory consumption, latency, cpu usage, fault tolerance, resiliency. Vertical scaling adds more resources eg: cpu, memory to single machine so that it can handle the in coming traffic. But there are limitation with this approach, you can't add more resource than certain limit. Horizontal scaling allow in coming traffic to distribute across multiple nodes. It need to have load balancer at front which basically handle the traffic, and navigate traffic to any one node. Horizontal scaling allow you to add enough number of servers, but you would also need these many nodes.
水平缩放 vs 垂直缩放:当我们设计任何应用程序时,我们也需要考虑缩放。未来我们将如何处理大量的流量?我们需要从内存消耗、延迟、cpu 使用、容错、弹性等方面考虑。垂直扩展为单个机器增加了更多资源,例如:cpu、内存,以便它可以处理即将到来的流量。但是这种方法有局限性,您不能添加超过一定限制的资源。水平扩展允许传入的流量分布在多个节点上。它需要在前面有负载均衡器,它基本上处理流量,并将流量导航到任何一个节点。水平扩展允许您添加足够数量的服务器,但您也需要这么多节点。
回答by hummer
The difference between Normalization and splitting lies in the purpose of doing so.
归一化和分裂的区别在于这样做的目的。
The main purpose of Normalization is to remove redundant data Where as The purpose of Row splitting is to separate less required data.
Normalization 的主要目的是去除冗余数据,而 Row splitting 的目的是分离不需要的数据。
eg:- Suppose you have a table All_Details with columns- id , Emp_name, Emp_address, Emp_phNo ,Emp_other_data, Company_Name , Company_Address , Company_revenue.
例如:- 假设您有一个表 All_Details 与列 - id,Emp_name,Emp_address,Emp_phNo,Emp_other_data,Company_Name,Company_Address,Company_revenue。
Now if you want to normalize the table you would create two new table Employee_Details and Company_Details and keep a foreign key of company_id in table Employee_Details. this way redundant company data will be removed .
现在,如果您想规范化该表,您将创建两个新表 Employee_Details 和 Company_Details,并在表 Employee_Details 中保留 company_id 的外键。这样多余的公司数据将被删除。
Now lets talk about row splitting. Say even after normalization you are only accessing employee_name and emp_phNo but you are not accessing emp_address and emp_other_data so frequently. So to improve performance you split the Employee_Details table into two table . table1 containing the frequently needed data( employee_name and emp_phNo ) and table2 containing the less frequently needed data( Emp_address, Emp_other_data) . Both table will have same unique_key column so that you can recreate any row of table Employee_Details with unique_key. This can improve your system performance drasticaly.
现在让我们谈谈行拆分。假设即使在规范化之后,您也只访问employee_name 和emp_phNo,但您并没有如此频繁地访问emp_address 和emp_other_data。因此,为了提高性能,您将 Employee_Details 表拆分为两个表。table1 包含常用数据(employee_name 和 emp_phNo),table2 包含不太常用的数据(Emp_address, Emp_other_data)。两个表都将具有相同的 unique_key 列,以便您可以使用 unique_key 重新创建表 Employee_Details 的任何行。这可以极大地提高您的系统性能。

