database 星型设计

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/110032/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 06:57:15  来源:igfitidea点击:

Star-Schema Design

databasedesign-patternsdata-warehousestar-schemadimensional-modeling

提问by S.Lott

Is a Star-Schema design essential to a data warehouse? Or can you do data warehousing with another design pattern?

Star-Schema 设计对于数据仓库是必不可少的吗?或者你可以用另一种设计模式来做数据仓库吗?

回答by ConcernedOfTunbridgeWells

Using star schemasfor a data warehouse system gets you several benefits and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS) - a normalised structure that holds 'current state' and facilitates operations such as data conformation. However there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for the choice of architecture in each case.

星型模式用于数据仓库系统可以获得多种好处,并且在大多数情况下,将它们用于顶层是合适的。您可能还有一个操作数据存储 (ODS) - 一个规范化的结构,用于保存“当前状态”并促进诸如数据构造之类的操作。然而,在合理的情况下这是不可取的。我有机会构建有和没有 ODS 层的系统,并且在每种情况下都有特定的架构选择原因。

Without going into the subtlties of data warehouse architecture or starting a Kimball vs. Inmon flame war the main benefits of a star schema are:

无需深入研究数据仓库架构的微妙之处,也无需开始 Kimball 与 Inmon 之间的War,星型模式的主要好处是:

  • Most database management systems have facilities in the query optimiser to do 'Star Transformations' that use Bitmap Indexstructures or Index Intersectionfor fast predicate resolution. This means that selection from a star schema can be done without hitting the fact table (which is usually much bigger than the indexes) until the selection is resolved.

  • Partitioninga star schema is relatively straightforward as only the fact table needs to be partitioned (unless you have some biblically large dimensions). Partition eliminationmeans that the query optimiser can ignore patitions that could not possibly participate in the query results, which saves on I/O.

  • Slowly changing dimensionsare much easier to implement on a star schema than a snowflake.

  • The schema is easier to understand and tends to involve less joins than a snowflakeor E-R schema. Your reporting team will love you for this

  • Star schemas are much easier to use and (more importantly) make perform well with ad-hoc query tools such as Business Objectsor Report Builder. As a developer you have very little control over the SQL generated by these tools so you need to give the query optimiser as much help as possible. Star schemas give the query optimiser relatively little opportunity to get it wrong.

  • 大多数数据库管理系统在查询优化器中都有用于执行“星形变换”的工具,这些变换使用位图索引结构或 索引交集来进行快速谓词解析。这意味着可以从星型模式中进行选择,而无需点击事实表(通常比索引大得多),直到选择解决为止。

  • 划分星型模式相对简单,因为只需要对事实表进行分区(除非您有一些符合圣经的大维度)。 分区消除意味着查询优化器可以忽略不可能参与查询结果的分区,从而节省 I/O。

  • 缓慢变化的维度在星型模式上比在雪花上更容易实现。

  • 雪花或 ER 模式相比,该模式更容易理解并且往往涉及更少的连接。您的报告团队会因此而爱您

  • 星型模式更易于使用,并且(更重要的是)使用诸如Business ObjectsReport Builder 之类的即席查询工具可以很好地执行。作为开发人员,您几乎无法控制这些工具生成的 SQL,因此您需要为查询优化器提供尽可能多的帮助。星型模式使查询优化器出错的机会相对较少。

Typically your reporting layer would use star schemas unless you have a specific reason not to. If you have multiple source systems you may want to implement an Operational Data Store with a normalised or snowflake schema to accumulate the data. This is easier because an ODS typically does not do history. Historical state is tracked in star schemas where this is much easier to do than with normalised structures. A normalised or snowflaked Operational Data Store reflects 'current' state and does not hold a historical view over and above any that is inherent in the data.

通常,您的报告层会使用星型模式,除非您有特定的理由不这样做。如果您有多个源系统,您可能希望使用规范化或雪花模式实现操作数据存储来积累数据。这更容易,因为 ODS 通常不记录历史。在星型模式中跟踪历史状态,这比使用规范化结构更容易做到。标准化或雪花状的操作数据存储反映了“当前”状态,并且不包含超越数据中固有的任何历史视图的历史视图。

ODS load processes are concerned with data scrubbing and conforming, which is easier to do with a normalised structure. Once you have clean data in an ODS, dimension and fact loads can track history (changes over time) with generic or relatively simple mechanisms relatively simply; this is much easier to do with a star schema, Many ETL tools (for example) provide built-in facilities for slowly changing dimensions and implementing a generic mechanism is relatively straightforward.

ODS 加载过程与数据清理和一致性有关,使用规范化结构更容易做到这一点。一旦您在 ODS 中拥有干净的数据,维度和事实加载就可以使用通用或相对简单的机制相对简单地跟踪历史记录(随时间的变化);使用星型模式更容易做到这一点,许多 ETL 工具(例如)为缓慢变化的维度提供内置设施,实现通用机制相对简单。

Layering the system in this way providies a separation of responsibilities - business and data cleansing logic is dealt with in the ODS and the star schema loads deal with historical state.

以这种方式分层系统提供了职责分离——业务和数据清理逻辑在 ODS 中处理,星型模式负载处理历史状态。

回答by MOLAP

There is an ongoing debate in the datawarehousing litterature about wherein the datawarehouse-architecture the Star-Schemadesign should be applied.

在数据仓库文献中,关于Star-Schema设计应该应用在数据仓库架构中的什么地方一直存在争论。

In short Kimballadvocates very highly for using only the Star-Schema design in the datawarehouse, while Inmonfirst wants to build an Enterprise Datawarehouse using normalized 3NFdesign and later use the Star-Schema design in the datamarts.

简而言之,Kimball非常提倡在数据仓库中只使用 Star-Schema 设计,而Inmon首先希望使用规范化的 3NF设计构建企业数据仓库,然后在数据集市中使用 Star-Schema 设计。

In addition here to you could also say that Snowflake schema designis another approach.

除了这里你还可以说雪花模式设计是另一种方法。

A fourth design could be the Data Vault Modelingapproach.

第四种设计可能是数据库建模方法。

回答by Mike McAllister

Star schemas are used to enable high speed access to large volumes of data. The high performance is enabled by reducing the amount of joins needed to satsify any query that may be made against the subject area. This is done by allowing data redundancy in dimension tables.

星型模式用于实现对大量数据的高速访问。高性能是通过减少满足可能针对主题区域进行的任何查询所需的连接数量来实现的。这是通过允许维度表中的数据冗余来实现的。

You have to remember that the star schema is a pattern for the top layer for the warehouse. All models also involve staging schemas at the bottom of the warehouse stack, and some also include a persistant transformed merged staging area where all source systems are merged into a 3NF modelled schema. The various subject areas sit above this.

您必须记住,星型模式是仓库顶层的模式。所有模型还涉及仓库堆栈底部的暂存模式,有些还包括持久转换合并暂存区,其中所有源系统都合并到 3NF 建模模式中。各种学科领域都在此之上。

Alternatives to star schemas at the top level include a variation, which is a snowflake schema. A new method that may bear out some investigation as well is Data Vault Modellingproposed by Dan Linstedt.

顶级星型模式的替代方案包括一个变体,即雪花模式。Dan Linstedt 提出的Data Vault Modeling是一种可能也经得起调查的新方法。

回答by Mike

The thing about star schemas is they are a natural model for the kinds of things most people want to do with a data warehouse. For instance it is easy to produce reports with different levels of granularity (month or day or year for example). It is also efficient to insert typical business data into a star schema, again a common and important feature of a data warehouse.

关于星型模式的事情是它们是大多数人想要用数据仓库做的事情的自然模型。例如,很容易生成具有不同粒度级别(例如,月、日或年)的报告。将典型的业务数据插入星型模式也很有效,这也是数据仓库的一个常见且重要的特性。

You certainly can use any kind of database you want but unless you know your business domain very well it is likely that your reports will not run as efficiently as they could if you had used a star schema.

您当然可以使用您想要的任何类型的数据库,但除非您非常了解您的业务领域,否则您的报告可能无法像使用星型模式时那样高效地运行。

回答by Josh McAdams

Star schemas are a natural fit for the last layer of a data warehouse. How you get there is another question. As far as I know, there are two big camps, those of Bill Inmon and Ralph Kimball. You might want to look at the theories of these two guys if/when you decide to go with a star.

星型模式自然适合数据仓库的最后一层。你如何到达那里是另一个问题。据我所知,有两大阵营,比尔·英蒙和拉尔夫·金博尔。如果/当您决定选择明星时,您可能想看看这两个家伙的理论。

Also, some reporting tools really like the star schema setup. If you are locked into a specific reporting tool, that might drive what the reporting mart looks like in your warehouse.

此外,一些报告工具非常喜欢星型模式设置。如果您被锁定在特定的报告工具中,那可能会影响报告集市在您的仓库中的样子。

回答by csaba

Star schema is a logical data model for relational databases that fits the regular data warehousing needs; if the relational environment is given, a star or a snowflake schema will be a good design pattern, hard-wired in lots of DW design methodologies.

星型模式是关系型数据库的逻辑数据模型,适合常规的数据仓库需求;如果给定了关系环境,星形或雪花模式将是一个很好的设计模式,在许多 DW 设计方法中都是硬连线的。

There are however other than relational database engines too, and they can be used for efficient data warehousing. Multidimensional storage engines might be very fast for OLAP tasks (TM1 eg.); we can not apply star schema design in this case. Other examples requiring special logical models include XML databases or column-oriented databases (eg. the experimental C-store)).

然而,除了关系数据库引擎之外,还有其他引擎,它们可用于高效的数据仓库。对于 OLAP 任务(例如 TM1),多维存储引擎可能非常快;在这种情况下我们不能应用星型模式设计。其他需要特殊逻辑模型的示例包括 XML 数据库或面向列的数据库(例如实验性C-store))。

回答by SquareCog

It's possible to do without. However, you will make life hard for yourself -- your organization will want to use standard tools that live on top of DWs, and those tools will expect a star schema -- a lot of effort will be spent fitting a square peg in a round hole.

没有它是可能的。但是,您将很难为自己而努力——您的组织将希望使用位于 DW 之上的标准工具,而这些工具将期望使用星型模式——将花费大量精力将方钉安装到圆形中洞。

A lot of database-level optimizations assume that you have a star schema; you will spend a lot of time optimizing and restructuring to get the DB to do "the right thing" with your not-quite-star layout.

许多数据库级优化都假设您有一个星型模式;您将花费大量时间进行优化和重组,以使 DB 用您的不太星级的布局做“正确的事情”。

Make sure that the pros outweigh the cons..

确保优点大于缺点。

(Does it sound like I've been there before?)

(听起来我以前去过那里吗?)

-D

-D

回答by Steve

There are three problems we need to solve.

我们需要解决三个问题。

1) How to get the data out of the operational source system without putting undue pressure on them by joining tables within and between them, cleaning data as we extract, creating derivations etc.

1)如何通过在它们内部和之间连接表,在我们提取时清理数据,创建派生等,从操作源系统中获取数据,而不会对其施加过度压力。

2) How to merge data from disparate sources - some legacy, some file based, from different departments into an integral, accurate, efficiently stored whole that models the business, and does not reflect the structures of the source systems. Remember, systems change / are replaced relatively quickly, but the basic model of the business changes slowly.

2) 如何将来自不同来源的数据——一些遗留的、一些基于文件的、来自不同部门的数据合并成一个完整的、准确的、有效存储的、对业务建模的整体,并且不反映源系统的结构。请记住,系统变化/更换相对较快,但业务的基本模型变化缓慢。

3) How to structure the data to meet specific analytical and reporting requirements for particular people/departments in the business as quickly and accurately as possible.

3) 如何尽可能快速准确地构建数据以满足特定业务中特定人员/部门的特定分析和报告要求。

The solution to these three very different problems require different architectural layers to solve them

这三个截然不同的问题的解决方案需要不同的架构层来解决它们

Staging Layer We replicate the structures of the sources, but only changed data from the sources are loaded each night. once the data is taken from the staging layer into the next layer, the data is dropped. Queries are single table queries with a simple data_time filter. Very little effect on the source.

暂存层我们复制源的结构,但每晚只加载来自源的更改数据。一旦数据从暂存层进入下一层,数据就会被丢弃。查询是带有简单 data_time 过滤器的单表查询。对源的影响很小。

Enterprise Layer This is a business oriented 3rd normal form database. Data is extracted (and afterward dropped) from the staging layer into the enterprise layer, where it is cleaned, integrated and normalised.

企业层 这是一个面向业务的第三范式数据库。数据从暂存层提取(然后删除)到企业层,在那里进行清理、集成和规范化。

Presentation (Star Schema) Layer Here, we model dimensionally to meet specific requirements. Data is deliberately de-normalise to reduce the number of joins. Hierarchies that may occupy several tables in the Enterprise Layer are collapsed into a single dimension tables, and multiple transactional tables may be merged into single fact tables.

表示(星型模式)层 在这里,我们进行维度建模以满足特定要求。数据被故意去规范化以减少连接的数量。企业层中可能占用多个表的层次结构被折叠成单个维度表,并且多个事务表可以合并成单个事实表。

You always face these three problems. If you choose to do away with the enterprise layer, you still have to solve the second problem, but you have to do it in the star schema layer, and in my view, this is the wrong place to do it.

你总是面临这三个问题。如果你选择取消企业层,你仍然要解决第二个问题,但你必须在星型模式层做,在我看来,这是错误的地方。