SQL 数据库设计最佳实践(地址)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7639637/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL Database Design Best Practice (Addresses)
提问by Vince Fedorchak
Of course I realize that there's no one "right way" to design a SQL database, but I wanted to get some opinions on what is better or worse in my particular scenario.
当然,我意识到没有一种“正确的方法”可以设计 SQL 数据库,但我想就我的特定场景中更好或更坏的情况获得一些意见。
Currently, I'm designing an order entry module (Windows .NET 4.0 application with SQL Server 2008) and I'm torn between two design decisions when it comes to data that can be applied in more than one spot. In this question I'll refer specifically to Addresses.
目前,我正在设计一个订单输入模块(Windows .NET 4.0 应用程序和 SQL Server 2008),当涉及到可以在多个地方应用的数据时,我在两个设计决策之间左右为难。在这个问题中,我将专门提到地址。
Addresses can be used by a variety of objects (orders, customers, employees, shipments, etc..) and they almost always contain the same data (Address1/2/3, City, State, Postal Code, Country, etc). I was originally going to include each of these fields as a column in each of the related tables (e.g. Orders will contain Address1/2/3, City, State, etc.. and Customers will also contain this same column layout). But a part of me wants to apply DRY/Normalization principles to this scenario, i.e. have a table called "Addresses" which is referenced via Foreign Key in the appropriate table.
地址可以被各种对象(订单、客户、员工、发货等)使用,并且它们几乎总是包含相同的数据(地址 1/2/3、城市、州、邮政编码、国家等)。我最初打算将这些字段中的每一个都作为一个列包含在每个相关表中(例如,订单将包含地址 1/2/3、城市、州等。客户也将包含相同的列布局)。但是我的一部分人希望将 DRY/标准化原则应用于这种情况,即有一个名为“地址”的表,该表通过相应表中的外键进行引用。
CREATE TABLE DB.dbo.Addresses
(
Id INT
NOT NULL
IDENTITY(1, 1)
PRIMARY KEY
CHECK (Id > 0),
Address1 VARCHAR(120)
NOT NULL,
Address2 VARCHAR(120),
Address3 VARCHAR(120),
City VARCHAR(100)
NOT NULL,
State CHAR(2)
NOT NULL,
Country CHAR(2)
NOT NULL,
PostalCode VARCHAR(16)
NOT NULL
)
CREATE TABLE DB.dbo.Orders
(
Id INT
NOT NULL
IDENTITY(1000, 1)
PRIMARY KEY
CHECK (Id > 1000),
Address INT
CONSTRAINT fk_Orders_Address
FOREIGN KEY REFERENCES Addresses(Id)
CHECK (Address > 0)
NOT NULL,
-- other columns....
)
CREATE TABLE DB.dbo.Customers
(
Id INT
NOT NULL
IDENTITY(1000, 1)
PRIMARY KEY
CHECK (Id > 1000),
Address INT
CONSTRAINT fk_Customers_Address
FOREIGN KEY REFERENCES Addresses(Id)
CHECK (Address > 0)
NOT NULL,
-- other columns....
)
From a design standpoint I like this approach because it creates a standard address format that is easily changeable, i.e. if I ever needed to add Address4 I would just add it in one place rather than to every table. However, I can see the number of JOINs required to build queries might get a little insane.
从设计的角度来看,我喜欢这种方法,因为它创建了一种易于更改的标准地址格式,即如果我需要添加 Address4,我只会将它添加到一个地方而不是每个表。但是,我可以看到构建查询所需的 JOIN 数量可能有点疯狂。
I guess I'm just wondering if any enterprise-level SQL architects out there have ever used this approach successfully, or if the number of JOINs that this creates would create a performance issue?
我想我只是想知道是否有任何企业级 SQL 架构师曾经成功地使用过这种方法,或者这创建的 JOIN 数量是否会产生性能问题?
采纳答案by Joe Stefanelli
You're on the right track by breaking address out into its own table. I'd add a couple of additional suggestions.
通过将地址分解到自己的表中,您走上了正确的轨道。我会添加一些额外的建议。
Consider taking the Address FK columns out of the Customers/Orders tables and creating junction tables instead. In other words, treat Customers/Addresses and Orders/Addresses as many-to-many relationships in your design now so you can easily support multiple addresses in the future. Yes, this means introducing more tables and joins, but the flexibility you gain is well worth the effort.
Consider creating lookup tables for city, state and country entities. The city/state/country columns of the address table then consist of FKs pointing to these lookup tables. This allows you to guarantee consistent spellings across all addresses and gives you a place to store additional metadata (e.g., city population) if needed in the future.
考虑从客户/订单表中取出地址 FK 列并改为创建连接表。换句话说,现在在您的设计中将客户/地址和订单/地址视为多对多关系,以便您将来可以轻松支持多个地址。是的,这意味着引入更多的表和联接,但您获得的灵活性非常值得付出努力。
考虑为城市、州和国家实体创建查找表。地址表的城市/州/国家列则由指向这些查找表的 FK 组成。这使您可以保证所有地址的拼写一致,并在将来需要时为您提供一个存储额外元数据(例如,城市人口)的地方。
回答by Mike Sherrill 'Cat Recall'
I just have some cautions. For each of these, there's more than one way to fix the problem.
我只是有一些警告。对于其中的每一个,解决问题的方法不止一种。
First, normalization doesn't mean "replace text with an id number".
首先,规范化并不意味着“用 id 号替换文本”。
Second, you don't have a key. I know, you have a column declared "PRIMARY KEY", but that's not enough.
其次,你没有钥匙。我知道,您有一列声明为“PRIMARY KEY”,但这还不够。
insert into Addresses
(Address1, Address2, Address3, City, State, Country, PostalCode)
values
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500');
select * from Addresses;
1;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
2;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
3;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
4;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
In the absence of any other constraints, your "primary key" identifies a row; it doesn't identify an address. Identifying a row is usually not good enough.
在没有任何其他约束的情况下,您的“主键”标识一行;它不识别地址。识别一行通常不够好。
Third, "Address1", "Address2", and "Address3" aren't attributes of addresses. They're attributes of mailing labels. (Lines on a mailing label.) That distinction might not be important to you. It's reallyimportant to me.
第三,“Address1”、“Address2”和“Address3”不是地址的属性。它们是邮寄标签的属性。(邮寄标签上的线条。)这种区别对您来说可能并不重要。这对我来说真的很重要。
Fourth, addresses have a lifetime. Between birth and death, they sometimes change. They change when streets get re-routed, buildings get divided, buildings get undivided, and sometimes (I'm pretty sure) when a city employee has a pint too many. Natural disasters can eliminate whole communities. Sometimes buildings get renumbered. In our database, which is tinycompared to most, about 1% per year change like that.
第四,地址有生命周期。在生与死之间,它们有时会发生变化。当街道被改道、建筑物被分割、建筑物未被分割,有时(我很确定)当城市雇员喝得太多时,它们就会改变。自然灾害可以消灭整个社区。有时建筑物会重新编号。在我们的数据库中,与大多数数据库相比,它很小,每年大约有 1% 的变化。
When an address dies, you have to do two things.
当一个地址失效时,你必须做两件事。
- Make sure nobody uses that address to mail, ship, or whatever.
- Make sure its death doesn't affect historical data.
- 确保没有人使用该地址来邮寄、运送或其他任何东西。
- 确保它的死亡不会影响历史数据。
When an address itself changes, you have to do two things.
当地址本身发生变化时,您必须做两件事。
- Some data must reflect that change. Make sure it does.
- Some data must notreflect that change. Make sure it doesn't.
- 一些数据必须反映这种变化。确保它确实如此。
- 某些数据不得反映这种变化。确保它没有。
Fifth, DRY doesn't apply to foreign keys. Their whole purpose is to be repeated. The only question is how wide a key? An id number is narrow, but requires a join. (10 id numbers might require 10 joins.) An address is wide, but requires no joins. (I'm talking here about a proper address, not a mailing label.)
第五,DRY 不适用于外键。他们的全部目的是要重复。唯一的问题是一个键有多宽?id 号很窄,但需要连接。(10 个 ID 号可能需要 10 个连接。)地址很宽,但不需要连接。(我在这里谈论的是正确的地址,而不是邮寄标签。)
That's all I can think of off the top of my head.
这就是我能想到的一切。
回答by HLGEM
I think there is a problem you are not aware of and that is that some of this data is time sensitive. You do not want your records to show you shipped an order to 35 State St, Chicago Il, when you actually sent it to 10 King Street, Martinsburg WV but the customer moved two years after the order was shipped. So yes, build an address table to get the address at that moment in time as long as any change to the address for someone like a customer results in a new addressid not in changing the current address which would break the history on an order.
我认为有一个你没有意识到的问题,那就是其中一些数据是时间敏感的。您不希望您的记录显示您将订单运送到 35 State St, Chicago Il,当您实际将订单发送到 10 King Street, Martinsburg WV 但客户在订单发送两年后搬家。所以是的,建立一个地址表以获取当时的地址,只要对像客户这样的人的地址进行任何更改都会导致新的 addressid 而不是更改当前地址,这会破坏订单的历史记录。
回答by Nathan Hughes
You would want the addresses to be in a separate table only if they were entities in their own right. Entities have identity (meaning it matters if two objects pointed to the same address or to different ones), and they have their own lifecycle apart from other entities. If this was the case with your domain, I think it would be totally apparent and you wouldn't have a need to ask this question.
只有当地址本身是实体时,您才希望这些地址位于单独的表中。实体具有身份(这意味着两个对象指向同一个地址还是不同的地址很重要),并且与其他实体不同,它们有自己的生命周期。如果您的域就是这种情况,我认为这将是完全显而易见的,您就不需要问这个问题了。
Cade's answerexplains the mutability of addresses, something like a shipping address is part of an order and shouldn't be able to change out from under the order it belongs to. This shows that the shipping address doesn't have its own lifecycle. Handling it as if it was a separate entity can only lead to more opportunities for error.
Cade 的回答解释了地址的可变性,例如送货地址是订单的一部分,不应该在它所属的订单下更改。这表明送货地址没有自己的生命周期。把它当作一个单独的实体来处理只会导致更多的出错机会。
"Normalization" specifically refers to removing redundancies from data so you don't have the same item represented in different places. Here the only redundancy is in the DDL, it's not in the data, so "normalization" is not relevant here. (JPA has the concept of embedded classes that can address the redundancy).
“标准化”特指从数据中删除冗余,这样您就不会在不同的地方表示相同的项目。这里唯一的冗余是在 DDL 中,它不在数据中,所以“规范化”在这里无关紧要。(JPA 具有可以解决冗余问题的嵌入式类的概念)。
TLDR: Use a separate table if the address is truly an Entity, with its own distinct identity and its own lifecycle. Otherwise don't.
TLDR:如果地址确实是一个实体,具有自己独特的身份和生命周期,则使用单独的表。否则不要。
回答by Erwin Brandstetter
What you have to answer for yourself is the question whether the same addressin everyday language is actually the same addressin your database. If somebody "changes his address" (colloquially), he really links himself to another address. The address per se only changes when a street is renamed, a zip-code reform takes place or a nuke hits. And those are rare events (hopefully for the most part). There goes your main profit: change in one place for multiple rows (of multiple tables).
你必须自己回答问题是什么是否相同的地址在日常语言实际上是同一个地址在数据库中。如果有人“更改了他的地址”(通俗地说),他实际上将自己链接到另一个地址。地址本身只会在街道更名、邮政编码改革或核弹袭击时发生变化。这些都是罕见的事件(希望在大多数情况下)。您的主要利润是:在一个地方更改多行(多个表)。
If you should actually change an address for that in your model - in the sense of an UPDATE
on table address - that may or may not work for other rows that link to it. Also, in my experience, even the exact same address has to look different for different purposes. Understand the semantic differences and you will arrive at the right model that represents your real world best.
如果您实际上应该更改模型中的UPDATE
地址 - 在表地址的意义上 - 这可能适用于链接到它的其他行,也可能不起作用。此外,根据我的经验,即使是完全相同的地址也必须因不同的目的而看起来不同。了解语义差异,您将获得最能代表您的现实世界的正确模型。
I have a number of databases where I use a common table of streets(which uses a table of cities (which uses a table of countries, ...)). In combination with a street number think of it as geocodes (lat/lon), not "street names". Addresses are notshared among different tables (or rows). Changes to street names and zip codes cascade, other changes don't.
我有许多数据库,其中我使用了一个公共街道表(它使用了一个城市表(它使用了一个国家表,......))。结合街道号码将其视为地理编码(纬度/经度),而不是“街道名称”。地址在不同的表(或行)之间不共享。街道名称和邮政编码的更改级联,其他更改不会。
回答by Cade Roux
It's fine to have a split out addresses table.
有一个拆分的地址表很好。
However, you have to avoid the temptation of allowing multiple rows to refer to the same address without an appropriate system for managing options for the user to decide whether and how changing an address splits out a row for the new address change, i.e. You have the same address for billing and ship-to. Then a user says their address is changing. To start with, old orders might (should?) need their ship-to addresses retained, so you can't change it in-place. But the user might also need to say this address I'm changing is only going to change the ship-to.
但是,您必须避免在没有适当的系统来管理选项的情况下允许多行引用同一地址的诱惑,以便用户决定更改地址是否以及如何为新地址更改拆分一行,即您拥有账单地址和收货地址相同。然后用户说他们的地址正在改变。首先,旧订单可能(应该?)需要保留其送货地址,因此您无法就地更改它。但是用户可能还需要说我要更改的这个地址只会更改收货地址。
回答by dougajmcdonald
You would normally normalise the data as far as possible, so use the table 'Addresses'.
您通常会尽可能地规范化数据,因此请使用“地址”表。
You can use views to de-normalise the data afterwards which use indexes and should give a method to access data with easy references, whilst leaving the underlying structure normalised fully.
您可以使用视图对数据进行反规范化,然后使用索引,并应该提供一种方法来访问具有简单引用的数据,同时使底层结构完全规范化。
The number of joins shouldn't be a major issue, index based joins aren't too much of an overhead.
连接的数量不应该是一个主要问题,基于索引的连接不会造成太大的开销。
回答by Anant Bandewar
One should maintain some master tables for City, State and Country. This way one can avoid the different spellings for these entities which might end up with mapping same city with some different state/country.
应该为城市、州和国家维护一些主表。通过这种方式,人们可以避免这些实体的不同拼写,这可能最终导致将同一城市映射到不同的州/国家。
One can simply map the CityId
in the address table as foreign key as shown below, instead of having all the three fields separately (City, State and Country) as plain text in address table itself.
可以简单地CityId
将地址表中的映射为外键,如下所示,而不是将所有三个字段(城市、州和国家)分别作为地址表本身的纯文本。
Address: {
CityId
// With other fields
}
City: {
CityId
StateId
// Other fields
}
State: {
StateId
CountryId
// Other fields
}
Country: {
CountryId
// Other fields
}
If one maintains all the three ids (CityId
, StateId
and CountryId
) in address table, at the end you have to make joins against those tables. Hence my suggestion would be to have only CityId
and then retrieve rest of the required information though joins with above table structure.
如果在地址表中维护所有三个 id(CityId
,StateId
和CountryId
),最后您必须对这些表进行连接。因此,我的建议是通过CityId
与上述表结构连接,然后仅检索所需信息的其余部分。
回答by Steve S
I prefer to use an XREF table that contains a FK reference to the person/business table, a FK reference to the address table and, generally, a FK reference to a role table (HOME, OFFICE, etc) to delineate the actual type of address. I also include an ACTIVE flag to allow me to choose to ignore old address while preserving the ability to maintain an address history.
我更喜欢使用包含对个人/业务表的 FK 引用、对地址表的 FK 引用以及通常对角色表(HOME、OFFICE 等)的 FK 引用的 XREF 表来描述实际类型地址。我还包括一个 ACTIVE 标志,允许我选择忽略旧地址,同时保留维护地址历史的能力。
This approach allows me to maintain multiple addresses of varying types for each primary entity
这种方法允许我为每个主要实体维护多个不同类型的地址