MySQL 查找表有多重要?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4824024/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 18:33:49  来源:igfitidea点击:

How important are lookup tables?

sqlmysqldatabase-design

提问by BDuelz

A lot of the applications I write make use of lookup tables, since that was just the way I was taught (normalization and such). The problem is that the queries I make are often more complicated because of this. They often look like this

我编写的许多应用程序都使用了查找表,因为这正是我学习的方式(规范化等)。问题是我所做的查询通常因此更加复杂。他们经常是这样的

get all posts that are still open

获取所有仍然打开的帖子

"SELECT * FROM posts WHERE status_id = (SELECT id FROM statuses WHERE name = 'open')"

Often times, the lookup tables themselves are very short. For instance, there may only be 3 or so different statuses. In this case, would it be okay to search for a certain type by using a constant or so in the application? Something like

很多时候,查找表本身很短。例如,可能只有 3 种左右的不同状态。在这种情况下,是否可以在应用程序中使用常量左右来搜索某种类型?就像是

get all posts that are still open

获取所有仍然打开的帖子

"SELECT * FROM posts WHERE status_id = ".Status::OPEN

Or, what if instead of using a foreign id, I set it as an enum and queried off of that?

或者,如果我不使用外部 ID,而是将其设置为枚举并对其进行查询,该怎么办?

Thanks.

谢谢。

回答by PerformanceDBA

The answer depends a little if you are limited to small filing systems in MyNonSQL, or if you are thinking about SQL and large databases.

答案取决于您是否仅限于 MyNonSQL 中的小型文件系统,或者您是否正在考虑 SQL 和大型数据库。

In real Databases, where there are many apps using one database, and many users using different report tools (not just the apps) to access the data, standards, normalisation, and open architecture requirements are important.

在真实的数据库中,有许多应用程序使用一个数据库,许多用户使用不同的报告工具(不仅仅是应用程序)来访问数据,标准、规范化和开放架构要求很重要。

Despite the people who attempt to change the definition of "normalisation", etc. to suit the purpose, Normalisation has not changed.

尽管人们试图改变“规范化”等的定义以适应目的,但规范化并没有改变。

  • if you have "Open" and "Closed" repeated in data tables, that is a simple Normalisation error. If you change those values you may have to update millions of rows, which is very limited design. Such values are commonly normalised into a Reference or Lookup table. It also saves space. The value "Open", "Closed" etc is no longer duplicated.

  • the second point is ease of change, if "Closed" were changed to "Expired", again, one row needs to be changed, and that is reflected in the entire database; whereas in the unnormalised files, millions of rows need to be changed.

  • Adding new values is simply a matter of inserting one row.

  • in Open Architecture terms, the Lookup table is an ordinary table. It exists in the (standard SQL) catalogue; any report tool can find it, as long as the PK::FK relation is defined, the report tool can find that as well.

  • Enum is only for the Non-SQLS. In SQL the Enum is a Lookup table.

  • The next point relates to the meaningfulness of the key. If the Key is meaningless to the user, fine, use an INT or TINYINT or whatever is suitable; number them incrementally; allow "gaps". But if the Key is meaningful to the user, do not use a meaningless number, do use the meaningful key. "M" and "F" for Male and Female, etc.

    • Now some people will get in to tangents re the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK. "M" and "F" are unlikely to change; if you have used {0,1,2,4,5,6}, well don't change it, why would you want to. Those values were supposed to be meaningless, only meaningful Key need to be changed.
      .
  • if you do use meaningful keys, use short alphabetic codes, that both users and developers can readily understand (and infer to long description from).

  • Since PKs are stable, particularly in Lookup tables, you can safely code:

    WHERE status_id = 'O'

    You do not have to join with the Lookup table and examine the Value "Open". That loses the value of the Lookup table in the code segments.

  • 如果数据表中重复出现“打开”和“关闭”,那是一个简单的归一化错误。如果您更改这些值,您可能需要更新数百万行,这是非常有限的设计。这些值通常被标准化为参考或查找表。它还可以节省空间。值“打开”、“关闭”等不再重复。

  • 第二点是易于更改,如果将“Closed”更改为“Expired”,则再次需要更改一行,并反映在整个数据库中;而在非规范化文件中,需要更改数百万行。

  • 添加新值只是插入一行的问题。

  • 在开放架构术语中,查找表是一个普通表。它存在于(标准 SQL)目录中;任何报表工具都可以找到它,只要定义了 PK::FK 关系,报表工具也可以找到它。

  • 枚举仅适用于非 SQLS。在 SQL 中,枚举是一个查找表。

  • 下一点与密钥的意义有关。如果 Key 对用户没有意义,那么可以使用 INT 或 TINYINT 或任何合适的;递增地编号;允许“间隙”。但如果 Key 对用户有意义,就不要使用无意义的数字,而要使用有意义的键。“M”和“F”代表男性和女性等。

    • 现在有些人会对 PK 的持久性产生兴趣。这是一个单独的观点。是的,当然,PK 始终使用稳定值。“M”和“F”不太可能改变;如果您使用过 {0,1,2,4,5,6},那么不要更改它,您为什么要更改它。那些值应该是没有意义的,只有有意义的 Key 需要改变。
      .
  • 如果您确实使用了有意义的键,请使用用户和开发人员都可以轻松理解的短字母代码(并从中推断出长描述)。

  • 由于 PK 是稳定的,尤其是在查找表中,您可以安全地编码:

    WHERE status_id = 'O'

    您不必加入查找表并检查值“打开”。这会丢失代码段中查找表的值。

SQL is a cumbersome language, especially when it comes to joins. But that is all we have, so we need to just accept the encumbrance and deal with it. Your example code is fine. But simpler forms can do the same thing. A report tool would generate:

SQL 是一种繁琐的语言,尤其是在连接方面。但这就是我们所拥有的,所以我们只需要接受负担并处理它。您的示例代码很好。但是更简单的形式可以做同样的事情。报告工具将生成:

SELECT  p.*,
         s.name
    FROM posts p, 
         status s
    WHERE p.status_id = s.status_id 
    AND   p.status_id = 'O'

  • For banking systems, where we use short codes which are meaningful (since they are meaningful, we do not change them with the seasons, we just add to them), given a Lookup table such as (carefully chosen, similar to ISO Country Codes):

    Eq   Equity
    EqCS Equity/Common Share
    O    Over The Counter
    OF   OTC/Future

    Code such as this is common:

    WHERE InstrumentTypeCode LIKE "Eq%"

  • 对于银行系统,我们使用有意义的短代码(因为它们有意义,我们不会随着季节改变它们,我们只是添加它们),给定一个查找表,例如(精心选择,类似于 ISO 国家代码) :

    Eq   Equity
    EqCS Equity/Common Share
    O    Over The Counter
    OF   OTC/Future

    像这样的代码很常见:

    WHERE InstrumentTypeCode LIKE "Eq%"

And the users would choose the value from a drop-down that displayed "Open", "Closed", etc., not {0,1,2,4,5,6}, not {M, F, U}. Both in the apps, and in the report tool. Without a lookup table, you can't do that.

用户将从显示“打开”、“关闭”等的下拉列表中选择值,而不是 {0,1,2,4,5,6},而不是 {M, F, U}。在应用程序和报告工具中。如果没有查找表,您就无法做到这一点。

Last, If the database was large, and supported BI or DSS or OLAP functions (the highly Normalised databases do), then the Lookup table is actually a Dimension or Vector, in Dimension-Fact analyses. If it was not there, then it would have to be added in, to satisfy the requirements of that software, before such analyses can be mounted.

最后,如果数据库很大,并且支持 BI 或 DSS 或 OLAP 功能(高度规范化的数据库支持),那么在维度-事实分析中,查找表实际上是一个维度或向量。如果它不存在,则必须添加它以满足该软件的要求,然后才能安装此类分析。

回答by BDuelz

For look-up tables I use a sensible primary key -- usually just a CHAR(1) that makes sense in the domain with an additional Title (VARCHAR) field. This can maintain relationship enforcement while "keeping the SQL simple". The key to remember here is the look-up table does not "contain data". It contains identities. Some other identities might be time-zone names or assigned IOC country codes.

对于查找表,我使用一个合理的主键——通常只是一个 CHAR(1),它在域中有意义,并带有一个附加的 Title (VARCHAR) 字段。这可以在“保持 SQL 简单”的同时保持关系强制执行。这里要记住的关键是查找表不“包含数据”。它包含身份。其他一些身份可能是时区名称或分配的IOC 国家/地区代码

For instance gender:

例如性别:

ID Label
M  Male
F  Female
N  Neutral
select * from people where gender = 'M'

Alternatively, an ORM could be used and manual SQL generation might never have to be done -- in this case the standard "int" surrogate key approach is fine because something else deals with it :-)

或者,可以使用 ORM 并且可能永远不必手动生成 SQL——在这种情况下,标准的“int”代理键方法很好,因为其他东西可以处理它:-)

Happy coding.

快乐编码。

回答by Valamas

Create a function for each lookup. There is no easy way. You want performance and query simplicity. Ensure the following is maintained. You could create a SP_TestAppEnums to compare existing lookup values against the function and look for out of sync/zero returned.

为每次查找创建一个函数。没有简单的方法。您想要性能和查询简单性。确保维护以下内容。您可以创建一个 SP_TestAppEnums 来将现有的查找值与函数进行比较,并查找返回的不同步/零。

CREATE FUNCTION [Enum_Post](@postname varchar(10))
RETURNS int
AS
BEGIN
DECLARE @postId int
SET @postId =
CASE @postname
WHEN 'Open' THEN 1
WHEN 'Closed' THEN 2
END

RETURN @postId
END
GO

/* Calling the function */
SELECT dbo.Enum_Post('Open')
SELECT dbo.Enum_Post('Closed')

回答by Nicholas Carey

Question is: do you need to include the lookup tables (domain tables 'round my neck of the woods) in your queries? Presumably, these sorts of tables are usually

问题是:您是否需要在查询中包含查找表(域表'绕着我的脖子')?据推测,这些类型的表通常是

  • pretty static in nature — the domain might get extended, but it probably won't get shortened.
  • their primary key values are pretty unlikely to change as well (e.g., the status_id for a status of 'open' is unlikely to suddenly get changed to something other than what it was created as).
  • 本质上非常静态——域可能会被扩展,但它可能不会被缩短。
  • 它们的主键值也不太可能改变(例如,'open' 状态的 status_id 不太可能突然更改为与创建时不同的值)。

If the above assumptions are correct, there's no real need to add all those extra tables to your joins just so your where clause can use a friend name instead of an id value. Just filter on status_id directly where you need to. I'd suspect the non-key attribute in the where clause ('name' in your example above) is more likely to get changes than the key attribute ('name' in your example above): you're more protected by referencing the desire key value(s) of the domain table in your join.

如果上述假设是正确的,那么实际上没有必要将所有这些额外的表添加到您的连接中,这样您的 where 子句就可以使用朋友名称而不是 id 值。只需在需要的地方直接过滤 status_id 即可。我怀疑 where 子句中的非关键属性(上面示例中的“name”)比关键属性(上面示例中的“name”)更有可能获得更改:通过引用想要加入的域表的键值。

Domain tables serve

域表服务

  • to limit the domain of the variable via a foreign key relationship,
  • to allow the domain to be expanded by adding data to the domain table,
  • to populate UI controls and the like with user-friendly information,
  • 通过外键关系限制变量的域,
  • 允许通过向域表中添加数据来扩展域,
  • 用用户友好的信息填充 UI 控件等,

Naturally, you'd need to suck domain tables into your queries where you you actually required the non-key attributes from the domain table (e.g., descriptive name of the value).

自然地,您需要将域表吸收到您的查询中,您实际上需要域表中的非键属性(例如,值的描述性名称)。

YMMV: a lot depends on context and the nature of the problem space.

YMMV:很大程度上取决于上下文和问题空间的性质。

回答by Matchu

The commenters have convinced me of the error of my ways. This answer and the discussion that went along with it, however, remain here for reference.

评论者让我相信我的方式是错误的。然而,这个答案和随之而来的讨论仍然留在这里以供参考。

I think a constant is appropriate here, and a database table is not. As you design your application, you expect that table of statuses to never, ever change, since your application has hard-coded into it what those statuses mean, anyway. The point of a database is that the data within it willchange. There are cases where the lines are fuzzy (e.g. "this data might change every few months or so…"), but this is not one of the fuzzy cases.

我认为常量在这里是合适的,而数据库表则不是。在您设计应用程序时,您希望状态表永远不会改变,因为无论如何,您的应用程序已经将这些状态的含义硬编码到其中。数据库的重点是其中的数据发生变化。有些情况下线条是模糊的(例如“这个数据可能每隔几个月左右就会改变......”),但这不是模糊的情况之一。

Statuses are a part of your application's logic; use constants to define them within the application. It's not only more strictly organized that way, but it will also allow your database interactions to be significantly speedier.

状态是应用程序逻辑的一部分;使用常量在应用程序中定义它们。它不仅以这种方式组织得更加严格,而且还可以让您的数据库交互显着加快。

回答by DeveloperChris

The answer is "whatever makes sense".

答案是“任何有意义的事情”。

lookup tables involve joins or subqueries which are not always efficient. I make use of enums a lot to do this job. its efficient and fast

查找表涉及并不总是有效的连接或子查询。我经常使用枚举来完成这项工作。它的高效和快速

回答by XIVSolutions

Where possible (and It is not always . . .), I use this rule of thumb: If I need to hard-code a value into my application (vs. let it remain a record in the database), and also store that vlue in my database, then something is amiss with my design. It's not ALWAYS true, but basically, whatever the value in question is, it either represents a piece of DATA, or a peice of PROGRAM LOGIC. It is a rare case that it is both.

在可能的情况下(并不总是......),我使用这个经验法则:如果我需要将一个值硬编码到我的应用程序中(而不是让它保留在数据库中的记录),并且还存储该值在我的数据库中,那么我的设计有问题。这并不总是正确的,但基本上,无论所讨论的值是什么,它要么代表一段数据,要么代表一段程序逻辑。两者兼而有之,实属罕见。

NOT that you won't find yourself discovering which one it is halfway into the project. But as the others said above, there can be trade-offs either way. Just as we don't always acheive "perfect" normalization in a database design (for reason of performance, or simply because you CAN take thngs too far in pursuit of acedemic perfection . . .), we may make some concious choices about where we locate our "look-up" values.

并不是说您不会发现自己在项目进行到一半时发现它是哪一个。但正如上面其他人所说,两种方式都可以取舍。正如我们并不总是在数据库设计中实现“完美”的规范化(出于性能原因,或者仅仅是因为您可以在追求学术完美的过程中走得太远……),我们可能会对我们在哪里做出一些有意识的选择找到我们的“查找”值。

Personally, though, I try to stand on my rule above. It is either DATA, or PROGRAM LOGIC, and rarely both. If it ends up as (or IN) a record in the databse, I try to keep it out of the Application code (except, of course, to retrieve it from the database . . .). If it is hardcoded in my application, I try to keep it out of my database.

不过,就我个人而言,我试图坚持我上面的规则。它要么是数据,要么是程序逻辑,很少两者兼而有之。如果它最终成为(或 IN)数据库中的一条记录,我会尝试将其保留在应用程序代码之外(当然,除了从数据库中检索它......)。如果它在我的应用程序中被硬编码,我会尽量将它保留在我的数据库之外。

In cases where I can't observe this rule, I DOCUMENT THE CODE with my reasoning, so three years later, some poor soul will be able to ficure out how it broke, if that happens.

如果我不能遵守这条规则,我会用我的推理记录代码,所以三年后,如果发生这种情况,一些可怜的人将能够弄清楚它是如何崩溃的。