database 应用程序开发人员犯的数据库开发错误

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/621884/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 07:13:04  来源:igfitidea点击:

Database development mistakes made by application developers

databasedatabase-design

提问by Charles Faiga

What are common database development mistakes made by application developers?

应用程序开发人员常犯的数据库开发错误有哪些?

回答by cletus

1. Not using appropriate indices

1. 没有使用合适的索引

This is a relatively easy one but still it happens all the time. Foreign keys should have indexes on them. If you're using a field in a WHEREyou should (probably) have an index on it. Such indexes should often cover multiple columns based on the queries you need to execute.

这是一个相对容易的,但它仍然发生在所有的时间。外键应该有索引。如果你在 a 中使用一个字段,WHERE你应该(可能)有一个索引。根据您需要执行的查询,此类索引通常应涵盖多个列。

2. Not enforcing referential integrity

2. 不强制执行参照完整性

Your database may vary here but if your database supports referential integrity--meaning that all foreign keys are guaranteed to point to an entity that exists--you should be using it.

您的数据库可能在这里有所不同,但如果您的数据库支持参照完整性——这意味着所有外键都保证指向一个存在的实体——您应该使用它。

It's quite common to see this failure on MySQL databases. I don't believe MyISAM supports it. InnoDB does. You'll find people who are using MyISAM or those that are using InnoDB but aren't using it anyway.

在 MySQL 数据库上看到这种失败是很常见的。我不相信 MyISAM 支持它。InnoDB 确实如此。你会发现有人在使用 MyISAM 或那些正在使用 InnoDB 但无论如何都没有使用它的人。

More here:

更多在这里:

3. Using natural rather than surrogate (technical) primary keys

3. 使用自然而不是代理(技术)主键

Natural keys are keys based on externally meaningful data that is (ostensibly) unique. Common examples are product codes, two-letter state codes (US), social security numbers and so on. Surrogate or technical primary keys are those that have absolutely no meaning outside the system. They are invented purely for identifying the entity and are typically auto-incrementing fields (SQL Server, MySQL, others) or sequences (most notably Oracle).

自然键是基于外部有意义的数据的键,这些数据(表面上)是唯一的。常见示例有产品代码、两个字母的州代码(美国)、社会安全号码等。代理或技术主键是那些在系统之外绝对没有意义的键。它们纯粹是为了识别实体而发明的,通常是自动递增的字段(SQL Server、MySQL 等)或序列(最显着的是 Oracle)。

In my opinion you should alwaysuse surrogate keys. This issue has come up in these questions:

在我看来,您应该始终使用代理键。这个问题出现在这些问题中:

This is a somewhat controversial topic on which you won't get universal agreement. While you may find some people, who think natural keys are in some situations OK, you won't find any criticism of surrogate keys other than being arguably unnecessary. That's quite a small downside if you ask me.

这是一个有点争议的话题,你不会得到普遍的同意。虽然您可能会发现有些人认为自然键在某些情况下是可以的,但除了可以说是不必要的之外,您不会发现对代理键的任何批评。如果你问我,这是一个很小的缺点。

Remember, even countries can cease to exist(for example, Yugoslavia).

请记住,即使是国家也可能不复存在(例如,南斯拉夫)。

4. Writing queries that require DISTINCTto work

4. 编写需要DISTINCT工作的查询

You often see this in ORM-generated queries. Look at the log output from Hibernate and you'll see all the queries begin with:

您经常在 ORM 生成的查询中看到这一点。查看 Hibernate 的日志输出,您将看到所有查询都以:

SELECT DISTINCT ...

This is a bit of a shortcut to ensuring you don't return duplicate rows and thus get duplicate objects. You'll sometimes see people doing this as well. If you see it too much it's a real red flag. Not that DISTINCTis bad or doesn't have valid applications. It does (on both counts) but it's not a surrogate or a stopgap for writing correct queries.

这是确保您不会返回重复行从而获得重复对象的一种快捷方式。你有时也会看到人们这样做。如果你看到太多,那就是一个真正的危险信号。不是DISTINCT不好或没有有效的应用程序。它确实(在这两个方面),但它不是编写正确查询的替代品或权宜之计。

From Why I Hate DISTINCT:

为什么我讨厌 DISTINCT

Where things start to go sour in my opinion is when a developer is building substantial query, joining tables together, and all of a sudden he realizes that it lookslike he is getting duplicate (or even more) rows and his immediate response...his "solution" to this "problem" is to throw on the DISTINCT keyword and POOFall his troubles go away.

在我看来,事情开始变得糟糕的地方是,当开发人员正在构建大量查询,将表连接在一起时,突然间他意识到他似乎收到了重复(甚至更多)的行和他的即时响应......他对这个“问题”的“解决方案”是抛出 DISTINCT 关键字和POOF他所有的麻烦都消失了。

5. Favouring aggregation over joins

5. 倾向于聚合而不是联接

Another common mistake by database application developers is to not realize how much more expensive aggregation (ie the GROUP BYclause) can be compared to joins.

数据库应用程序开发人员的另一个常见错误是没有意识到GROUP BY与连接相比,聚合(即子句)的开销要高得多。

To give you an idea of how widespread this is, I've written on this topic several times here and been downvoted a lot for it. For example:

为了让您了解这是多么普遍,我已经在这里写了好几次关于这个主题的文章,并因此受到了很多反对。例如:

From SQL statement - “join” vs “group by and having”:

SQL 语句 - “join” vs “group by and have”

First query:

SELECT userid
FROM userrole
WHERE roleid IN (1, 2, 3)
GROUP by userid
HAVING COUNT(1) = 3

Query time: 0.312 s

Second query:

SELECT t1.userid
FROM userrole t1
JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
AND t1.roleid = 1

Query time: 0.016 s

That's right. The join version I proposed is twenty times faster than the aggregate version.

第一个查询:

SELECT userid
FROM userrole
WHERE roleid IN (1, 2, 3)
GROUP by userid
HAVING COUNT(1) = 3

查询时间:0.312 s

第二个查询:

SELECT t1.userid
FROM userrole t1
JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
AND t1.roleid = 1

查询时间:0.016 s

这是正确的。我提出的加入版本比聚合版本快 20 倍。

6. Not simplifying complex queries through views

6. 不通过视图简化复杂的查询

Not all database vendors support views but for those that do, they can greatly simplify queries if used judiciously. For example, on one project I used a generic Party modelfor CRM. This is an extremely powerful and flexible modelling technique but can lead to many joins. In this model there were:

并非所有数据库供应商都支持视图,但对于那些支持视图的供应商,如果使用得当,它们可以大大简化查询。例如,在一个项目中,我为 CRM使用了一个通用的 Party 模型。这是一种极其强大且灵活的建模技术,但会导致许多连接。在这个模型中有:

  • Party: people and organisations;
  • Party Role: things those parties did, for example Employee and Employer;
  • Party Role Relationship: how those roles related to each other.
  • :人和组织;
  • 当事人角色:当事人所做的事情,例如雇员和雇主;
  • 参与方角色关系:这些角色如何相互关联。

Example:

例子:

  • Ted is a Person, being a subtype of Party;
  • Ted has many roles, one of which is Employee;
  • Intel is an organisation, being a subtype of a Party;
  • Intel has many roles, one of which is Employer;
  • Intel employs Ted, meaning there is a relationship between their respective roles.
  • Ted 是一个 Person,是 Party 的一个子类型;
  • Ted 有很多角色,其中之一是 Employee;
  • Intel是一个组织,是Party的一个子类型;
  • 英特尔有很多角色,其中之一是雇主;
  • 英特尔聘用了 Ted,这意味着他们各自的角色之间存在关系。

So there are five tables joined to link Ted to his employer. You assume all employees are Persons (not organisations) and provide this helper view:

所以有五个表连接起来,将 Ted 与他的雇主联系起来。你假设所有员工都是个人(不是组织)并提供这个助手视图:

CREATE VIEW vw_employee AS
SELECT p.title, p.given_names, p.surname, p.date_of_birth, p2.party_name employer_name
FROM person p
JOIN party py ON py.id = p.id
JOIN party_role child ON p.id = child.party_id
JOIN party_role_relationship prr ON child.id = prr.child_id AND prr.type = 'EMPLOYMENT'
JOIN party_role parent ON parent.id = prr.parent_id = parent.id
JOIN party p2 ON parent.party_id = p2.id

And suddenly you have a very simple view of the data you want but on a highly flexible data model.

突然之间,您对所需的数据有了一个非常简单的视图,但使用的是高度灵活的数据模型。

7. Not sanitizing input

7. 不清理输入

This is a huge one. Now I like PHP but if you don't know what you're doing it's really easy to create sites vulnerable to attack. Nothing sums it up better than the story of little Bobby Tables.

这是一个巨大的。现在我喜欢 PHP,但是如果您不知道自己在做什么,那么创建容易受到攻击的站点真的很容易。没有什么比小鲍比桌故事更能概括它的了。

Data provided by the user by way of URLs, form data and cookiesshould always be treated as hostile and sanitized. Make sure you're getting what you expect.

用户通过 URL、表单数据和 cookie提供的数据应始终被视为敌对和消毒。确保你得到你所期望的。

8. Not using prepared statements

8. 不使用准备好的语句

Prepared statements are when you compile a query minus the data used in inserts, updates and WHEREclauses and then supply that later. For example:

准备语句是当您编译查询时减去插入、更新和WHERE子句中使用的数据,然后再提供这些数据。例如:

SELECT * FROM users WHERE username = 'bob'

vs

对比

SELECT * FROM users WHERE username = ?

or

或者

SELECT * FROM users WHERE username = :username

depending on your platform.

取决于您的平台。

I've seen databases brought to their knees by doing this. Basically, each time any modern database encounters a new query it has to compile it. If it encounters a query it's seen before, you're giving the database the opportunity to cache the compiled query and the execution plan. By doing the query a lot you're giving the database the opportunity to figure that out and optimize accordingly (for example, by pinning the compiled query in memory).

我已经看到数据库因这样做而屈服。基本上,任何现代数据库每次遇到新查询时都必须对其进行编译。如果它遇到一个以前见过的查询,你就给了数据库缓存编译查询和执行计划的机会。通过执行大量查询,您可以让数据库有机会弄清楚并相应地进行优化(例如,通过将已编译的查询固定在内存中)。

Using prepared statements will also give you meaningful statistics about how often certain queries are used.

使用准备好的语句还可以为您提供有关使用某些查询的频率的有意义的统计信息。

Prepared statements will also better protect you against SQL injection attacks.

准备好的语句还可以更好地保护您免受 SQL 注入攻击。

9. Not normalizing enough

9. 标准化不够

Database normalizationis basically the process of optimizing database design or how you organize your data into tables.

数据库规范化基本上是优化数据库设计或如何将数据组织到表中的过程。

Just this week I ran across some code where someone had imploded an array and inserted it into a single field in a database. Normalizing that would be to treat element of that array as a separate row in a child table (ie a one-to-many relationship).

就在本周,我遇到了一些代码,其中有人内爆了一个数组并将其插入到数据库中的单个字段中。规范化是将该数组的元素视为子表中的单独行(即一对多关系)。

This also came up in Best method for storing a list of user IDs:

这也出现在存储用户 ID 列表的最佳方法中

I've seen in other systems that the list is stored in a serialized PHP array.

我在其他系统中看到列表存储在序列化的 PHP 数组中。

But lack of normalization comes in many forms.

但规范化的缺乏有多种形式。

More:

更多的:

10. Normalizing too much

10. 标准化太多

This may seem like a contradiction to the previous point but normalization, like many things, is a tool. It is a means to an end and not an end in and of itself. I think many developers forget this and start treating a "means" as an "end". Unit testing is a prime example of this.

这似乎与前一点相矛盾,但与许多事情一样,规范化是一种工具。它是达到目的的手段,而不是目的本身。我认为许多开发人员忘记了这一点,并开始将“手段”视为“目的”。单元测试就是一个很好的例子。

I once worked on a system that had a huge hierarchy for clients that went something like:

我曾经在一个系统上工作过,该系统为客户提供了一个巨大的层次结构,类似于:

Licensee ->  Dealer Group -> Company -> Practice -> ...

such that you had to join about 11 tables together before you could get any meaningful data. It was a good example of normalization taken too far.

这样您就必须将大约 11 个表连接在一起才能获得任何有意义的数据。这是规范化走得太远的一个很好的例子。

More to the point, careful and considered denormalization can have huge performance benefits but you have to be really careful when doing this.

更重要的是,仔细和深思熟虑的非规范化可以带来巨大的性能优势,但在执行此操作时必须非常小心。

More:

更多的:

11. Using exclusive arcs

11. 使用独占弧线

An exclusive arc is a common mistake where a table is created with two or more foreign keys where one and only one of them can be non-null. Big mistake.For one thing it becomes that much harder to maintain data integrity. After all, even with referential integrity, nothing is preventing two or more of these foreign keys from being set (complex check constraints notwithstanding).

排他弧是一种常见的错误,其中一个表是用两个或多个外键创建的,其中一个并且只有一个可以是非空的。 大错。一方面,维护数据完整性变得更加困难。毕竟,即使具有参照完整性,也没有什么可以阻止设置两个或多个这些外键(尽管存在复杂的检查约束)。

From A Practical Guide to Relational Database Design:

关系数据库设计实用指南

We have strongly advised against exclusive arc construction wherever possible, for the good reason that they can be awkward to write code and pose more maintenance difficulties.

我们强烈建议尽可能不要使用独占的弧结构,理由很充分,因为它们在编写代码时会很笨拙,并且会带来更多的维护困难。

12. Not doing performance analysis on queries at all

12. 根本不做查询性能分析

Pragmatism reigns supreme, particularly in the database world. If you're sticking to principles to the point that they've become a dogma then you've quite probably made mistakes. Take the example of the aggregate queries from above. The aggregate version might look "nice" but its performance is woeful. A performance comparison should've ended the debate (but it didn't) but more to the point: spouting such ill-informed views in the first place is ignorant, even dangerous.

实用主义至高无上,尤其是在数据库领域。如果你坚持原则到它们已经成为教条的地步,那么你很可能犯了错误。以上面的聚合查询为例。聚合版本可能看起来“不错”,但其性能却很糟糕。性能比较应该结束辩论(但它没有)但更重要的是:首先发表这种不明智的观点是无知的,甚至是危险的。

13. Over-reliance on UNION ALL and particularly UNION constructs

13. 过度依赖 UNION ALL 尤其是 UNION 结构

A UNION in SQL terms merely concatenates congruent data sets, meaning they have the same type and number of columns. The difference between them is that UNION ALL is a simple concatenation and should be preferred wherever possible whereas a UNION will implicitly do a DISTINCT to remove duplicate tuples.

SQL 术语中的 UNION 只是连接一致的数据集,这意味着它们具有相同的类型和列数。它们之间的区别在于 UNION ALL 是一个简单的连接,应该尽可能地首选,而 UNION 将隐式执行 DISTINCT 以删除重复的元组。

UNIONs, like DISTINCT, have their place. There are valid applications. But if you find yourself doing a lot of them, particularly in subqueries, then you're probably doing something wrong. That might be a case of poor query construction or a poorly designed data model forcing you to do such things.

UNION 和 DISTINCT 一样,都有自己的位置。有有效的应用程序。但是如果你发现自己做了很多事情,特别是在子查询中,那么你可能做错了。这可能是查询构造不当或数据模型设计不当迫使您执行此类操作的情况。

UNIONs, particularly when used in joins or dependent subqueries, can cripple a database. Try to avoid them whenever possible.

UNION,尤其是在连接或依赖子查询中使用时,可能会削弱数据库。尽量避免它们。

14. Using OR conditions in queries

14. 在查询中使用 OR 条件

This might seem harmless. After all, ANDs are OK. OR should be OK too right? Wrong. Basically an AND condition restrictsthe data set whereas an OR condition growsit but not in a way that lends itself to optimisation. Particularly when the different OR conditions might intersect thus forcing the optimizer to effectively to a DISTINCT operation on the result.

这可能看起来无害。毕竟,AND 是可以的。或者应该也可以吧?错误的。基本上,AND 条件会限制数据集,而 OR 条件会增加数据集,但不会以适合优化的方式增长。特别是当不同的 OR 条件可能相交时,从而迫使优化器有效地对结果进行 DISTINCT 操作。

Bad:

坏的:

... WHERE a = 2 OR a = 5 OR a = 11

Better:

更好的:

... WHERE a IN (2, 5, 11)

Now your SQL optimizer may effectively turn the first query into the second. But it might not. Just don't do it.

现在您的 SQL 优化器可以有效地将第一个查询转换为第二个查询。但它可能不会。只是不要这样做。

15. Not designing their data model to lend itself to high-performing solutions

15. 没有设计他们的数据模型以使其适用于高性能解决方案

This is a hard point to quantify. It is typically observed by its effect. If you find yourself writing gnarly queries for relatively simple tasks or that queries for finding out relatively straightforward information are not efficient, then you probably have a poor data model.

这是一个很难量化的点。通常通过其效果观察到。如果您发现自己为相对简单的任务编写了粗糙的查询,或者用于查找相对简单信息的查询效率不高,那么您的数据模型可能很差。

In some ways this point summarizes all the earlier ones but it's more of a cautionary tale that doing things like query optimisation is often done first when it should be done second. First and foremost you should ensure you have a good data model before trying to optimize the performance. As Knuth said:

在某些方面,这一点总结了所有较早的内容,但它更像是一个警示故事,即执行诸如查询优化之类的事情通常应该先做,而应该在其次做。首先,在尝试优化性能之前,您应该确保拥有良好的数据模型。正如克努斯所说:

Premature optimization is the root of all evil

过早优化是万恶之源

16. Incorrect use of Database Transactions

16. 数据库事务的错误使用

All data changes for a specific process should be atomic. I.e. If the operation succeeds, it does so fully. If it fails, the data is left unchanged. - There should be no possibility of 'half-done' changes.

特定进程的所有数据更改都应该是原子的。即,如果操作成功,则完全执行。如果失败,则数据保持不变。- 不应该有“半完成”的变化的可能性。

Ideally, the simplest way to achieve this is that the entire system design should strive to support all data changes through single INSERT/UPDATE/DELETE statements. In this case, no special transaction handling is needed, as your database engine should do so automatically.

理想情况下,实现这一点的最简单方法是整个系统设计应努力通过单个 INSERT/UPDATE/DELETE 语句支持所有数据更改。在这种情况下,不需要特殊的事务处理,因为您的数据库引擎应该自动执行此操作。

However, if any processes do require multiple statements be performed as a unit to keep the data in a consistent state, then appropriate Transaction Control is necessary.

但是,如果任何进程确实需要将多个语句作为一个单元执行以保持数据处于一致状态,则需要适当的事务控制。

  • Begin a Transaction before the first statement.
  • Commit the Transaction after the last statement.
  • On any error, Rollback the Transaction. And very NB! Don't forget to skip/abort all statements that follow after the error.
  • 在第一条语句之前开始事务。
  • 在最后一条语句之后提交事务。
  • 出现任何错误,回滚事务。而且很NB!不要忘记跳过/中止错误之后的所有语句。

Also recommended to pay careful attention to the subtelties of how your database connectivity layer, and database engine interact in this regard.

还建议仔细注意您的数据库连接层如何与数据库引擎在这方面交互的微妙之处。

17. Not understanding the 'set-based' paradigm

17. 不理解“基于集合”的范式

The SQL language follows a specific paradigm suited to specific kinds of problems. Various vendor-specific extensions notwithstanding, the language struggles to deal with problems that are trivial in langues like Java, C#, Delphi etc.

SQL 语言遵循适用于特定类型问题的特定范式。尽管有各种特定于供应商的扩展,但该语言仍难以处理 Java、C#、Delphi 等语言中微不足道的问题。

This lack of understanding manifests itself in a few ways.

这种缺乏理解表现在几个方面。

  • Inappropriately imposing too much procedural or imperative logic on the databse.
  • Inappropriate or excessive use of cursors. Especially when a single query would suffice.
  • Incorrectly assuming that triggers fire once per row affected in multi-row updates.
  • 不恰当地在数据库上强加过多的程序或命令式逻辑。
  • 不适当或过度使用游标。特别是当一个查询就足够了。
  • 错误地假设触发器在多行更新中每行触发一次。

Determine clear division of responsibility, and strive to use the appropriate tool to solve each problem.

确定明确的职责分工,力求用合适的工具解决每一个问题。

回答by ConcernedOfTunbridgeWells

Key database design and programming mistakes made by developers

开发人员所犯的关键数据库设计和编程错误

  • Selfish database design and usage.Developers often treat the database as their personal persistent object store without considering the needs of other stakeholders in the data. This also applies to application architects. Poor database design and data integrity makes it hard for third parties working with the data and can substantially increase the system's life cycle costs. Reporting and MIS tends to be a poor cousin in application design and only done as an afterthought.

  • Abusing denormalised data.Overdoing denormalised data and trying to maintain it within the application is a recipe for data integrity issues. Use denormalisation sparingly. Not wanting to add a join to a query is not an excuse for denormalising.

  • Scared of writing SQL.SQL isn't rocket science and is actually quite good at doing its job. O/R mapping layers are quite good at doing the 95% of queries that are simple and fit well into that model. Sometimes SQL is the best way to do the job.

  • Dogmatic 'No Stored Procedures' policies.Regardless of whether you believe stored procedures are evil, this sort of dogmatic attitude has no place on a software project.

  • Not understanding database design.Normalisation is your friend and it's not rocket science.Joining and cardinality are fairly simple concepts - if you're involved in database application development there's really no excuse for not understanding them.

  • 自私的数据库设计和使用。开发人员通常将数据库视为他们个人的持久对象存储,而不考虑数据中其他利益相关者的需求。这也适用于应用程序架构师。糟糕的数据库设计和数据完整性使得第三方难以处理数据,并且会显着增加系统的生命周期成本。报告和 MIS 在应用程序设计中往往是一个糟糕的表亲,只能作为事后的想法。

  • 滥用非规范化数据。过度使用非规范化数据并试图在应用程序中维护它会导致数据完整性问题。谨慎使用非规范化。不想向查询添加连接并不是非规范化的借口。

  • 害怕写SQL。SQL 不是火箭科学,实际上很擅长做它的工作。O/R 映射层非常擅长执行 95% 的简单查询,并且非常适合该模型。有时,SQL 是完成这项工作的最佳方式。

  • 教条的“无存储过程”政策。不管您是否相信存储过程是邪恶的,这种教条主义的态度在软件项目中都没有立足之地。

  • 不了解数据库设计。规范化是你的朋友,它不是火箭科学。联接和基数是相当简单的概念 - 如果您参与数据库应用程序开发,那么真的没有理由不理解它们。

回答by Rad

  1. Not using version control on the database schema
  2. Working directly against a live database
  3. Not reading up and understanding more advanced database concepts (indexes, clustered indexes, constraints, materialized views, etc)
  4. Failing to test for scalability ... test data of only 3 or 4 rows will never give you the real picture of real live performance
  1. 不在数据库架构上使用版本控制
  2. 直接针对实时数据库工作
  3. 没有阅读和理解更高级的数据库概念(索引、聚集索引、约束、物化视图等)
  4. 无法测试可扩展性……只有 3 或 4 行的测试数据永远不会为您提供真实现场表演的真实画面

回答by Ash

Over-use and/or dependence on stored procedures.

过度使用和/或依赖存储过程。

Some application developers see stored procedures as a direct extension of middle tier/front end code. This appears to be a common trait in Microsoft stack developers, (I'm one, but I've grown out of it) and produces many stored procedures that perform complex business logic and workflow processing. This is much better done elsewhere.

一些应用程序开发人员将存储过程视为中间层/前端代码的直接扩展。这似乎是 Microsoft 堆栈开发人员的一个共同特征(我是其中之一,但我已经摆脱了它)并生成了许多执行复杂业务逻辑和工作流处理的存储过程。这在其他地方做得更好。

Stored procedures are useful where it has actuallly been proven that some real technical factor necessitates their use (for example, performance and security) For example, keeping aggregation/filtering of large data sets "close to the data".

存储过程在实际证明某些真正的技术因素需要它们的使用(例如,性能和安全性)的情况下很有用。例如,保持大型数据集的聚合/过滤“接近数据”。

I recently had to help maintain and enhance a large Delphi desktop application of which 70% of the business logic and rules were implemented in 1400 SQL Server stored procedures (the remainder in UI event handlers). This was a nightmare, primarily due to the difficuly of introducing effective unit testing to TSQL, lack of encapsulation and poor tools (Debuggers, editors).

我最近不得不帮助维护和增强一个大型 Delphi 桌面应用程序,其中 70% 的业务逻辑和规则是在 1400 个 SQL Server 存储过程中实现的(其余在 UI 事件处理程序中)。这是一场噩梦,主要是由于难以向 TSQL 引入有效的单元测试、缺乏封装和糟糕的工具(调试器、编辑器)。

Working with a Java team in the past I quickly found out that often the complete opposite holds in that environment. A Java Architect once told me: "The database is for data, not code.".

过去与 Java 团队一起工作时,我很快发现在那个环境中往往完全相反。一位 Java 架构师曾告诉我:“数据库是用于数据的,而不是代码。”。

These days I think it's a mistake to not consider stored procs at all, but they should be used sparingly (not by default) in situations where they provide useful benefits (see the other answers).

这些天我认为根本不考虑存储过程是错误的,但是在它们提供有用的好处的情况下应该谨慎使用它们(不是默认情况下)(请参阅其他答案)。

回答by Bob Moore

Number one problem? They only test on toy databases. So they have no idea that their SQL will crawl when the database gets big, and someone has to come along and fix it later (that sound you can hear is my teeth grinding).

第一个问题?他们只测试玩具数据库。所以他们不知道当数据库变大时他们的 SQL 会爬行,并且必须有人过来修复它(你能听到的声音是我磨牙的声音)。

回答by Christophe Herreman

Not using indexes.

不使用索引。

回答by adam

Poor Performance Caused by Correlated Subqueries

相关子查询导致性能不佳

Most of the time you want to avoid correlated subqueries. A subquery is correlated if, within the subquery, there is a reference to a column from the outer query. When this happens, the subquery is executed at least once for every row returned and could be executed more times if other conditions are applied after the condition containing the correlated subquery is applied.

大多数情况下,您希望避免相关子查询。如果在子查询中存在对来自外部查询的列的引用,则子查询是相关的。发生这种情况时,子查询至少对返回的每一行执行一次,如果在应用包含相关子查询的条件后应用其他条件,则可以执行更多次。

Forgive the contrived example and the Oracle syntax, but let's say you wanted to find all the employees that have been hired in any of your stores since the last time the store did less than $10,000 of sales in a day.

请原谅人为的示例和 Oracle 语法,但假设您想查找自上次商店一天内销售额低于 10,000 美元以来在您的任何商店中雇用的所有员工。

select e.first_name, e.last_name
from employee e
where e.start_date > 
        (select max(ds.transaction_date)
         from daily_sales ds
         where ds.store_id = e.store_id and
               ds.total < 10000)

The subquery in this example is correlated to the outer query by the store_id and would be executed for every employee in your system. One way that this query could be optimized is to move the subquery to an inline-view.

此示例中的子查询通过 store_id 与外部查询相关联,并将为系统中的每个员工执行。可以优化此查询的一种方法是将子查询移动到内联视图。

select e.first_name, e.last_name
from employee e,
     (select ds.store_id,
             max(s.transaction_date) transaction_date
      from daily_sales ds
      where ds.total < 10000
      group by s.store_id) dsx
where e.store_id = dsx.store_id and
      e.start_date > dsx.transaction_date

In this example, the query in the from clause is now an inline-view (again some Oracle specific syntax) and is only executed once. Depending on your data model, this query will probably execute much faster. It would perform better than the first query as the number of employees grew. The first query could actually perform better if there were few employees and many stores (and perhaps many of stores had no employees) and the daily_sales table was indexed on store_id. This is not a likely scenario but shows how a correlated query could possibly perform better than an alternative.

在这个例子中,from 子句中的查询现在是一个内联视图(也是一些特定于 Oracle 的语法)并且只执行一次。根据您的数据模型,此查询的执行速度可能会快得多。随着员工人数的增加,它的性能会比第一个查询更好。如果员工很少而商店很多(可能很多商店都没有员工)并且daily_sales 表在store_id 上建立索引,那么第一个查询实际上可以执行得更好。这不是一个可能的场景,但显示了相关查询如何可能比替代查询执行得更好。

I've seen junior developers correlate subqueries many times and it usually has had a severe impact on performance. However, when removing a correlated subquery be sure to look at the explain planbefore and after to make sure you are not making the performance worse.

我见过初级开发人员多次关联子查询,这通常会对性能产生严重影响。但是,在删除相关子查询时,请务必查看前后的解释计划,以确保不会使性能变差。

回答by Kb.

In my experience:
Not communicating with experienced DBAs.

以我的经验:
不与有经验的 DBA 交流。

回答by Nathan Voxland

Using Access instead of a "real" database. There are plenty of great small and even free databases like SQL Express, MySQL, and SQLitethat will work and scale much better. Apps often need to scale in unexpected ways.

使用 Access 而不是“真正的”数据库。有很多很棒的小型甚至免费的数据库,例如SQL ExpressMySQLSQLite,它们的工作和扩展性会更好。应用程序通常需要以意想不到的方式扩展。

回答by TheTXI

Forgetting to set up relationships between the tables. I remember having to clean this up when I first started working at my current employer.

忘记设置表之间的关系。我记得当我第一次开始在我现在的雇主工作时必须清理它。