database 每个开发人员都应该了解数据库吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1981526/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What should every developer know about databases?
提问by Aaronaught
Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today. So:
无论我们喜欢与否,我们中的大多数开发人员要么经常使用数据库,要么有一天可能不得不使用数据库。考虑到滥用和滥用的数量,以及每天出现的与数据库相关的问题的数量,可以说开发人员应该知道某些概念——即使他们不设计或使用今天的数据库。所以:
What are the important concepts that developers and other software professionals ought to know about databases?
开发人员和其他软件专业人员应该了解哪些有关数据库的重要概念?
Guidelines for Responses:
回应指南:
Keep your list short.
One concept per answer is best.
保持你的清单简短。
每个答案一个概念最好。
Be specific.
"Data modelling" may be an important skill, but what does that mean precisely?
要具体。
“数据建模”可能是一项重要的技能,但这究竟意味着什么?
Explain your rationale.
Why is your concept important? Don't just say "use indexes." Don't fall into "best practices." Convince your audience to go learn more.
解释你的理由。
为什么你的概念很重要?不要只说“使用索引”。不要陷入“最佳实践”。说服你的听众去了解更多。
Upvote answers you agree with.
Read other people's answers first. One high-ranked answer is a more effective statement than two low-ranked ones. If you have more to add, either add a comment or reference the original.
赞成您同意的答案。
首先阅读其他人的答案。一个排名高的答案比两个排名低的答案更有效。如果您有更多要添加的内容,请添加评论或参考原文。
Don't downvote something just because it doesn't apply to you personally.
We all work in different domains. The objective here is to provide direction for database novices to gain a well-founded, well-rounded understanding of database design and database-driven development, not to compete for the title of most-important.
不要仅仅因为它不适用于您个人而拒绝投票。
我们都在不同的领域工作。这里的目标是为数据库新手提供方向,让他们对数据库设计和数据库驱动开发有一个有根据的、全面的理解,而不是为了争夺最重要的头衔。
回答by Walter Mitty
The very first thing developers should know about databases is this: what are databases for? Not how do they work, nor how do you build one, nor even how do you write code to retrieve or update the data in a database. But what are they for?
开发人员应该了解数据库的第一件事是:数据库的用途是什么?不是它们如何工作,也不是您如何构建它们,甚至不是您如何编写代码来检索或更新数据库中的数据。但它们是为了什么?
Unfortunately, the answer to this one is a moving target. In the heydey of databases, the 1970s through the early 1990s, databases were for the sharing of data.If you were using a database, and you weren't sharing data you were either involved in an academic project or you were wasting resources, including yourself. Setting up a database and taming a DBMS were such monumental tasks that the payback, in terms of data exploited multiple times, had to be huge to match the investment.
不幸的是,这个问题的答案是一个移动的目标。 在数据库的鼎盛时期,从 1970 年代到 1990 年代初期,数据库用于共享数据。如果您使用的是数据库,并且没有共享数据,那么您要么参与了学术项目,要么在浪费资源,包括您自己。建立数据库和驯服 DBMS 是一项艰巨的任务,就多次利用的数据而言,回报必须与投资相匹配。
Over the last 15 years, databases have come to be used for storing the persistent data associated with just one application.Building a database for MySQL, or Access, or SQL Serverhas become so routine that databases have become almost a routine part of an ordinary application. Sometimes, that initial limited mission gets pushed upward by mission creep, as the real value of the data becomes apparent. Unfortunately, databases that were designed with a single purpose in mind often fail dramatically when they begin to be pushed into a role that's enterprise wide and mission critical.
在过去的 15 年中,数据库已经开始用于存储与一个应用程序相关的持久数据。为MySQL、Access或SQL Server构建数据库已经变得如此常规,以至于数据库几乎成为普通应用程序的常规部分。有时,随着数据的真正价值变得显而易见,最初的有限任务会被任务蠕变推高。不幸的是,当它们开始被推到企业范围和任务关键的角色时,以单一目的设计的数据库通常会失败。
The second thing developers need to learn about databases is the whole data centric viewof the world. The data centric world view is more different from the process centric world view than anything most developers have ever learned. Compared to this gap, the gap between structured programming and object oriented programming is relatively small.
开发人员需要了解的关于数据库的第二件事是整个以数据为中心的世界观。以数据为中心的世界观与以流程为中心的世界观相比,大多数开发人员所学到的东西都大不相同。相比这个差距,结构化编程和面向对象编程的差距就比较小了。
The third thing developers need to learn, at least in an overview, is data modeling, including conceptual data modeling, logical data modeling, and physical data modeling.
开发人员需要学习的第三件事,至少在概述中,是数据建模,包括概念数据建模、逻辑数据建模和物理数据建模。
Conceptual data modelingis really requirements analysis from a data centric point of view.
概念数据建模实际上是从以数据为中心的角度进行的需求分析。
Logical data modelingis generally the application of a specific data model to the requirements discovered in conceptual data modeling. The relational model is used far more than any other specific model, and developers need to learn the relational model for sure. Designing a powerful and relevant relational model for a nontrivial requirement is not a trivial task. You can't build good SQL tables if you misunderstand the relational model.
逻辑数据建模通常是将特定数据模型应用于概念数据建模中发现的需求。关系模型的使用远远超过任何其他特定模型,开发人员肯定需要学习关系模型。为一个重要的需求设计一个强大且相关的关系模型并不是一项简单的任务。如果您误解了关系模型,就无法构建好的 SQL 表。
Physical data modelingis generally DBMS specific, and doesn't need to be learned in much detail, unless the developer is also the database builder or the DBA. What developers do need to understand is the extent to which physical database design can be separated from logical database design, and the extent to which producing a high speed database can be accomplished just by tweaking the physical design.
物理数据建模通常是特定于 DBMS 的,不需要详细学习,除非开发人员也是数据库构建者或 DBA。开发人员需要了解的是物理数据库设计与逻辑数据库设计的分离程度,以及仅通过调整物理设计即可完成生成高速数据库的程度。
The next thing developers need to learn is that while speed (performance) is important, other measures of design goodness are even more important, such as the ability to revise and extend the scope of the database down the road, or simplicity of programming.
开发人员接下来需要了解的是,虽然速度(性能)很重要,但其他衡量设计优劣的指标更为重要,例如修改和扩展数据库范围的能力,或编程的简单性。
Finally, anybody who messes with databases needs to understand that the value of data often outlasts the system that captured it.
最后,任何与数据库打交道的人都需要了解数据的价值通常比捕获它的系统更持久。
Whew!
哇!
回答by Randy Minder
Good question. The following are some thoughts in no particular order:
好问题。以下是一些想法,排名不分先后:
Normalization, to at least the second normal form, is essential.
Referential integrity is also essential, with proper cascading delete and update considerations.
Good and proper use of check constraints. Let the database do as much work as possible.
Don't scatter business logic in both the database and middle tier code. Pick one or the other, preferably in middle tier code.
Decide on a consistent approach for primary keys and clustered keys.
Don't over index. Choose your indexes wisely.
Consistent table and column naming. Pick a standard and stick to it.
Limit the number of columns in the database that will accept null values.
Don't get carried away with triggers. They have their use but can complicate things in a hurry.
Be careful with UDFs. They are great but can cause performance problems when you're not aware how often they might get called in a query.
Get Celko's book on database design. The man is arrogant but knows his stuff.
归一化,至少是第二范式,是必不可少的。
参照完整性也很重要,适当的级联删除和更新注意事项。
正确使用检查约束。让数据库做尽可能多的工作。
不要在数据库和中间层代码中分散业务逻辑。选择其中之一,最好使用中间层代码。
确定主键和集群键的一致方法。
不要过度索引。明智地选择索引。
一致的表和列命名。选择一个标准并坚持下去。
限制数据库中将接受空值的列数。
不要被触发器冲昏头脑。它们有自己的用途,但可能会很快使事情复杂化。
小心 UDF。它们很棒,但是当您不知道它们在查询中被调用的频率时会导致性能问题。
获取 Celko 关于数据库设计的书。这个人很傲慢,但知道他的东西。
回答by David Thornley
First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.
首先,开发人员需要了解有关数据库的一些知识。它们不仅仅是您放入 SQL 并获取结果集的魔法设备,而是具有自己的逻辑和怪癖的非常复杂的软件。
Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.
其次,有不同的数据库设置用于不同的目的。如果有可用的数据仓库,您不希望开发人员根据在线事务数据库制作历史报告。
Third, developers need to understand basic SQL, including joins.
第三,开发人员需要了解基本的 SQL,包括连接。
Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:
过去,这取决于开发人员的参与程度。我曾在我担任开发人员和事实上的 DBA 的工作中工作,在那里 DBA 就在过道上,而 DBA 在他们自己的领域工作。(我不喜欢第三个。)假设开发人员参与了数据库设计:
They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."
他们需要了解基本的规范化,至少是前三种范式。除此之外,获得DBA。对于那些在美国法庭(和随机电视节目也算在这里)有任何经验的人来说,有一个助记符“取决于钥匙,整个钥匙,除了钥匙什么都没有,所以帮助你科德。”
They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.
他们需要对索引有所了解,我的意思是他们应该知道他们需要什么索引以及它们如何影响性能。这意味着没有无用的索引,但不要害怕添加它们来辅助查询。其他任何东西(如余额)都应该留给 DBA。
They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.
他们需要了解对数据完整性的需求,并能够指出他们验证数据的位置以及如果发现问题他们正在做什么。这不必在数据库中(很难向用户发出有意义的错误消息),但必须在某处。
They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).
他们应该具备如何制定计划以及如何阅读计划的基本知识(至少足以判断算法是否有效)。
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
他们应该模糊地知道什么是触发器,什么是视图,以及可以对数据库进行分区。他们不需要任何类型的细节,但他们需要知道向 DBA 询问这些事情。
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
他们当然应该知道不要干预生产数据、生产代码或类似的东西,他们应该知道所有源代码都进入 VCS。
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
毫无疑问,我忘记了一些事情,但普通开发人员不必是 DBA,只要手头有真正的 DBA。
回答by Aaronaught
Basic Indexing
基本索引
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designingthe database and just have to write some queries, it's still vital to understand, at a minimum:
看到没有索引或任意/无用索引的表或整个数据库,我总是感到震惊。即使您不是在设计数据库而只需要编写一些查询,至少理解以下内容仍然至关重要:
- What's indexed in your database and what's not:
- The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
- The concept of coverage (why you shouldn't just write
SELECT *
); - The difference between a clustered and non-clustered index;
- Why more/bigger indexes are not necessarily better;
- Why you should try to avoid wrapping filter columns in functions.
- 什么在你的数据库中被索引,什么没有:
- 扫描类型之间的差异、它们的选择方式以及您编写查询的方式如何影响该选择;
- 覆盖的概念(为什么你不应该只写
SELECT *
); - 聚集索引和非聚集索引的区别;
- 为什么更多/更大的索引不一定更好;
- 为什么你应该尽量避免在函数中包装过滤器列。
Designers should also be aware of common index anti-patterns, for example:
设计人员还应该了解常见的索引反模式,例如:
- The Access anti-pattern (indexing every column, one by one)
- The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
- Access 反模式(一一索引每一列)
- Catch-All 反模式(在所有或大多数列上建立一个庞大的索引,显然是在错误的印象下创建的,即它会加速涉及任何这些列的每个可能的查询)。
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by farthe most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
数据库索引的质量 - 以及您是否在编写查询时利用它 -是迄今为止最重要的性能块。在 SO 和其他论坛上发布的 10 个问题中,有 9 个问题抱怨性能不佳,结果总是由于索引不佳或无法表达。
回答by Aaronaught
Normalization
正常化
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
看到有人努力编写一个过于复杂的查询,而如果使用标准化的设计(“显示每个地区的总销售额”),这个查询本来是完全简单的,这总是让我感到沮丧。
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
如果您在一开始就理解这一点并进行相应的设计,那么您以后会省去很多痛苦。规范化后很容易对性能进行非规范化;规范化一个从一开始就不是这样设计的数据库并不容易。
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
至少,您应该知道 3NF 是什么以及如何到达那里。对于大多数事务性数据库,这是使查询易于编写和保持良好性能之间的很好的平衡。
回答by Markus Winand
How Indexes Work
索引如何工作
It's probably not the most important, but for sure the most underestimated topic.
这可能不是最重要的,但肯定是最被低估的话题。
The problem with indexing is that SQL tutorials usually don't mention them at all and that all the toy examples work without any index.
索引的问题在于 SQL 教程通常根本不提及它们,并且所有玩具示例都没有任何索引。
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
甚至更有经验的开发人员也可以编写相当好的(和复杂的)SQL,而无需了解比“索引使查询更快”更多的索引。
That's because SQL databases do a very good jobworking as black-box:
那是因为 SQL 数据库作为黑盒工作做得非常好:
Tell me what you need (gimme SQL), I'll take care of it.
告诉我你需要什么(给我 SQL),我会处理的。
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
这非常适合检索正确的结果。SQL 的作者不需要知道系统在幕后做了什么——直到一切都变得如此缓慢......
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
那是索引成为一个主题的时候。但这通常已经很晚了,有人(某家公司?)已经遇到了真正的问题。
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
这就是为什么我相信索引是使用数据库时不要忘记的第一主题。不幸的是,很容易忘记它。
Disclaimer
免责声明
The arguments are borrowed from the prefaceof my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
回答by FernandoZ
I just want to point out an observation - that is that it seems that the majority of responses assume database is interchangeable with relational databases. There are also object databases, flat file databases. It is important to asses the needs of the of the software project at hand. From a programmer perspective the database decision can be delayed until later. Data modeling on the other hand can be achieved early on and lead to much success.
我只想指出一个观察结果 - 即似乎大多数响应都假设数据库可以与关系数据库互换。还有对象数据库、平面文件数据库。评估手头软件项目的需求很重要。从程序员的角度来看,数据库决策可以推迟到以后。另一方面,数据建模可以在早期实现并取得很大的成功。
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
我认为数据建模是一个关键组成部分,是一个相对较旧的概念,但它已被软件行业的许多人遗忘。数据建模,尤其是概念建模,可以揭示系统的功能行为,可以作为开发路线图。
On the other hand, the type of database required can be determined based on many different factors to include environment, user volume, and available local hardware such as harddrive space.
另一方面,所需的数据库类型可以根据许多不同的因素来确定,包括环境、用户量和可用的本地硬件(例如硬盘空间)。
回答by S.Lott
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
每个开发人员都应该知道这是错误的:“分析数据库操作与分析代码完全不同。”
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN
(or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O( n^ 2 ). Other algorithms involve B-tree lookups and are O( nlog n).
传统意义上有一个明显的大O。当你做一个EXPLAIN PLAN
(或等效的)时,你会看到算法。一些算法涉及嵌套循环并且是O( n^ 2 )。其他算法涉及 B 树查找并且是O( nlog n)。
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
这非常非常严重。这是理解索引为何重要的核心。这是理解速度-归一化-非归一化权衡的核心。这对于理解为什么数据仓库使用未针对事务更新进行规范化的星型模式至关重要。
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
如果您不清楚正在使用的算法,请执行以下操作。停止。解释查询执行计划。相应地调整索引。
Also, the corollary: More Indexes are Not Better.
此外,推论:更多的索引并不是更好。
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
有时,专注于一项操作的索引会减慢其他操作的速度。根据两种操作的比例,添加索引可能效果好,总体没有影响,或者对总体性能不利。
回答by HLGEM
Excellent question. Let's see, first no one should consider querying a datbase who does not thoroughly understand joins. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know datatypes and how to choose the best one.
很好的问题。让我们看看,首先没有人应该考虑查询一个不完全了解连接的数据库。这就像在不知道方向盘和刹车在哪里的情况下驾驶汽车。您还需要了解数据类型以及如何选择最好的数据类型。
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
开发人员应该了解的另一件事是,在设计数据库时,您应该牢记三件事:
Data integrity - if the data can't be relied on you essentially have no data - this means do not put required logic in the application as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary to data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
数据完整性 - 如果不能依赖数据,您基本上没有数据 - 这意味着不要在应用程序中放置所需的逻辑,因为许多其他来源可能会接触到数据库。约束、外键和有时触发器是数据完整性所必需的。不要因为不喜欢它们或不想费心去理解它们而没有使用它们。
性能 - 重构性能不佳的数据库非常困难,应从一开始就考虑性能。有很多方法可以执行相同的查询,并且已知有些方法几乎总是更快,不学习和使用这些方法是短视的。在设计查询或数据库结构之前阅读一些关于性能调优的书籍。
安全性 - 此数据是您公司的命脉,它也经常包含可能被盗的个人信息。学习保护您的数据免受 SQL 注入攻击、欺诈和身份盗用。
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember often actual decisions are made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company from bad queries or loose a big customer. Data has meaning, developers often seem to forget that.
在查询数据库时,很容易得到错误的答案。确保您彻底了解您的数据模型。请记住,实际决策通常是根据您的查询返回的数据做出的。当它出错时,就会做出错误的商业决策。您可能会因糟糕的查询而扼杀一家公司或失去一个大客户。数据是有意义的,开发人员似乎经常忘记这一点。
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
数据几乎永远不会消失,考虑随着时间的推移存储数据,而不是今天如何获取数据。那个数据库在有十万条记录时运行良好,十年后可能不会那么好。应用程序很少能像数据一样持久。这是性能设计至关重要的原因之一。
Your database will probaly need fields that the application doesn't need to see. Things like GUIDs for replication, date inserted fields. etc. You also may need to store history of changes and who made them when and be able to restore bad changes from this storehouse. Think about how you intend to do this before you come ask a web site how to fix the problem where you forgot to put a where clause on an update and updated the whole table.
您的数据库可能需要应用程序不需要查看的字段。诸如用于复制的 GUID、插入日期的字段之类的东西。等等。您可能还需要存储更改的历史记录以及谁在何时进行更改,并能够从该仓库中恢复错误的更改。在您询问网站如何解决您忘记在更新中放置 where 子句并更新整个表的问题之前,请考虑您打算如何执行此操作。
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
永远不要在比生产版本更新的数据库版本中进行开发。永远,永远,永远不要直接针对生产数据库进行开发。
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
如果您没有数据库管理员,请确保有人正在制作备份并知道如何恢复它们并已测试恢复它们。
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
数据库代码就是代码,没有理由像其他代码一样将其保留在源代码管理中。