SQL: Which provides better performance, one big join or multiple queries?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/1932019/
Asked by leora
I have a table called orders. One column on orders is customer_id.
I have a table called customers with 10 fields.
If I want to build up an array of order objects, with a customer object embedded in each order object, I have two choices.
Option 1:
a. First query the orders table. b. Loop through the records and query the customers table to get the record for each customer.
This would be something like:
SELECT * FROM Applications;
SELECT * FROM Customers WHERE id = 1;
SELECT * FROM Customers WHERE id = 2;
SELECT * FROM Customers WHERE id = 3;
-- etc . . . (one SELECT per order)
Option 2:
a. Do a join on all fields.
It's obviously #2, because you are only doing one query versus 1 + [numberOfOrders] queries (could be hundreds or more).
This would be something like:
SELECT *
FROM Applications a
INNER JOIN Customers c ON c.id = a.customerID;
My main question is: what if I had 10 other tables hanging off the orders table (similar to customers), each referenced by an id in the orders table? Should you do a single query that joins these 10 tables, or does that become inefficient at some point?
Any suggestions would help. Is there any optimization to ensure fast performance?
Accepted answer by Richard Beier
I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.
Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.
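As a rough illustration of "profile the query and look for missing indexes", here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN from Python. The table and index names are assumptions made up for this example, and every DBMS has its own plan syntax and tooling, so treat it as a sketch rather than a recipe for your database.

```python
import sqlite3

# Hypothetical schema loosely modeled on the question; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE applications (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")

def plan(conn, sql):
    """Return the human-readable steps of SQLite's query plan."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT * FROM applications WHERE customer_id = 42"

plan_before = plan(conn, lookup)  # no index on the foreign key column yet
conn.execute(
    "CREATE INDEX idx_applications_customer_id ON applications (customer_id)"
)
plan_after = plan(conn, lookup)   # the planner can now use the new index

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a full table scan; afterwards it reports a search using the index. The exact wording of the plan lines varies between SQLite versions.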
Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.
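The "select N+1" pattern and its set-based alternative can be sketched as follows. This uses an in-memory SQLite database purely so the example is self-contained; the schema, the 100-row data set, and the helper names are assumptions, not part of the original question. With a real client/server DBMS each extra query would also cost a network round-trip.

```python
import sqlite3

# Hypothetical schema and data; all names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE applications (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO applications VALUES (?, ?)",
                 [(i, i) for i in range(1, 101)])

def load_n_plus_one(conn):
    """Option 1: one SELECT for the applications, then one SELECT per row."""
    apps = conn.execute(
        "SELECT id, customer_id FROM applications ORDER BY id"
    ).fetchall()
    queries = 1
    result = []
    for app_id, customer_id in apps:
        customer = conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        queries += 1
        result.append((app_id, customer))
    return result, queries

def load_with_join(conn):
    """Option 2: a single set-based query."""
    rows = conn.execute("""
        SELECT a.id, c.id, c.name
        FROM applications a
        INNER JOIN customers c ON c.id = a.customer_id
        ORDER BY a.id
    """).fetchall()
    return [(app_id, (cid, name)) for app_id, cid, name in rows], 1

looped, looped_queries = load_n_plus_one(conn)
joined, joined_queries = load_with_join(conn)
print(looped_queries, joined_queries)
```

Both functions build the same list of (application, customer) pairs; the loop issues 1 + N statements while the join issues one.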
If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.
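For the in-application variant of caching, a minimal sketch might look like the following. The `TTLCache` class, the `run_big_join` stand-in, and the 60-second lifetime are all hypothetical; a summary table, a materialized view, or memcached would play the same role at a different layer.

```python
import time

class TTLCache:
    """Tiny in-process cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the database
        value = loader()             # stale or missing: run the slow query
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. after the underlying tables were updated."""
        self._entries.pop(key, None)

calls = []

def run_big_join():
    # Stand-in for the expensive join; hypothetical, returns fake rows.
    calls.append(1)
    return [("application-1", "customer-1")]

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("orders_report", run_big_join)
second = cache.get_or_load("orders_report", run_big_join)  # served from cache
```

The trade-off described above shows up in `invalidate`: if results must always be current, every write path has to call it, which is where caching gets complicated.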
This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.
Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.
Answer by Peter Lang
If this customer_id is unique in your customer table (and the other IDs are unique in the other tables), so that your query only returns 1 row per Application, then doing a single SELECT is certainly more efficient.
Joining all the required customers in one query can be optimized by the DBMS, while lots of single SELECTs cannot.
EDIT
I tried this with Oracle PL/SQL with 50,000 applications and 50,000 matching customers.
Solution with selecting everything in one query took 0.172 s
Solution with selecting every customer in a single SELECT took 1.984 s
And this most likely gets worse with other clients or when accessing over a network.
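A comparable measurement can be sketched in Python against an in-memory SQLite database. This is not the Oracle PL/SQL test from the answer: the row count (5,000), the schema, and the helper names are assumptions, and the absolute timings will differ by machine and DBMS, so no particular numbers are claimed here.

```python
import sqlite3
import time

N = 5000  # assumed row count, smaller than the 50,000 used in the answer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE applications (id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(N)])
conn.executemany("INSERT INTO applications VALUES (?, ?)",
                 [(i, i) for i in range(N)])

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def one_query():
    return conn.execute(
        "SELECT a.id, c.name FROM applications a "
        "INNER JOIN customers c ON c.id = a.customer_id ORDER BY a.id"
    ).fetchall()

def many_queries():
    rows = []
    apps = conn.execute(
        "SELECT id, customer_id FROM applications ORDER BY id"
    ).fetchall()
    for app_id, customer_id in apps:
        (name,) = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        rows.append((app_id, name))
    return rows

join_rows, join_seconds = timed(one_query)
loop_rows, loop_seconds = timed(many_queries)
print(f"single join: {join_seconds:.4f} s, "
      f"one SELECT per row: {loop_seconds:.4f} s")
```

Because SQLite runs in-process, this leaves out the network round-trips that a client/server DBMS would add to every one of the per-row SELECTs.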
Answer by anijhaw
Single join should be faster for two main reasons.
If you are querying over a network, then there is overhead in using a number of queries instead of a single query.
A join would be optimized inside the DBMS using the query optimizer, so it will be faster than executing several queries.
Answer by OMG Ponies
should you do a single query that joins these 10 tables or at some point is it inefficient
All these tables join to the order - all the records returned are related. There's nothing inefficient about grabbing everything related in as few queries or operations as possible.
With separate queries, there's increased risk that the data may have changed between queries.
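The risk of data changing between separate queries can be illustrated with a small sketch. The change is simulated on a single SQLite connection for simplicity; in a real system it would come from another session. The schema and values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE applications (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO customers VALUES (2, 'Bob');
    INSERT INTO applications VALUES (10, 1);
    INSERT INTO applications VALUES (11, 2);
""")

# Query 1: read the applications.
apps = conn.execute(
    "SELECT id, customer_id FROM applications ORDER BY id"
).fetchall()

# Simulated concurrent change that lands between the two steps.
conn.execute("DELETE FROM customers WHERE id = 2")

# Queries 2..N: the per-row lookups see different data than query 1 did.
separate = [
    (app_id,
     conn.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone())
    for app_id, cid in apps
]

# A single join is evaluated against one state of the data.
joined = conn.execute("""
    SELECT a.id, c.name
    FROM applications a
    INNER JOIN customers c ON c.id = a.customer_id
    ORDER BY a.id
""").fetchall()

print(separate)
print(joined)
```

The separate queries end up with an application whose customer lookup returns nothing, which the application code then has to handle; the join simply returns the rows that are consistent at the time it runs.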
Answer by Sudhir Jonathan
The single join would still be faster, in my opinion, because a DBMS will always execute the where clauses before joins are performed. This means that before any joining happens, all the tables involved have already been cut down to the minimum possible size.
The fact remains that in order to get what you want, you will have to read from all these tables at some point in time, so doing it once will still be much more efficient.
The key here is that the tables are all cut down to the minimum size before joining, and that we're using inner joins. If both these conditions change (some outer joins are okay) then you might have problems.