MySQL JOIN queries vs multiple queries
Original URL: http://stackoverflow.com/questions/1067016/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
JOIN queries vs multiple queries
Question by Thomas Bonini
Are JOIN queries faster than several queries? (You run your main query, and then you run many other SELECTs based on the results from your main query)
I'm asking because JOINing them would complicate the design of my application A LOT.
If they are faster, can anyone approximate very roughly by how much? If it's 1.5x I don't care, but if it's 10x I guess I do.
Accepted answer by Paolo Bergantino
This is way too vague to give you an answer relevant to your specific case. It depends on a lot of things. Jeff Atwood (founder of this site) actually wrote about this. For the most part, though, if you have the right indexes and you properly do your JOINs it is usually going to be faster to do 1 trip than several.
Answer by Frank Forte
For inner joins, a single query makes sense, since you only get matching rows. For left joins, multiple queries are much better. Look at the following benchmark I did:
Single query with 5 Joins
query: 8.074508 seconds
result size: 2268000
5 queries in a row
combined query time: 0.00262 seconds
result size: 165 (6 + 50 + 7 + 12 + 90)
Note that we get the same data in both cases (6 x 50 x 7 x 12 x 90 = 2268000).
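Left joins use exponentially more memory, because the result is filled with redundant data: the row count is the product of the per-table row counts rather than their sum.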
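The arithmetic behind this benchmark can be checked in a few lines. This is a sketch that assumes, as the numbers imply, that all five tables are joined onto the same parent row, so every row of one table pairs with every row of the others:

```python
from math import prod

# Row counts of the five tables joined in the benchmark above
child_rows = [6, 50, 7, 12, 90]

def joined_row_count(counts):
    """Rows from LEFT JOINing every table onto one parent row.

    Each combination of child rows appears once, so the counts multiply.
    A table with no matching rows still yields one NULL-padded row, hence max(c, 1).
    """
    return prod(max(c, 1) for c in counts)

def separate_row_count(counts):
    """Rows from fetching each table with its own query: the counts simply add."""
    return sum(counts)

print(joined_row_count(child_rows), separate_row_count(child_rows))
```

Each extra joined table multiplies the result size, while each extra separate query only adds to it.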
The memory cost might not be as bad if you only join two tables, but with three or more it generally becomes worth running separate queries.
As a side note, my MySQL server is right beside my application server, so connection time is negligible. If your connection time is in the seconds, then a single query may have a benefit.
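One rough way to reason about this side note is to charge one network round trip per query on top of the server's execution time. The sketch below is a deliberately simplified model: the round-trip times are hypothetical, only the 0.00262 s figure comes from the benchmark above, and it ignores the cost of transferring large result sets (which is exactly where the 2268000-row join loses).

```python
def total_seconds(num_queries, execution_seconds, round_trip_seconds):
    """Very rough model: each query pays one network round trip on top of
    the time the server spends executing all of them."""
    return num_queries * round_trip_seconds + execution_seconds

# 5 separate queries with 0.00262 s combined execution time (from the benchmark above)
for rtt in (0.0001, 0.05, 1.0):  # hypothetical round-trip times, in seconds
    print(f"round trip {rtt}s -> about {total_seconds(5, 0.00262, rtt):.5f}s in total")
```

With a sub-millisecond link the round trips barely matter; at one second per round trip, five queries cost about five seconds before any data is processed.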
Frank
Answer by Valentin Flachsel
I actually came to this question looking for an answer myself. After reading the given answers, I can only agree that the best way to compare DB query performance is to get real-world numbers, because there are just too many variables to take into account. But I also think that comparing the numbers against each other leads to no good in almost all cases. What I mean is that the numbers should always be compared with an acceptable number, and definitely not with each other.
I can understand that if one way of querying takes, say, 0.02 seconds and the other one takes 20 seconds, that's an enormous difference. But what if one way of querying takes 0.0000000002 seconds, and the other one takes 0.0000002 seconds? In both cases one way is a whopping 1000 times faster than the other, but is it really still "whopping" in the second case?
Bottom line as I personally see it: if it performs well, go for the easy solution.
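Judging each approach against an acceptable time, rather than against its rival, can be sketched as a tiny timing helper. The 0.05 s budget and the two stand-in workloads below are hypothetical placeholders; you would substitute your real JOIN and separate queries and your own latency requirement.

```python
import time

def time_call(fn, repeats=5):
    """Return the best wall-clock time (in seconds) over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def verdict(name, seconds, budget):
    """Compare one approach against an acceptable budget, not against the other approach."""
    status = "OK" if seconds <= budget else "TOO SLOW"
    return f"{name}: {seconds:.6f}s ({status}, budget {budget}s)"

BUDGET = 0.05  # hypothetical acceptable latency, in seconds

# Stand-ins for "run the JOIN" and "run the separate queries"
approaches = {
    "join": lambda: sum(range(10_000)),
    "separate": lambda: sum(range(20_000)),
}
for name, fn in approaches.items():
    print(verdict(name, time_call(fn), BUDGET))
```

If both lines say OK, the simpler approach wins regardless of which one is faster.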
Answer by Benjamin
This question is old, but is missing some benchmarks. I benchmarked JOIN against its 2 competitors:
- N+1 queries
- 2 queries, the second one using a WHERE IN(...) or equivalent
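The three strategies can be compared on a toy schema. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the author/article tables are hypothetical. It does not reproduce the benchmark timings; instead it counts how many SQL statements each strategy sends, which is what makes N+1 expensive once every statement pays a network round trip.

```python
import sqlite3

def make_db():
    """Hypothetical *-to-one schema: 100 articles, each pointing at one of 10 authors."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    db.executemany("INSERT INTO author VALUES (?, ?)",
                   [(i, f"author{i}") for i in range(1, 11)])
    db.executemany("INSERT INTO article VALUES (?, ?, ?)",
                   [(i, i % 10 + 1, f"title{i}") for i in range(1, 101)])
    db.commit()
    return db

def with_join(db):
    """One statement: the JOIN does the matching inside the database."""
    return db.execute("""SELECT article.id, author.name FROM article
                         JOIN author ON author.id = article.author_id
                         ORDER BY article.id""").fetchall()

def n_plus_one(db):
    """1 query for the articles, then 1 more query per article."""
    rows = []
    for article_id, author_id in db.execute(
            "SELECT id, author_id FROM article ORDER BY id").fetchall():
        (name,) = db.execute("SELECT name FROM author WHERE id = ?",
                             (author_id,)).fetchone()
        rows.append((article_id, name))
    return rows

def where_in(db):
    """2 queries: the articles, then all of their authors via WHERE id IN (...)."""
    articles = db.execute("SELECT id, author_id FROM article ORDER BY id").fetchall()
    ids = sorted({author_id for _, author_id in articles})
    placeholders = ",".join("?" for _ in ids)
    names = dict(db.execute(
        f"SELECT id, name FROM author WHERE id IN ({placeholders})", ids).fetchall())
    return [(article_id, names[author_id]) for article_id, author_id in articles]

def count_statements(db, strategy):
    """Run a strategy and count the SQL statements it sends (each a round trip on MySQL)."""
    statements = []
    db.set_trace_callback(statements.append)
    try:
        result = strategy(db)
    finally:
        db.set_trace_callback(None)
    return result, len(statements)

db = make_db()
for strategy in (with_join, n_plus_one, where_in):
    rows, statements = count_statements(db, strategy)
    print(f"{strategy.__name__}: {len(rows)} rows from {statements} statement(s)")
```

All three return identical rows; they differ in how many statements (and therefore round trips) they need.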
The result is clear: on MySQL, JOIN is much faster. N+1 queries can drop the performance of an application drastically.
That is, unless you select a lot of records that point to a very small number of distinct, foreign records. Here is a benchmark for the extreme case:
This is very unlikely to happen in a typical application, unless you're joining a *-to-many relationship, in which case the foreign key is on the other table, and you're duplicating the main table data many times.
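The duplication of main-table data in a *-to-many join is easy to measure. A sketch, again with sqlite3 standing in for MySQL; the post/comment tables, the 1000-character body, and the 50 comments are made-up sizes for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
""")
db.execute("INSERT INTO post VALUES (1, ?)", ("x" * 1000,))  # one post, 1000-char body
db.executemany("INSERT INTO comment (post_id, text) VALUES (1, ?)",
               [(f"comment {i}",) for i in range(50)])

# One query: the post body is repeated on every joined comment row
joined = db.execute("""SELECT post.body, comment.text FROM post
                       JOIN comment ON comment.post_id = post.id""").fetchall()
body_chars_joined = sum(len(body) for body, _ in joined)

# Two queries: the body is fetched once, the comments separately
(body,) = db.execute("SELECT body FROM post WHERE id = 1").fetchone()
comments = db.execute("SELECT text FROM comment WHERE post_id = 1").fetchall()
body_chars_separate = len(body)

print(body_chars_joined, body_chars_separate)
```

The joined result carries the post body once per comment; the two-query version carries it once.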
Takeaway:
- For *-to-one relationships, always use JOIN
- For *-to-many relationships, a second query might be faster
See my article on Medium for more information.
Answer by levans
Did a quick test selecting one row from a 50,000 row table and joining with one row from a 100,000 row table. Basically looked like:
$id = mt_rand(1, 50000);
// Method 1: two SELECTs, the second keyed on a value returned by the first
$row = $db->fetchOne("SELECT * FROM table1 WHERE id = " . $id);
$row = $db->fetchOne("SELECT * FROM table2 WHERE other_id = " . $row['other_id']);
vs
$id = mt_rand(1, 50000);
// Method 2: one SELECT with a JOIN
$db->fetchOne("SELECT table1.*, table2.*
    FROM table1
    LEFT JOIN table2 ON table1.other_id = table2.other_id
    WHERE table1.id = " . $id);
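A self-contained approximation of this test can be run with Python's sqlite3 module. This is a sketch under stated assumptions: SQLite runs in-process, so there is no network round trip and the timings will not match MySQL's; the table sizes are scaled down from the original 50,000/100,000 rows; and the `payload` column is hypothetical. It also uses bound parameters instead of string concatenation.

```python
import random
import sqlite3
import time

random.seed(0)  # reproducible test data
N1, N2, LOOKUPS = 5_000, 10_000, 5_000  # scaled down from 50,000 / 100,000 / 50,000

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, other_id INTEGER);
    CREATE TABLE table2 (other_id INTEGER PRIMARY KEY, payload TEXT);
""")
db.executemany("INSERT INTO table2 VALUES (?, ?)",
               [(i, f"payload{i}") for i in range(1, N2 + 1)])
db.executemany("INSERT INTO table1 VALUES (?, ?)",
               [(i, random.randint(1, N2)) for i in range(1, N1 + 1)])
db.commit()

def two_selects(id_):
    """Method 1: look up table1, then use its other_id to look up table2."""
    (other_id,) = db.execute("SELECT other_id FROM table1 WHERE id = ?", (id_,)).fetchone()
    return db.execute("SELECT payload FROM table2 WHERE other_id = ?",
                      (other_id,)).fetchone()[0]

def one_join(id_):
    """Method 2: a single LEFT JOIN query."""
    return db.execute("""SELECT table2.payload FROM table1
                         LEFT JOIN table2 ON table1.other_id = table2.other_id
                         WHERE table1.id = ?""", (id_,)).fetchone()[0]

ids = [random.randint(1, N1) for _ in range(LOOKUPS)]
for method in (two_selects, one_join):
    start = time.perf_counter()
    results = [method(i) for i in ids]
    print(f"{method.__name__}: {time.perf_counter() - start:.3f}s for {len(results)} reads")
```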
The two-SELECT method took 3.7 seconds for 50,000 reads, whereas the JOIN took 2.0 seconds on my slow home computer. INNER JOIN and LEFT JOIN made no difference. Fetching multiple rows (e.g., using IN SET) yielded similar results.
Answer by HoldOffHunger
The real question is: Do these records have a one-to-one relationship or a one-to-many relationship?
TLDR Answer:
If one-to-one, use a JOIN statement.
If one-to-many, use one (or many) SELECT statements with server-side code optimization.
Why and How To Use SELECT for Optimization
SELECTing (with multiple queries instead of joins) a large group of records based on a one-to-many relationship produces optimal efficiency, because JOINing has an exponential memory-bloat problem caused by duplicated rows. Grab all of the data, then use a server-side scripting language to sort it out:
SELECT * FROM Address WHERE Personid IN(1,2,3);
Results:
Address.id : 1 // First person and their address
Address.Personid : 1
Address.City : "Boston"
Address.id : 2 // First person's second address
Address.Personid : 1
Address.City : "New York"
Address.id : 3 // Second person's address
Address.Personid : 2
Address.City : "Barcelona"
Here, I am getting all of the records in one SELECT statement. This is better than a JOIN, which would get a small group of these records, one at a time, as a sub-component of another query. Then I parse it with server-side code that looks something like...
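<?php
// $addresses holds the rows from the query above; $persons is keyed by person id
foreach ($addresses as $address) {
    // Attach each address to the person it belongs to
    $persons[$address['Personid']]->Address[] = $address;
}
?>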
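The same "one SELECT, then group in application code" pattern can be sketched in Python. This assumes the Address table from the example above, loaded into sqlite3 as a stand-in for MySQL; `addresses_by_person` is a hypothetical helper name. It builds one `?` placeholder per id so the IN list is bound safely rather than concatenated.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Address (id INTEGER PRIMARY KEY, Personid INTEGER, City TEXT);
    INSERT INTO Address VALUES (1, 1, 'Boston'), (2, 1, 'New York'), (3, 2, 'Barcelona');
""")

def addresses_by_person(db, person_ids):
    """Fetch every address for the given people in ONE query, then group them in code."""
    if not person_ids:
        return {}
    placeholders = ",".join("?" for _ in person_ids)  # one bound parameter per id
    rows = db.execute(
        f"SELECT id, Personid, City FROM Address WHERE Personid IN ({placeholders}) ORDER BY id",
        list(person_ids),
    ).fetchall()
    grouped = defaultdict(list)
    for _address_id, person_id, city in rows:
        grouped[person_id].append(city)
    return dict(grouped)

print(addresses_by_person(db, [1, 2, 3]))
```

Person 3 has no addresses, so it simply does not appear in the result; each address row is transferred exactly once.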
When Not To Use JOIN for Optimization
JOINing a large group of records based on a one-to-one relationship with one single record produces optimal efficiency compared to multiple SELECT statements, one after the other, which simply get the next record type.
But JOIN is inefficient when getting records with a one-to-many relationship.
Example: The database Blogs has 3 tables of interest: BlogPost, Tag, and Comment.
SELECT * from BlogPost
LEFT JOIN Tag ON Tag.BlogPostid = BlogPost.id
LEFT JOIN Comment ON Comment.BlogPostid = BlogPost.id;
If there is 1 blogpost, 2 tags, and 2 comments, you will get results like:
Row1: tag1, comment1,
Row2: tag1, comment2,
Row3: tag2, comment1,
Row4: tag2, comment2,
Notice how each record is duplicated. Okay, so, 2 comments and 2 tags is 4 rows. What if we have 4 comments and 4 tags? You don't get 8 rows -- you get 16 rows:
Row1: tag1, comment1,
Row2: tag1, comment2,
Row3: tag1, comment3,
Row4: tag1, comment4,
Row5: tag2, comment1,
Row6: tag2, comment2,
Row7: tag2, comment3,
Row8: tag2, comment4,
Row9: tag3, comment1,
Row10: tag3, comment2,
Row11: tag3, comment3,
Row12: tag3, comment4,
Row13: tag4, comment1,
Row14: tag4, comment2,
Row15: tag4, comment3,
Row16: tag4, comment4,
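The tag/comment example can be reproduced on a toy database. A sketch using Python's sqlite3 module as a stand-in for MySQL, with the BlogPost/Tag/Comment tables reduced to a few hypothetical columns; `build` creates one post with the given numbers of tags and comments:

```python
import sqlite3

def build(num_tags, num_comments):
    """One blog post with num_tags tags and num_comments comments."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE BlogPost (id INTEGER PRIMARY KEY);
        CREATE TABLE Tag (id INTEGER PRIMARY KEY, BlogPostid INTEGER, name TEXT);
        CREATE TABLE Comment (id INTEGER PRIMARY KEY, BlogPostid INTEGER, body TEXT);
        INSERT INTO BlogPost VALUES (1);
    """)
    db.executemany("INSERT INTO Tag (BlogPostid, name) VALUES (1, ?)",
                   [(f"tag{i}",) for i in range(1, num_tags + 1)])
    db.executemany("INSERT INTO Comment (BlogPostid, body) VALUES (1, ?)",
                   [(f"comment{i}",) for i in range(1, num_comments + 1)])
    return db

def joined_rows(db):
    """The single query from the example: tags and comments multiply."""
    return db.execute("""SELECT * FROM BlogPost
                         LEFT JOIN Tag ON Tag.BlogPostid = BlogPost.id
                         LEFT JOIN Comment ON Comment.BlogPostid = BlogPost.id""").fetchall()

def separate_rows(db):
    """Two queries: tags and comments add."""
    tags = db.execute("SELECT * FROM Tag WHERE BlogPostid = 1").fetchall()
    comments = db.execute("SELECT * FROM Comment WHERE BlogPostid = 1").fetchall()
    return tags + comments

for n in (2, 4, 10):
    db = build(n, n)
    print(f"{n} tags, {n} comments: joined={len(joined_rows(db))}, separate={len(separate_rows(db))}")
```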
Add more tables, more records, etc., and the problem will quickly inflate to hundreds of rows that are all full of mostly redundant data.
What do these duplicates cost you? Memory (in the SQL server and the code that tries to remove the duplicates) and networking resources (between SQL server and your code server).
Source: https://dev.mysql.com/doc/refman/8.0/en/nested-join-optimization.html; https://dev.mysql.com/doc/workbench/en/wb-relationship-tools.html
Answer by DreadPirateShawn
Construct both separate queries and joins, then time each of them -- nothing helps more than real-world numbers.
Then, even better, add EXPLAIN to the beginning of each query. This shows how MySQL plans to execute each query, including any subqueries it uses to answer your request for data, and roughly how many rows it expects to examine for each table.
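MySQL's EXPLAIN needs a running server, so the sketch below uses SQLite's analogous `EXPLAIN QUERY PLAN` through Python's sqlite3 module instead. The output format is different from MySQL's (MySQL reports columns such as `type`, `key`, and `rows`), but the idea is the same: check whether a query scans a whole table or searches an index. The `comment` table and index name are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)")

def plan(sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM comment WHERE post_id = 1"
print(plan(query))  # no index on post_id yet, so SQLite reports a SCAN of the table
db.execute("CREATE INDEX idx_comment_post ON comment (post_id)")
print(plan(query))  # now SQLite reports a SEARCH using idx_comment_post
```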
Answer by glasnt
Depending on how the complexity for the database compares with the complexity for the developer, it may be simpler to make many SELECT calls.
Try running some database statistics against both the JOIN and the multiple SELECTs. See whether, in your environment, the JOIN is faster or slower than the SELECTs.
Then again, if changing it to a JOIN would mean an extra day/week/month of dev work, I'd stick with multiple SELECTs.
Cheers,
BLT
Answer by A Boy Named Su
In my experience I have found it's usually faster to run several queries, especially when retrieving large data sets.
When interacting with the database from another application, such as PHP, there is the argument for making one trip to the server instead of many.
There are other ways to limit the number of trips made to the server and still run multiple queries that are often not only faster but also make the application easier to read - for example mysqli_multi_query.
I'm no novice when it comes to SQL. I think there is a tendency for developers, especially juniors, to spend a lot of time trying to write very clever joins because they look smart, whereas there are actually smart ways to extract data that look simple.
The last paragraph was a personal opinion, but I hope this helps. I do agree with the others though who say you should benchmark. Neither approach is a silver bullet.
Answer by cHao
Whether you should use a join is first and foremost about whether a join makes sense. Only at that point is performance even something to be considered, as nearly all other cases will result in significantly worse performance.
Performance differences will largely be tied to how related the info you're querying for is. Joins work, and they're fast when the data is related and you index stuff correctly, but they do often result in some redundancy and sometimes more results than needed. And if your data sets are not directly related, sticking them in a single query will result in what's called a Cartesian product (basically, all possible combinations of rows), which is almost never what you want.
This is often caused by many-to-one-to-many relationships. For example, HoldOffHunger's answer mentioned a single query for posts, tags, and comments. Comments are related to a post, as are tags... but tags are unrelated to comments.
+------------+ +---------+ +---------+
| comment | | post | | tag |
|------------|* 1|---------|1 *|---------|
| post_id |-----| post_id |-----| post_id |
| comment_id | | ... | | tag_id |
| user_id | | | | ... |
| ... | | | | ... |
+------------+ +---------+ +---------+
In this case, it is unambiguously better for this to be at least two separate queries. If you try to join tags and comments, because there's no direct relation between the two, you end up with every possible combination of tag and comment: many * many == manymany. Aside from that, since comments and tags are unrelated, you can run those two queries in parallel, leading to a potential gain.
Let's consider a different scenario, though: You want the comments attached to a post, and the commenters' contact info.
+----------+ +------------+ +---------+
| user | | comment | | post |
|----------|1 *|------------|* 1|---------|
| user_id |-----| post_id |-----| post_id |
| username | | user_id | | ... |
| ... | | ... | +---------+
+----------+ +------------+
This is where you should consider a join. Aside from being a much more natural query, it is the kind of query that most database systems (including MySQL) have had lots of smart people put lots of hard work into optimizing. With separate queries, since each query depends on the results of the previous one, the queries can't be run in parallel, and the total time includes not just the actual execution time of the queries, but also the time spent fetching results, sifting through them for IDs for the next query, linking rows together, etc.
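The extra work on the separate-queries side can be made concrete. A sketch with sqlite3 standing in for MySQL and made-up user/comment rows: the JOIN version is one statement, while the separate version must finish query 1, sift its rows for user IDs, run query 2, and then link the rows together in application code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, post_id INTEGER,
                          user_id INTEGER, body TEXT);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comment VALUES (1, 7, 1, 'hi'), (2, 7, 2, 'hello'),
                               (3, 7, 1, 'again'), (4, 8, 2, 'other post');
""")

def with_join(post_id):
    """Comments on a post plus each commenter's username, in one statement."""
    return db.execute("""SELECT comment.comment_id, comment.body, user.username
                         FROM comment JOIN user ON user.user_id = comment.user_id
                         WHERE comment.post_id = ?
                         ORDER BY comment.comment_id""", (post_id,)).fetchall()

def with_separate_queries(post_id):
    # Query 1: the comments (query 2 cannot start until this one has returned)
    comments = db.execute("""SELECT comment_id, body, user_id FROM comment
                             WHERE post_id = ? ORDER BY comment_id""", (post_id,)).fetchall()
    # Sift the results for the IDs the next query needs
    user_ids = sorted({user_id for _, _, user_id in comments})
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    # Query 2: the commenters
    names = dict(db.execute(
        f"SELECT user_id, username FROM user WHERE user_id IN ({placeholders})",
        user_ids).fetchall())
    # Link the rows together in application code
    return [(comment_id, body, names[user_id]) for comment_id, body, user_id in comments]

print(with_join(7))
```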