SQL: Speeding up an inner join between a large table and a small table

Question

Asked by Zaid

This may be a silly question, but it may shed some light on how joins work internally.

Let's say I have a large table L and a small table S (100K rows vs. 100 rows).

Would there be any difference in speed between the following two options?

OPTION 1:                 OPTION 2:
---------                 ---------
SELECT *                  SELECT *
FROM L INNER JOIN S       FROM S INNER JOIN L
ON L.id = S.id;           ON L.id = S.id;

Notice that the only difference is the order in which the tables are joined.

I realize performance may vary between different SQL implementations. If so, how would MySQL compare to Access?

Answer 1

Accepted answer by Mitch Wheat

No, the order does not matter.

Almost all RDBMSs (such as MS Access, MySQL, SQL Server, Oracle, etc.) use a cost-based optimiser that relies on column statistics. In most situations, the optimiser will choose an efficient plan. In the example you gave, the order will not matter (provided statistics are up to date).
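One way to check this claim on an engine you can script is to run both orderings and compare them. The sketch below uses Python's built-in `sqlite3` module, so it exercises SQLite rather than Access or MySQL (an assumption of convenience); the `payload` columns and the data are hypothetical stand-ins for the question's 100K-row `L` and 100-row `S`. It prints SQLite's `EXPLAIN QUERY PLAN` output for each ordering so the chosen plans can be compared, and returns the result rows so you can confirm they match.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a large table L and a small table S."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE L (id INTEGER, payload TEXT)")
    conn.execute("CREATE TABLE S (id INTEGER, payload TEXT)")
    # Hypothetical data: 100,000 rows in L, 100 rows in S; ids 0..99 appear in both.
    conn.executemany("INSERT INTO L VALUES (?, ?)", ((i, f"l{i}") for i in range(100_000)))
    conn.executemany("INSERT INTO S VALUES (?, ?)", ((i, f"s{i}") for i in range(100)))
    conn.execute("ANALYZE")  # give the planner up-to-date statistics
    return conn


# An explicit column list keeps the output columns in the same order for both
# queries (SELECT * would list L's columns first in one and S's first in the other).
OPTION_1 = "SELECT L.id, L.payload, S.payload FROM L INNER JOIN S ON L.id = S.id"
OPTION_2 = "SELECT L.id, L.payload, S.payload FROM S INNER JOIN L ON L.id = S.id"


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Print the plan SQLite chose for `sql`, then return its rows sorted."""
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(plan_row)
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = build_db()
    rows1 = run(conn, OPTION_1)
    rows2 = run(conn, OPTION_2)
    print(len(rows1), rows1 == rows2)
```

After the two plans, the last line printed is `100 True`: both orderings return the same 100 matching rows.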

To decide what query strategy to use, the Jet Engine optimizer uses statistics. These statistics are based on factors that include the following:
The number of records in a table
The number of data pages in a table
The location of the table
Whether indexes are present
How unique the indexes are
Note: You cannot view Jet database engine optimization schemes, and you cannot specify how to optimize a query. However, you can use the Database Documenter to determine whether indexes are present and how unique an index is.
Based on these statistics, the Optimizer then selects the best internal query strategy for dealing with a particular query.
The statistics are updated whenever a query is compiled. A query is flagged for compiling when you save any changes to the query (or its underlying tables) and when the database is compacted. If a query is flagged for compiling, the compiling and the updating of statistics occurs the next time that the query is run. Compiling typically takes from one second to four seconds.
If you add a significant number of records to your database, you must open and then save your queries to recompile the queries. For example, if you design and then test a query by using a small set of sample data, you must re-compile the query after additional records are added to the database. When you do this, you want to make sure that optimal query performance is achieved when your application is in use.
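The Jet behaviour quoted above (statistics refreshed only when a query is recompiled, so a query tested against a small sample can keep a stale view of the data) has counterparts in other engines. As an illustration only, and not a description of Jet internals: in SQLite, statistics are refreshed explicitly with `ANALYZE` and stored in the `sqlite_stat1` table. The sketch below uses Python's `sqlite3` with a hypothetical table `L`, a hypothetical index `idx_l_id`, and a hypothetical helper `index_row_count`. It shows the recorded row count staying stale after a bulk insert until `ANALYZE` runs again, which mirrors the "recompile after adding records" advice.

```python
import sqlite3


def index_row_count(conn: sqlite3.Connection, index_name: str):
    """Return the row count recorded for an index in sqlite_stat1 (None if absent)."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()
    # The first integer in the 'stat' column is the number of rows in the index.
    return int(row[0].split()[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE L (id INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_l_id ON L (id)")

# Design-time sample: only 100 rows, then gather statistics.
conn.executemany("INSERT INTO L VALUES (?, ?)", ((i, "x") for i in range(100)))
conn.execute("ANALYZE")
before = index_row_count(conn, "idx_l_id")

# Production load: many more rows, but the statistics are not refreshed automatically.
conn.executemany("INSERT INTO L VALUES (?, ?)", ((i, "x") for i in range(100, 100_000)))
stale = index_row_count(conn, "idx_l_id")

# Refresh the statistics (roughly the role recompiling plays for a saved Access query).
conn.execute("ANALYZE")
after = index_row_count(conn, "idx_l_id")

print(before, stale, after)
```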

Ref.

Might be of interest: ACC: How to Optimize Queries in Microsoft Access 2.0, Microsoft Access 95, and Microsoft Access 97

Tony Toews's Microsoft Access Performance FAQ is worth reading.

Answer 2

Answer by Peter Lang

I know Oracle's not on your list, but I think that most modern databases will behave that way.

You can see in the following execution plan that there is no difference between the two statements.

It's a full scan of each of the two tables (there is no index in my case), followed by a HASH JOIN. Since you want everything from both tables, both tables need to be read and joined, so the order in which they are written has no impact.

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |   700 |    42  (12)| 00:00:01 |
|*  1 |  HASH JOIN         |      |   100 |   700 |    42  (12)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| S    |   100 |   300 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| L    |   100K|   390K|    38   (8)| 00:00:01 |
---------------------------------------------------------------------------
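The HASH JOIN in this plan builds an in-memory hash table from one input (here `S`, the first child of the join, the smaller table) and then probes it once with each row of the other input (`L`). Below is a minimal sketch of that technique in plain Python, assuming rows are `(id, payload)` tuples and using a hypothetical helper `hash_join`. The build side is chosen by input size, not by the order the tables are written in, which is why both orderings of the query can end up with the same plan. It illustrates the general idea only; Oracle's real implementation does more, for example spilling partitions to disk when the build input does not fit in memory.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner equi-join of two lists of (id, payload) rows on id.

    Returns (id, left_payload, right_payload) tuples. The smaller input is
    always used as the build side, whatever the argument order.
    """
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)

    # Build phase: hash the smaller input on the join key.
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)

    # Probe phase: a single pass over the larger input, one lookup per row.
    out = []
    for key, payload in probe:
        for match in table.get(key, ()):  # .get() does not insert missing keys
            # Keep output columns in (left, right) order whichever side was built.
            pair = (payload, match) if swapped else (match, payload)
            out.append((key, *pair))
    return out


if __name__ == "__main__":
    L = [(i, f"l{i}") for i in range(100_000)]
    S = [(i, f"s{i}") for i in range(100)]
    a = hash_join(L, S)  # like "FROM L INNER JOIN S"
    b = hash_join(S, L)  # like "FROM S INNER JOIN L"
    # Same rows, only the payload columns swap places (as with SELECT *).
    print(sorted(a) == sorted((k, y, x) for k, x, y in b))
```

In both calls `S` is hashed and `L` is streamed past it once, so the work done is the same. This matches the plan above, where `S` is the first child of the HASH JOIN.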
