postgresql 理解 postgres 解释 w/位图堆/索引扫描
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/10145037/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Understanding postgres explain w/ bitmap heap/index scans
提问by Wells
I have a table w/ 4.5 million rows. There is no primary key. The table has a column p_id
, type integer. There's an index, idx_mytable_p_id
on this column using the btree
method. I do:
我有一个有 450 万行的表。没有主键。该表有一列p_id
,类型为整数。有一个索引,idx_mytable_p_id
在这个列上使用该btree
方法。我做:
SELECT * FROM mytable WHERE p_id = 123456;
I run an explain on this and see the following output:
我对此进行了解释并看到以下输出:
Bitmap Heap Scan on mytable (cost=12.04..1632.35 rows=425 width=321)
Recheck Cond: (p_id = 543094)
-> Bitmap Index Scan on idx_mytable_p_id (cost=0.00..11.93 rows=425 width=0)
Index Cond: (p_id = 543094)
Questions:
问题:
- Why is that query doing a heap scan and then a bitmap index scan?
- Why is it examining 425 rows? Why is the width of the operation 321?
- What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?
- 为什么该查询先进行堆扫描,然后再进行位图索引扫描?
- 为什么要检查 425 行?为什么操作的宽度是321?
- 12.04..1632.35 和 0.00..11.93 告诉我的成本是多少?
For the record there are 773 rows with the p_id
value of 123456. There are 38 columns on mytable
.
对于记录,有 773 行,p_id
值为 123456。在 上有 38 列mytable
。
Thanks!
谢谢!
回答by kgrittn
Why is that query doing a heap scan and then a bitmap index scan?
为什么该查询先进行堆扫描,然后再进行位图索引扫描?
It's not, exactly. EXPLAIN output shows the structure of the execution nodes, with the ones on the "higher" level (not indented as far) pulling rows from the nodes below them. So when the Bitmap Heap Scan node goes to pull its first row the Bitmap Index Scan runs to determine the set of rows to be used, and passes information on the first row to the heap scan. The index scan passes the index to determine which rows need to be read, and the heap scan actually reads them. The idea is that by reading the heap from beginning to end rather than in index order it will do less random access -- all matching rows from a given page will be read when that page is loaded, and enough pages may be read in order to use cheaper sequential access rather than seeking back and forth all over the disk.
它不是,确切地说。EXPLAIN 输出显示了执行节点的结构,“更高”级别的节点(没有缩进那么远)从它们下面的节点拉出行。所以当位图堆扫描节点去拉它的第一行时,位图索引扫描运行以确定要使用的行集,并将第一行的信息传递给堆扫描。索引扫描通过索引来确定需要读取哪些行,堆扫描实际读取它们。这个想法是,通过从头到尾而不是按索引顺序读取堆,它会减少随机访问——加载该页面时将读取来自给定页面的所有匹配行,并且可以读取足够的页面以便使用更便宜的顺序访问,而不是在整个磁盘上来回寻找。
Why is it examining 425 rows?
为什么要检查 425 行?
It's not. You ran EXPLAIN, which just shows you estimates and the chosen plan, it doesn't really examine the rows at all. That makes the value of EXPLAIN rather limited compared to running EXPLAIN ANALYZE, which actually runsthe query and shows you the estimates and the actualnumbers.
不是。你运行了 EXPLAIN,它只显示你的估计和选择的计划,它根本没有真正检查行。与运行 EXPLAIN ANALYZE 相比,这使得 EXPLAIN 的值相当有限,后者实际运行查询并向您显示估计值和实际数字。
Why is the width of the operation 321?
为什么操作的宽度是321?
Apparently that's the size, in bytes, of the tuples in mytable
.
显然,这是 中元组的大小(以字节为单位)mytable
。
What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?
12.04..1632.35 和 0.00..11.93 告诉我的成本是多少?
The first number is the cost to return the first row from that node; the second number is the cost to return all of the rows for that node. Remember, these are estimates. The unit is an abstract cost unit. The absolute number means nothing; what matters in planning is which plan has the lowest cost. If you are using a cursor the first number matters; otherwise it is usually the second number. (I think it interpolates for a LIMIT clause.)
第一个数字是从该节点返回第一行的成本;第二个数字是返回该节点所有行的成本。请记住,这些是估计值。该单位是一个抽象的成本单位。绝对数字没有任何意义;计划中重要的是哪个计划的成本最低。如果您使用游标,则第一个数字很重要;否则它通常是第二个数字。(我认为它插入了一个 LIMIT 子句。)
It is often necessary to adjust configurable cost factors, such as random_page_cost
and cpu_tuple_cost
, to accurately model the costs within your environment. Without such adjustments the comparative costs are likely to not match the corresponding run times, so a less-than-optimal plan might be chosen.
通常需要调整可配置的成本因素,例如random_page_cost
和cpu_tuple_cost
,以准确地模拟您环境中的成本。如果没有这样的调整,比较成本可能与相应的运行时间不匹配,因此可能会选择一个不太理想的计划。
回答by a_horse_with_no_name
re 1) execution plans have to be read from the inner most node to the outermost node. So it's first doing an index scan (to find the rows) and the accessing the actual table to return the rows the index scan found
1)执行计划必须从最里面的节点读取到最外面的节点。所以它首先进行索引扫描(以查找行)并访问实际表以返回索引扫描找到的行
re 2) the number of rows shown in the plan is just an estimation based on the statistics and as such 425 vs. 773 sounds fairly reasonable. If you want to see realfigures, use explain analyze
2) 计划中显示的行数只是基于统计数据的估计,因此 425 与 773 听起来相当合理。如果您想查看真实数字,请使用explain analyze
re 3) the first number in the cost figure is the "startup" cost to intialize the step of the planner, the second cost is the total cost of that step.
3)成本图中的第一个数字是初始化计划器步骤的“启动”成本,第二个成本是该步骤的总成本。
This is all documented in the manual: http://www.postgresql.org/docs/current/static/using-explain.html
这都记录在手册中:http: //www.postgresql.org/docs/current/static/using-explain.html
You might want to go through these links in the PostgreSQL Wiki as well:
您可能还想浏览 PostgreSQL Wiki 中的这些链接: