收集统计数据,以便oracle 11g CBO对查询计划做出更明智的决定,但不会改善查询执行时间.
sql> desc tab3; Name Null? Type ---------------------------------------------- COL1 NUMBER(38) COL2 VARCHAR2(100) COL3 VARCHAR2(36) COL4 VARCHAR2(36) COL5 VARCHAR2(4000) COL6 VARCHAR2(4000) MEASURE_0 VARCHAR2(4000) MEASURE_1 VARCHAR2(4000) MEASURE_2 VARCHAR2(4000) MEASURE_3 VARCHAR2(4000) MEASURE_4 VARCHAR2(4000) MEASURE_5 VARCHAR2(4000) MEASURE_6 VARCHAR2(4000) MEASURE_7 VARCHAR2(4000) MEASURE_8 VARCHAR2(4000) MEASURE_9 VARCHAR2(4000)
列measure_0具有40万个唯一值.
sql> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc; COUNT(*) ---------- 403664
以下是执行计划的查询,请注意,表上没有索引.
sql> set autotrace traceonly; sql> SELECT * FROM ( 2 SELECT 3 (ROWNUM -1) AS COL1,4 ft.COL1 AS OLD_COL1,5 ft.COL2,6 ft.COL3,7 ft.COL4,8 ft.COL5,9 ft.COL6,10 ft.MEASURE_0,11 ft.MEASURE_1,12 ft.MEASURE_2,13 ft.MEASURE_3,14 ft.MEASURE_4,15 ft.MEASURE_5,16 ft.MEASURE_6,17 ft.MEASURE_7,18 ft.MEASURE_8,19 ft.MEASURE_9 20 FROM tab3 ft 21 WHERE MEASURE_0 IN 22 ( 23 SELECT MEASURE_0 24 FROM tab3 25 GROUP BY MEASURE_0 26 HAVING COUNT(*) = 1 27 ) 28 ) ABC WHERE COL1 >= 0 AND COL1 <=449; 450 rows selected. Elapsed: 00:00:01.90 Execution Plan ---------------------------------------------------------- Plan hash value: 3115757351 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%cpu)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1243 | 28M| 717K (1)| 02:23:29 | |* 1 | VIEW | | 1243 | 28M| 717K (1)| 02:23:29 | | 2 | COUNT | | | | | | |* 3 | HASH JOIN | | 1243 | 30M| 717K (1)| 02:23:29 | | 4 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 | |* 5 | FILTER | | | | | | | 6 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 | | 7 | TABLE ACCESS FULL| TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 | | 8 | TABLE ACCESS FULL | TAB3 | 1686K| 37G| 6211 (1)| 00:01:15 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("COL1">=0 AND "COL1"<=449) 3 - access("MEASURE_0"="MEASURE_0") 5 - filter(COUNT(*)=1) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 354 recursive calls 0 db block gets 46518 consistent gets 45122 physical reads 0 redo size 43972 bytes sent via sql*Net to client 715 bytes received via sql*Net from client 31 sql*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 450 rows processed
查询占用1.90秒.如果我再次运行查询,则需要1.66秒.为什么在第一次运行中需要更多时间?
sql> create index ind_tab3_orgid on tab3(COL1); Index created. Elapsed: 00:00:01.68 sql> create index ind_tab3_msr_0 on tab3(measure_0); Index created. Elapsed: 00:00:01.83
当我第一次发出这个查询之后,花了21秒才回来.而后续的运行则是2.9秒.为什么oracle在第一次运行中花费了这么多时间,是暖身还是什么东西呢?挡住了我!
这是计划,当它需要2.9秒 –
450 rows selected. Elapsed: 00:00:02.92 Execution Plan ---------------------------------------------------------- Plan hash value: 240271480 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%cpu)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1243 | 28M| 711K (1)| 02:22:15 | |* 1 | VIEW | | 1243 | 28M| 711K (1)| 02:22:15 | | 2 | COUNT | | | | | | | 3 | NESTED LOOPS | | | | | | | 4 | NESTED LOOPS | | 1243 | 30M| 711K (1)| 02:22:15 | | 5 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 | |* 6 | FILTER | | | | | | | 7 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 | | 8 | TABLE ACCESS FULL | TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 | |* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 1243 | | 2 (0)| 00:00:01 | | 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 1243 | 28M| 44 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("COL1">=0 AND "COL1"<=449) 6 - filter(COUNT(*)=1) 9 - access("MEASURE_0"="MEASURE_0") Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 660054 consistent gets 22561 physical reads 0 redo size 44358 bytes sent via sql*Net to client 715 bytes received via sql*Net from client 31 sql*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 450 rows processed
我期待的时间要低于桌子未被索引的时间.为什么表的索引版本比非索引版本需要更多的时间来获取结果?如果我没有错,那是正在占用时间的TABLE ACCESS BY INDEX ROWID.我可以强制执行oracle来使用TABLE ACCESS FULL吗?
然后我收集了表上的统计数据,以便CBO通过计算选项来改进计划.所以现在的统计数字是准确的.
sql> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP',tabname=>'TAB3',estimate_percent=>null,cascade=>true); PL/sql procedure successfully completed. Elapsed: 00:01:02.47 sql> set autotrace off; sql> select COLUMN_NAME,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ; COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM LAST_ANALYZED ------------------------------ ------------ ----------- --------------- --------- COL1 1502257 1502257 NONE 27-JUN-12 COL2 0 NONE 27-JUN-12 COL3 1 1502257 NONE 27-JUN-12 COL4 0 NONE 27-JUN-12 COL5 1502257 1502257 NONE 27-JUN-12 COL6 1502257 1502257 NONE 27-JUN-12 MEASURE_0 405609 1502257 HEIGHT BALANCED 27-JUN-12 MEASURE_1 128570 1502257 NONE 27-JUN-12 MEASURE_2 1502257 1502257 NONE 27-JUN-12 MEASURE_3 185657 1502257 NONE 27-JUN-12 MEASURE_4 901 1502257 NONE 27-JUN-12 MEASURE_5 17 1502257 NONE 27-JUN-12 MEASURE_6 2202 1502257 NONE 27-JUN-12 MEASURE_7 2193 1502257 NONE 27-JUN-12 MEASURE_8 21 1502257 NONE 27-JUN-12 MEASURE_9 27263 1502257 NONE 27-JUN-12
我再次运行查询
450 rows selected. Elapsed: 00:00:02.95 Execution Plan ---------------------------------------------------------- Plan hash value: 240271480 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%cpu)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 31M| 718G| 8046 (2)| 00:01:37 | |* 1 | VIEW | | 31M| 718G| 8046 (2)| 00:01:37 | | 2 | COUNT | | | | | | | 3 | NESTED LOOPS | | | | | | | 4 | NESTED LOOPS | | 31M| 62G| 8046 (2)| 00:01:37 | | 5 | VIEW | VW_NSO_1 | 4057 | 7931K| 6263 (2)| 00:01:16 | |* 6 | FILTER | | | | | | | 7 | HASH GROUP BY | | 1 | 20285 | 6263 (2)| 00:01:16 | | 8 | TABLE ACCESS FULL | TAB3 | 1502K| 7335K| 6193 (1)| 00:01:15 | |* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 4 | | 2 (0)| 00:00:01 | | 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 779K| 75M| 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("COL1">=0 AND "COL1"<=449) 6 - filter(COUNT(*)=1) 9 - access("MEASURE_0"="MEASURE_0") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 660054 consistent gets 22561 physical reads 0 redo size 44358 bytes sent via sql*Net to client 715 bytes received via sql*Net from client 31 sql*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 450 rows processed
这次查询在2.9秒内回来了(有时也需要3.9秒).
我的目标是尽可能减少查询执行时间.但是在添加索引或计算统计信息后,查询时间不断增加.为什么会发生这种情况,即使保持索引也可以改进?
解决方法
just keep saying to yourself over and over
“full scans are not evil,indexes are not good”
“full scans are not evil,indexes are not good”
索引不会总是提高性能,它们不是魔法银弹(好像这样的事情存在:)
现在你问你的索引为什么需要更长的时间.答案很简单:
>全表扫描:46518一致得到
>与你的索引:660054一致得到
换句话说:Oracle使用索引执行更多的读取操作,而不是全表扫描.这是因为:
>全表扫描读取是批量操作(一次有多个块),因此是读取大量数据的有效方法
>有时当您从索引中读取时,您最终会多次读取完全相同的数据块.
至于优化器为什么选择使用这个非常有效的索引,这是因为即使使用esimate_percent = 100和完整的直方图(您在MEASURE_0列上收集的),一些数据分布仍然无法通过简单的分析来可靠地表达的优化器.特别地,分析器不能很好地理解交叉列和跨表依赖性.这导致错误的估计,导致计划选择不佳.
编辑:似乎CBO的工作假设对于这个自连接来说根本不工作(你最后一个查询预计有31万行,而只有450个被选中!).这是令人困惑的,因为表只有1.5 M行.你使用什么版本的Oracle?
SELECT * FROM ( SELECT (ROWNUM -1) AS COL1,ABC.* FROM ( SELECT ft.COL1 AS OLD_COL1,[...],COUNT(*) OVER (PARTITION BY MEASURE_O) nb_0 FROM tab3 ft ) ABC WHERE nb_0 = 1 AND ROWNUM - 1 <= 449 ) v WHERE COL1 >= 0;
你也问为什么第一次查询运行需要更多的时间.这是因为在工作中有缓存.在数据库级别,存在SGA,其中所有块首先从磁盘复制,然后可以多次读取(第一次查询块始终是物理读取).那么一些系统还有一个独立的系统缓存,如果最近读过数据缓存,将会更快地返回数据.
进一步阅读:
> explication of a data discrepancy that can lead to wrong estimates和涉及sql配置文件的解决方案.