我写了一个声明,花了差不多一个小时才能运行所以我在寻求帮助,所以我可以更快地做到这一点.所以我们走了:
我正在进行两个表的内连接:
我有许多时间间隔由间隔表示,我想从这些间隔内的度量获得测量数据.
interval:有两列,一列是开始时间,另一列是间隔的结束时间(行数= 1295)
度量:有两列,一列有度量,另一列有度量的时间(行数=一百万)
我想得到的结果是一个表,在第一列中有度量,然后是度量完成的时间,所考虑的时间间隔的开始/结束时间(对于时间在所考虑范围内的行,它将重复)
这是我的代码:
select measures.measure as measure,measures.time as time,intervals.entry_time as entry_time,intervals.exit_time as exit_time from intervals inner join measures on intervals.entry_time<=measures.time and measures.time <=intervals.exit_time order by time asc
谢谢
解决方法
这是一个非常普遍的问题.
普通的B-Tree索引不适合这样的查询:
SELECT measures.measure as measure,intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ASC
索引适用于搜索给定边界内的值,如下所示:
,但不是用于搜索包含给定值的边界,如下所示:
> Adjacency list vs. nested sets: MySQL
(嵌套集模型处理类似的谓词类型).
您可以按时间建立索引,这样区间将在连接中领先,范围时间将在嵌套循环内使用.这将需要按时分类.
您可以按间隔创建空间索引(在MysqL中使用MyISAM存储可用),其中包括一个几何列中的开始和结束.这样,测量可以导致连接,并且不需要排序.
然而,空间索引更慢,所以只有少量测量但间隔很多时,这才有效.
由于您只有很少的间隔但有很多度量,因此请确保您有一个measure.time的索引:
CREATE INDEX ix_measures_time ON measures (time)
更新:
这是一个要测试的示例脚本:
BEGIN DBMS_RANDOM.seed(20091223); END; / CREATE TABLE intervals ( entry_time NOT NULL,exit_time NOT NULL ) AS SELECT TO_DATE('23.12.2009','dd.mm.yyyy') - level,TO_DATE('23.12.2009','dd.mm.yyyy') - level + DBMS_RANDOM.value FROM dual CONNECT BY level <= 1500 / CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time) / CREATE TABLE measures ( time NOT NULL,measure NOT NULL ) AS SELECT TO_DATE('23.12.2009','dd.mm.yyyy') - level / 720,CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18,2)) FROM dual CONNECT BY level <= 1080000 / ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time) / CREATE INDEX ix_measures_time_measure ON measures (time,measure) /
这个查询:
SELECT SUM(measure),AVG(time - TO_DATE('23.12.2009','dd.mm.yyyy')) FROM ( SELECT * FROM ( SELECT /*+ ORDERED USE_NL(intervals measures) */ * FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ) WHERE rownum <= 500000 )
使用NESTED LOOPS并在1.7秒内返回.
这个查询:
SELECT SUM(measure),'dd.mm.yyyy')) FROM ( SELECT * FROM ( SELECT /*+ ORDERED USE_MERGE(intervals measures) */ * FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ) WHERE rownum <= 500000 )
使用MERGE JOIN,我不得不在5分钟后停止它.
更新2:
您很可能需要使用如下提示强制引擎在连接中使用正确的表顺序:
SELECT /*+ LEADING (intervals) USE_NL(intervals,measures) */ measures.measure as measure,intervals.exit_time as exit_time FROM intervals JOIN measures ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time ORDER BY time ASC
Oracle的优化器不够智能,看不到间隔不相交.这就是为什么它最有可能使用度量作为主导表(如果间隔相交,这将是一个明智的决定).
更新3:
WITH splits AS ( SELECT /*+ MATERIALIZE */ entry_range,exit_range,exit_range - entry_range + 1 AS range_span,entry_time,exit_time FROM ( SELECT TRUNC((entry_time - TO_DATE(1,'J')) * 2) AS entry_range,TRUNC((exit_time - TO_DATE(1,'J')) * 2) AS exit_range,exit_time FROM intervals ) ),upper AS ( SELECT /*+ MATERIALIZE */ MAX(range_span) AS max_range FROM splits ),ranges AS ( SELECT /*+ MATERIALIZE */ level AS chunk FROM upper CONNECT BY level <= max_range ),tiles AS ( SELECT /*+ MATERIALIZE USE_MERGE (r s) */ entry_range + chunk - 1 AS tile,exit_time FROM ranges r JOIN splits s ON chunk <= range_span ) SELECT /*+ LEADING(t) USE_HASH(m t) */ SUM(LENGTH(stuffing)) FROM tiles t JOIN measures m ON TRUNC((m.time - TO_DATE(1,'J')) * 2) = tile AND m.time BETWEEN t.entry_time AND t.exit_time
此查询将时间轴拆分为范围,并使用HASH JOIN连接范围值的度量和时间戳,稍后进行精细过滤.