Postgresql 数据在一张表中存在，另一张表不满足完整性的查找

有两张表T1，T2，表结构和数据如下：

create table t1 (
  id int
);

create table t2 (
  id int,finished int
);

insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);
insert into t1 values (4);
insert into t1 values (5);
insert into t1 values (6);

insert into t2 values (1,1);
insert into t2 values (2,0);
insert into t2 values (3,1);
insert into t2 values (4,0);

SQL Fiddle 可以测试sql语句的执行。
想要实现T1中存在前提下，T2中不存在或者finished=0，也就是查询结果是：2,4,5,6.

一、性能测试：

1. not in:

explain (analyze,verbose,costs,buffers) select ID from T1 where ID not in (select ID from T2 where finished=1);

Total runtime: 0.128 ms

2. not exists:

explain (analyze,buffers) select ID from T1 where not exists (select 1 from T2 where T1.ID=T2.ID and T2.finished=1);

Total runtime: 0.105 ms

3. left join:

explain (analyze,buffers) select T1.ID from T1 left join T2 on T1.ID=T2.ID and T2.finished=1 where T2.ID is null;

Total runtime: 0.096 ms

4. 网上还看到一种更快方法，但测试下来此方法不对，所以不讨论：
select ID from T2 where (select count(1) from T1 where T1.ID=T2.ID) = 0; 这条语句查询结果为空

因此在postgresql 9.3 上语句执行速度 left join > not exists > not in
当T1和T2表中ID出现null时，not in 语句会有不同的表现，所以推荐总是用not exists 代替 not in.

二、大数据量性能测试：

在大量数据的时候，not in有严重性能下降的问题，下面是我在i5 2.4GHz MAC pro 13吋上的测试。
department(T1) 为59280条数据，数据长度29字符；dept(T2) 为23633条数据，数据长度29字符。

1. explain analyze select department.id from department where department.id not in (select id from dept where finished=true);
Total runtime: 447073.065 ms

2. explain analyze select department.id from department where not exists (select 1 from dept where department.id=dept.id and finished=true);
Total runtime: 325.732 ms

3. explain analyze select department.id from department left join dept on department.id=dept.id and dept.finished=true where dept.id is null;
Total runtime: 319.869 ms

三、总结：

在Postgresql 9.3上：
not in 不仅性能差，而且逻辑可能有问题。
not exists 性能不错，思考起来比较容易。
left join 性能最好，但总体跟not exists 比也快不了多少，思考稍微有点绕。

下面是一张网上的left join 的图，但找不到出处，有助于理解 left join 的过程：

原文链接：https://www.f2er.com/postgresql/195441.html

Postgresql 数据在一张表中存在，另一张表不满足完整性的查找

一、性能测试：

二、大数据量性能测试：

三、总结：

猜你在找的Postgre SQL相关文章