PostgreSQL解释计划中的成本测量有多可靠？

查询在具有1,100万行的大型表上执行.我已经在查询执行之前在表上执行了ANALYZE.

SELECT *
FROM accounts t1
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no
        AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

解释分析：

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=292) (actual time=29619.499..115662.111 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.025..25693.921 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=146) (actual time=29612.925..29612.925 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 1834187kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.006..22929.635 rows=11034070 loops=1)
Total runtime: 115870.788 ms

估计费用约为120万美元,实际时间约为1.9分钟.

查询2：

SELECT t1.*
FROM accounts t1
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no
        AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

解释分析：

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=146) (actual time=13365.808..65519.402 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.007..5032.778 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=18) (actual time=13354.219..13354.219 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 545369kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.011..8964.571 rows=11034070 loops=1)
Total runtime: 65705.707 ms

估计的成本约为120万(再次),但实际花费的时间<1.1分钟. 问题3：

SELECT *
FROM accounts
WHERE (account_no,effective_date) IN
    (SELECT account_no,max(effective_date)
     FROM accounts
     GROUP BY account_no);

解释分析：

Nested Loop  (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1)
  ->  HashAggregate  (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1)
        ->  Subquery Scan on "ANY_subquery"  (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1)
              ->  HashAggregate  (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1)
                    ->  Seq Scan on accounts  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1)
  ->  Index Scan using accounts_idx2 on accounts  (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425)
        Index Cond: (((account_no)::text = ("ANY_subquery".account_no)::text) AND ((effective_date)::text = "ANY_subquery".max))
Total runtime: 918039.614 ms

估计成本约为502,000,但实际花费的时间约为15.3分钟！

> EXPLAIN输出的可靠性如何？
>我们是否总是必须使用EXPLAIN ANALYZE来查看我们的查询将如何对实际数据执行,而不是信任查询计划程序认为它将花费多少？

它们是可靠的,除非它们不是.你无法真正概括.

看起来它大大低估了它会找到的不同account_no的数量(认为它会找到34126实际上找到了1977425).您的default_statistics_target可能不够高,无法对此列进行良好估算.

原文链接：/postgresql/452769.html

PostgreSQL解释计划中的成本测量有多可靠？

猜你在找的Postgre SQL相关文章