使用T-SQL聚合仅相邻的记录

前端之家收集整理的这篇文章主要介绍了使用T-SQL聚合仅相邻的记录前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
我(在示例中简化了)包含以下数据的表
  1. Row Start Finish ID Amount
  2. --- --------- ---------- -- ------
  3. 1 2008-10-01 2008-10-02 01 10
  4. 2 2008-10-02 2008-10-03 02 20
  5. 3 2008-10-03 2008-10-04 01 38
  6. 4 2008-10-04 2008-10-05 01 23
  7. 5 2008-10-05 2008-10-06 03 14
  8. 6 2008-10-06 2008-10-07 02 3
  9. 7 2008-10-07 2008-10-08 02 8
  10. 8 2008-10-08 2008-11-08 03 19

日期表示一段时间,ID是系统在该时间段内所处的状态,金额是与该状态相关的值.

我想要做的是聚合具有相同ID号的相邻行的Amounts,但保持相同的整体序列,以便可以组合连续的运行.因此,我希望得到如下数据:

  1. Row Start Finish ID Amount
  2. --- --------- ---------- -- ------
  3. 1 2008-10-01 2008-10-02 01 10
  4. 2 2008-10-02 2008-10-03 02 20
  5. 3 2008-10-03 2008-10-05 01 61
  6. 4 2008-10-05 2008-10-06 03 14
  7. 5 2008-10-06 2008-10-08 02 11
  8. 6 2008-10-08 2008-11-08 03 19

我正在使用可以放入SP的T-sql解决方案,但是我无法通过简单查询看到如何做到这一点.我怀疑它可能需要某种迭代,但我不想走那条路.

我想要进行这种聚合的原因是该过程的下一步是按照序列中出现的唯一ID进行SUM()和Count(),这样我的最终数据将如下所示:

  1. ID Counts Total
  2. -- ------ -----
  3. 01 2 71
  4. 02 2 31
  5. 03 2 33

但是,如果我做一个简单的

  1. SELECT COUNT(ID),SUM(Amount) FROM data GROUP BY ID

在原始的桌子上,我得到了类似的东西

  1. ID Counts Total
  2. -- ------ -----
  3. 01 3 71
  4. 02 3 31
  5. 03 2 33

这不是我想要的.

解决方法

如果您在 R T Snodgrass之前阅读“在sql中开发面向时间的数据库应用程序”一书(其pdf可从他的网站上的出版物中获得),并且在p165-166上得到图6.25,您将发现非可以在当前示例中使用的简单sql,用于将具有相同ID值和连续时间间隔的各行分组.

下面的查询开发接近正确,但最后发现了一个问题,它的源代码在第一个SELECT语句中.我还没有找到为什么给出错误的答案. [如果有人可以在他们的DBMS上测试sql并告诉我第一个查询是否在那里正常工作,那将是一个很大的帮助!]

它看起来像:

  1. -- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented
  2. -- Database Applications in sql"
  3. CREATE TABLE Data
  4. (
  5. Start DATE,Finish DATE,ID CHAR(2),Amount INT
  6. );
  7.  
  8. INSERT INTO Data VALUES('2008-10-01','2008-10-02','01',10);
  9. INSERT INTO Data VALUES('2008-10-02','2008-10-03','02',20);
  10. INSERT INTO Data VALUES('2008-10-03','2008-10-04',38);
  11. INSERT INTO Data VALUES('2008-10-04','2008-10-05',23);
  12. INSERT INTO Data VALUES('2008-10-05','2008-10-06','03',14);
  13. INSERT INTO Data VALUES('2008-10-06','2008-10-07',3);
  14. INSERT INTO Data VALUES('2008-10-07','2008-10-08',8);
  15. INSERT INTO Data VALUES('2008-10-08','2008-11-08',19);
  16.  
  17. SELECT DISTINCT F.ID,F.Start,L.Finish
  18. FROM Data AS F,Data AS L
  19. WHERE F.Start < L.Finish
  20. AND F.ID = L.ID
  21. -- There are no gaps between F.Finish and L.Start
  22. AND NOT EXISTS (SELECT *
  23. FROM Data AS M
  24. WHERE M.ID = F.ID
  25. AND F.Finish < M.Start
  26. AND M.Start < L.Start
  27. AND NOT EXISTS (SELECT *
  28. FROM Data AS T1
  29. WHERE T1.ID = F.ID
  30. AND T1.Start < M.Start
  31. AND M.Start <= T1.Finish))
  32. -- Cannot be extended further
  33. AND NOT EXISTS (SELECT *
  34. FROM Data AS T2
  35. WHERE T2.ID = F.ID
  36. AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
  37. OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));

查询输出是:

  1. 01 2008-10-01 2008-10-02
  2. 01 2008-10-03 2008-10-05
  3. 02 2008-10-02 2008-10-03
  4. 02 2008-10-06 2008-10-08
  5. 03 2008-10-05 2008-10-06
  6. 03 2008-10-05 2008-11-08
  7. 03 2008-10-08 2008-11-08

编辑:倒数第二行有一个问题 – 它应该不存在.而且我还不清楚(它)来自哪里.

现在我们需要将该复杂表达式视为另一个SELECT语句的FROM子句中的查询表达式,该语句将对与上面显示的最大范围重叠的条目求和给定ID的金额值.

  1. SELECT M.ID,M.Start,M.Finish,SUM(D.Amount)
  2. FROM Data AS D,(SELECT DISTINCT F.ID,L.Finish
  3. FROM Data AS F,Data AS L
  4. WHERE F.Start < L.Finish
  5. AND F.ID = L.ID
  6. -- There are no gaps between F.Finish and L.Start
  7. AND NOT EXISTS (SELECT *
  8. FROM Data AS M
  9. WHERE M.ID = F.ID
  10. AND F.Finish < M.Start
  11. AND M.Start < L.Start
  12. AND NOT EXISTS (SELECT *
  13. FROM Data AS T1
  14. WHERE T1.ID = F.ID
  15. AND T1.Start < M.Start
  16. AND M.Start <= T1.Finish))
  17. -- Cannot be extended further
  18. AND NOT EXISTS (SELECT *
  19. FROM Data AS T2
  20. WHERE T2.ID = F.ID
  21. AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
  22. OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M
  23. WHERE D.ID = M.ID
  24. AND M.Start <= D.Start
  25. AND M.Finish >= D.Finish
  26. GROUP BY M.ID,M.Finish
  27. ORDER BY M.ID,M.Start;

这给出了:

  1. ID Start Finish Amount
  2. 01 2008-10-01 2008-10-02 10
  3. 01 2008-10-03 2008-10-05 61
  4. 02 2008-10-02 2008-10-03 20
  5. 02 2008-10-06 2008-10-08 11
  6. 03 2008-10-05 2008-10-06 14
  7. 03 2008-10-05 2008-11-08 33 -- Here be trouble!
  8. 03 2008-10-08 2008-11-08 19

编辑:这几乎是正确的数据集,用于执行原始问题所请求的COUNT和SUM聚合,因此最终答案是:

  1. SELECT I.ID,COUNT(*) AS Number,SUM(I.Amount) AS Amount
  2. FROM (SELECT M.ID,SUM(D.Amount) AS Amount
  3. FROM Data AS D,L.Finish
  4. FROM Data AS F,Data AS L
  5. WHERE F.Start < L.Finish
  6. AND F.ID = L.ID
  7. -- There are no gaps between F.Finish and L.Start
  8. AND NOT EXISTS
  9. (SELECT *
  10. FROM Data AS M
  11. WHERE M.ID = F.ID
  12. AND F.Finish < M.Start
  13. AND M.Start < L.Start
  14. AND NOT EXISTS
  15. (SELECT *
  16. FROM Data AS T1
  17. WHERE T1.ID = F.ID
  18. AND T1.Start < M.Start
  19. AND M.Start <= T1.Finish))
  20. -- Cannot be extended further
  21. AND NOT EXISTS
  22. (SELECT *
  23. FROM Data AS T2
  24. WHERE T2.ID = F.ID
  25. AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR
  26. (T2.Start <= L.Finish AND L.Finish < T2.Finish)))
  27. ) AS M
  28. WHERE D.ID = M.ID
  29. AND M.Start <= D.Start
  30. AND M.Finish >= D.Finish
  31. GROUP BY M.ID,M.Finish
  32. ) AS I
  33. GROUP BY I.ID
  34. ORDER BY I.ID;
  35.  
  36. id number amount
  37. 01 2 71
  38. 02 2 31
  39. 03 3 66

评论
哦! Drat … 3的条目有两倍于它应该具有的’金额’.之前的“已编辑”部分表示事情开始出错的地方.看起来好像第一个查询是巧妙的错误(可能是针对不同的问题),或者我正在使用的优化器是行为不端.然而,应该有一个与此密切相关的答案,它将给出正确的值.

对于记录:在Solaris 10上的IBM Informix Dynamic Server 11.50上进行了测试.但是,应该可以在任何其他符合标准的标准sql DBMS上正常工作.

猜你在找的MsSQL相关文章