这个聚合表的计算花了几天前cca 4小时.我们的DBA将这个大连接分成3个较小的连接(每个连接4个表).临时结果每次都保存到临时表中,用于下一次连接.
DBA增强的结果是,聚合表在15分钟内计算.我想知道这怎么可能. DBA告诉我,这是因为服务器必须处理的数据量较小.换句话说,在大型原始连接中,服务器必须处理的数据多于总和较小的连接数.但是,我认为优化器会使用原始的大连接有效地完成它,自己拆分连接并仅发送下一个连接所需的列数.
他做的另一件事是他在一个临时表上创建了一个索引.但是,我再次认为优化器将根据需要创建适当的哈希表,并且更好地优化计算.
我和我们的DBA讨论了这个问题,但他自己也不确定是什么能够改善处理时间.他刚才提到,他不会责怪服务器,因为计算这么大的数据可能会让人不堪重负,并且优化器很可能很难预测最佳的执行计划…….我理解这一点,但我希望有更明确的答案.
所以,问题是:
>什么可能导致重大改善?
>将大型连接分成较小的是一个标准程序吗?
>在多个较小的连接的情况下,服务器必须处理的数据量是否真的更小?
这是原始查询:
Insert Into FinalResult_Base SELECT TC.TestCampaignContainerId,TC.CategoryId As TestCampaignCategoryId,TC.Grade,TC.TestCampaignId,T.TestSetId,TL.TestId,TSK.CategoryId,TT.[TestletId],TL.SectionNo,TL.Difficulty,TestletName = Char(65+TL.SectionNo) + CONVERT(varchar(4),6 - TL.Difficulty),TQ.[QuestionId],TS.StudentId,TS.ClassId,RA.SubjectId,TQ.[QuestionPoints],GoodAnswer = Case When TQ.[QuestionPoints] Is null Then 0 When TQ.[QuestionPoints] > 0 Then 1 Else 0 End,WrongAnswer = Case When TQ.[QuestionPoints] = 0 Then 1 When TQ.[QuestionPoints] Is null Then 1 Else 0 End,NoAnswer = Case When TQ.[QuestionPoints] Is null Then 1 Else 0 End,TS.Redizo,TT.ViewCount,TT.SpentTime,TQ.[Position],RA.SpecialNeeds,[Version] = 1,TestAdaptationId = TA.Id,TaskId = TSK.TaskId,TaskPosition = TT.Position,QuestionRate = Q.Rate,TestQuestionId = TQ.Guid,AnswerType = TT.TestletAnswerTypeId FROM [TestQuestion] TQ WITH (NOLOCK) Join [TestTask] TT WITH (NOLOCK) On TT.Guid = TQ.TestTaskId Join [Question] Q WITH (NOLOCK) On TQ.QuestionId = Q.QuestionId Join [Testlet] TL WITH (NOLOCK) On TT.TestletId = TL.Guid Join [Test] T WITH (NOLOCK) On TL.TestId = T.Guid Join [TestSet] TS WITH (NOLOCK) On T.TestSetId = TS.Guid Join [RoleAssignment] RA WITH (NOLOCK) On TS.StudentId = RA.PersonId And RA.RoleId = 1 Join [Task] TSK WITH (NOLOCK) On TSK.TaskId = TT.TaskId Join [Category] C WITH (NOLOCK) On C.CategoryId = TSK.CategoryId Join [TimeWindow] TW WITH (NOLOCK) On TW.Id = TS.TimeWindowId Join [TestAdaptation] TA WITH (NOLOCK) On TA.Id = TW.TestAdaptationId Join [TestCampaign] TC WITH (NOLOCK) On TC.TestCampaignId = TA.TestCampaignId WHERE T.TestTypeId = 1 -- eliminuji ankety And t.ProcessedOn is not null -- ne vsechny,jen dokoncene And TL.ShownOn is not null And TS.Redizo not in (999999999,111111119) END;
DBA伟大的工作后新的分裂连接:
SELECT TC.TestCampaignContainerId,6 - TL.Difficulty) -- prevod na A5,B4,B5 ...,[Version] = 1 -- ?,TL.Guid AS TLGuid,TS.TimeWindowId INTO [#FinalResult_Base_1] FROM [TestSet] [TS] WITH (NOLOCK) JOIN [Test] [T] WITH (NOLOCK) ON [T].[TestSetId] = [TS].[Guid] AND [TS].[Redizo] NOT IN (999999999,111111119) AND [T].[TestTypeId] = 1 AND [T].[ProcessedOn] IS NOT NULL JOIN [Testlet] [TL] WITH (NOLOCK) ON [TL].[TestId] = [T].[Guid] AND [TL].[ShownOn] IS NOT NULL JOIN [TimeWindow] [TW] WITH (NOLOCK) ON [TW].[Id] = [TS].[TimeWindowId] AND [TW].[IsActive] = 1 JOIN [TestAdaptation] [TA] WITH (NOLOCK) ON [TA].[Id] = [TW].[TestAdaptationId] AND [TA].[IsActive] = 1 JOIN [TestCampaign] [TC] WITH (NOLOCK) ON [TC].[TestCampaignId] = [TA].[TestCampaignId] AND [TC].[IsActive] = 1 JOIN [TestCampaignContainer] [TCC] WITH (NOLOCK) ON [TCC].[TestCampaignContainerId] = [TC].[TestCampaignContainerId] AND [TCC].[IsActive] = 1 ; SELECT FR1.TestCampaignContainerId,FR1.TestCampaignCategoryId,FR1.Grade,FR1.TestCampaignId,FR1.TestSetId,FR1.TestId,TSK.CategoryId AS [TaskCategoryId],FR1.SectionNo,FR1.Difficulty,TestletName = Char(65+FR1.SectionNo) + CONVERT(varchar(4),6 - FR1.Difficulty) -- prevod na A5,FR1.StudentId,FR1.ClassId,FR1.Redizo,FR1.TestAdaptationId,AnswerType = TT.TestletAnswerTypeId,TT.Guid AS TTGuid INTO [#FinalResult_Base_2] FROM #FinalResult_Base_1 FR1 JOIN [TestTask] [TT] WITH (NOLOCK) ON [TT].[TestletId] = [FR1].[TLGuid] JOIN [Task] [TSK] WITH (NOLOCK) ON [TSK].[TaskId] = [TT].[TaskId] AND [TSK].[IsActive] = 1 JOIN [Category] [C] WITH (NOLOCK) ON [C].[CategoryId] = [TSK].[CategoryId]AND [C].[IsActive] = 1 ; DROP TABLE [#FinalResult_Base_1] CREATE NONCLUSTERED INDEX [#IX_FR_Student_Class] ON [dbo].[#FinalResult_Base_2] ([StudentId],[ClassId]) INCLUDE ([TTGuid]) SELECT FR2.TestCampaignContainerId,FR2.TestCampaignCategoryId,FR2.Grade,FR2.TestCampaignId,FR2.TestSetId,FR2.TestId,FR2.[TaskCategoryId],FR2.[TestletId],FR2.SectionNo,FR2.Difficulty,FR2.TestletName,FR2.StudentId,FR2.ClassId,TQ.[QuestionPoints] -- 1+ good,0 wrong,null no answer,GoodAnswer = Case When TQ.[QuestionPoints] Is null Then 0 When TQ.[QuestionPoints] > 0 Then 1 -- cookie Else 0 End,FR2.Redizo,FR2.ViewCount,FR2.SpentTime,TQ.[Position] AS [QuestionPosition],RA.SpecialNeeds -- identifikace SVP,FR2.TestAdaptationId,FR2.TaskId,FR2.TaskPosition,FR2.AnswerType INTO [#FinalResult_Base] FROM [#FinalResult_Base_2] FR2 JOIN [TestQuestion] [TQ] WITH (NOLOCK) ON [TQ].[TestTaskId] = [FR2].[TTGuid] JOIN [Question] [Q] WITH (NOLOCK) ON [Q].[QuestionId] = [TQ].[QuestionId] AND [Q].[IsActive] = 1 JOIN [RoleAssignment] [RA] WITH (NOLOCK) ON [RA].[PersonId] = [FR2].[StudentId] AND [RA].[ClassId] = [FR2].[ClassId] AND [RA].[IsActive] = 1 AND [RA].[RoleId] = 1 drop table #FinalResult_Base_2; truncate table [dbo].[FinalResult_Base]; insert into [dbo].[FinalResult_Base] select * from #FinalResult_Base; drop table #FinalResult_Base;
解决方法
我不得不处理90个表连接(米老鼠设计),其中查询处理器甚至拒绝创建计划.将这种连接分解为每个9个表的10个子连接,大大降低了每个连接的复杂性,每个连接都会随着每个附加表呈指数级增长.此外,查询优化工具现在将它们视为10个计划,总体上花费(可能)更多时间(Paul White甚至可能有指标!).
中间结果表现在将具有他们自己的新统计数据,因此与深度树的统计数据相比更好地加入,这些数据库早期变得倾斜并且很快就会成为科幻小说.
此外,您可以先强制选择性最强的连接,减少向上移动树的数据量.如果您可以比Optimiser更好地估计谓词的选择性,为什么不强制使用连接顺序.可能值得寻找“浓密的计划”.
2我认为应该考虑效率和绩效是否重要
3不一定,但如果最早选择性连接是在早期执行的话