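You can safely ignore the sections below (from "JOINs: Starting off" onward) if you just want to take a crack at the code. The background and results are there only as context. If you want to see what the code looked like initially, look at the edit history before 2015-10-06.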
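Objective
Ultimately I want to calculate interpolated GPS coordinates for the transmitter (X or Xmit), based on the DateTime stamps of the available GPS data in table SecondTable that directly flank each observation in table FirstTable.
My immediate objective, on the way to that ultimate one, is to figure out how best to join FirstTable to SecondTable to get those flanking time points. Later I can use that information to calculate intermediate GPS coordinates, assuming a linear fit along an equirectangular coordinate system (fancy words for saying I don't care that the Earth is a sphere at this scale).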
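Questions
> Fixed this one myself by grabbing just the "after", and then getting the "before" only as it relates to the "after".
> Is there a more intuitive way that does not involve the (A<>B OR A=B) structure?
> Byrdzeye provided the basic alternatives; however, my "real-world" experience did not line up with all 4 of his join strategies performing the same. Full credit to him, though, for addressing the alternative join styles.
> Any other thoughts, tricks and advice you may have.
> So far both byrdzeye and Phrancis have been quite helpful in this regard. I found that Phrancis' advice was excellently laid out and provided help at a critical stage, so I'll give him the edge here.
I would still appreciate any additional help with regard to question 3.
The bullet points reflect who I believe helped me most on each individual question.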
Table definitions
Semi-visual representation
FirstTable
Fields
  RecTStamp | DateTime  --can contain milliseconds via VBA code (see Ref 1)
  ReceivID  | LONG
  XmitID    | TEXT(25)
Keys and Indices
  PK_DT     | Primary, Unique, No Null, Compound
    XmitID    | ASC
    RecTStamp | ASC
    ReceivID  | ASC
  UK_DRX    | Unique, Compound
    RecTStamp | ASC
    ReceivID  | ASC
    XmitID    | ASC
SecondTable
Fields
  X_ID      | LONG AUTONUMBER -- seeded after main table has been created and already sorted on the primary key
  XTStamp   | DateTime --will not contain partial seconds
  Latitude  | Double   --these are in decimal degrees, not degrees/minutes/seconds
  Longitude | Double   --this way straight decimal math can be performed
Keys and Indices
  PK_D      | Primary, Simple
    XTStamp   | ASC
  UIDX_ID   | Unique, Simple
    X_ID      | ASC
ReceiverDetails table
Fields
  ReceivID                      | LONG
  Receiver_Location_Description | TEXT -- NULL OK
  Beginning                     | DateTime --no partial seconds
  Ending                        | DateTime --no partial seconds
  Lat                           | DOUBLE
  Lon                           | DOUBLE
Keys and Indices
  PK_RID    | Primary, Simple
    ReceivID  | ASC
ValidXmitters table
Field (and primary key)
  XmitID    | TEXT(25) -- primary, unique, no null, simple
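SQL Fiddle...
...so that you can play with the table definitions and code
This question is about MS Access, but as Phrancis pointed out, SQL Fiddle has no Access flavor. So you should be able to go here to see my table definitions and code, based on Phrancis' answer: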
http://sqlfiddle.com/#!6/e9942/4 (external link)
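JOINs: Starting off
My current "inner guts" JOIN strategy
First create a FirstTable_rekeyed with the column order and compound primary key (RecTStamp, ReceivID, XmitID), all indexed/sorted ASC. I also created an index on each column individually. Then fill it like so.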
INSERT INTO FirstTable_rekeyed (RecTStamp, ReceivID, XmitID)
    SELECT DISTINCTROW RecTStamp, ReceivID, XmitID
    FROM FirstTable
    WHERE XmitID IN (SELECT XmitID from ValidXmitters)
    ORDER BY RecTStamp, ReceivID, XmitID;
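The above query fills the new table with 153006 records and returns within 10 seconds or so.
The following completes within a second or two when the whole thing is wrapped in a "SELECT Count(*) FROM (...)" and the TOP 1 subquery method is used: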
SELECT
    ReceiverRecord.RecTStamp,
    ReceiverRecord.ReceivID,
    ReceiverRecord.XmitID,
    (SELECT TOP 1 XmitGPS.X_ID
     FROM SecondTable as XmitGPS
     WHERE ReceiverRecord.RecTStamp < XmitGPS.XTStamp
     ORDER BY XmitGPS.X_ID) AS AfterXmit_ID
FROM FirstTable_rekeyed AS ReceiverRecord
-- INNER JOIN SecondTable AS XmitGPS ON (ReceiverRecord.RecTStamp < XmitGPS.XTStamp)
GROUP BY RecTStamp, ReceivID, XmitID;
-- No separate join needed for the Top 1 method, but it would be required for the other methods.
-- Additionally no restriction of the returned set is needed if I create the _rekeyed table.
-- May not need GROUP BY either. Could try ORDER BY.
-- The three AfterXmit_ID alternatives below take longer than 3 minutes to complete (or do not ever complete).
  -- FIRST(XmitGPS.X_ID)
  -- MIN(XmitGPS.X_ID)
  -- MIN(SWITCH(XmitGPS.XTStamp > ReceiverRecord.RecTStamp,XmitGPS.X_ID,Null))
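Outside Access, the same "grab the after, then take the row just before it" idea can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the query: the function name `flanking_indices` and the sample timestamps are made up, and it assumes the GPS stamps are already sorted ascending (as X_ID is seeded from the sorted table).

```python
from bisect import bisect_right
from datetime import datetime

def flanking_indices(gps_stamps, rec_stamp):
    """Return (before_idx, after_idx) into the ascending list gps_stamps.

    after_idx is the first GPS stamp strictly later than rec_stamp,
    mirroring TOP 1 ... WHERE RecTStamp < XTStamp ORDER BY X_ID.
    before_idx is simply the row preceding it. Either one is None
    when rec_stamp falls outside the GPS track.
    """
    after = bisect_right(gps_stamps, rec_stamp)
    before_idx = after - 1 if after > 0 else None
    after_idx = after if after < len(gps_stamps) else None
    return before_idx, after_idx

# Hypothetical GPS fixes every 10 seconds.
gps = [datetime(2015, 10, 6, 12, 0, s) for s in (0, 10, 20, 30)]
print(flanking_indices(gps, datetime(2015, 10, 6, 12, 0, 12)))  # (1, 2)
print(flanking_indices(gps, datetime(2015, 10, 6, 12, 0, 10)))  # (1, 2)
```

A stamp that coincides exactly with a GPS fix gets that fix as its "before", matching the `<=` / `>` split used in the queries here.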
Previous "inner guts" JOIN queries
First (fastish... but not good enough)
SELECT
    A.RecTStamp,
    A.ReceivID,
    A.XmitID,
    MAX(IIF(B.XTStamp<= A.RecTStamp,B.XTStamp,Null)) as BeforeXTStamp,
    MIN(IIF(B.XTStamp > A.RecTStamp,B.XTStamp,Null)) as AfterXTStamp
FROM FirstTable as A
INNER JOIN SecondTable as B ON
    (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp)
GROUP BY A.RecTStamp, A.ReceivID, A.XmitID
  -- alternative for BeforeXTStamp MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp)
  -- alternatives for AfterXTStamp (see "Aside" note below)
  -- 1.0/(MAX(1.0/(-(B.XTStamp>A.RecTStamp)*B.XTStamp)))
  -- -1.0/(MIN(1.0/((B.XTStamp>A.RecTStamp)*B.XTStamp)))
Second (slower)
SELECT
    A.RecTStamp,
    AbyB1.XTStamp AS BeforeXTStamp,
    AbyB2.XTStamp AS AfterXTStamp
FROM (FirstTable AS A INNER JOIN
    (select top 1 B1.XTStamp, A1.RecTStamp
     from SecondTable as B1, FirstTable as A1
     where B1.XTStamp<=A1.RecTStamp
     order by B1.XTStamp DESC) AS AbyB1 --MAX (time points before)
    ON A.RecTStamp = AbyB1.RecTStamp) INNER JOIN
    (select top 1 B2.XTStamp, A2.RecTStamp
     from SecondTable as B2, FirstTable as A2
     where B2.XTStamp>A2.RecTStamp
     order by B2.XTStamp ASC) AS AbyB2 --MIN (time points after)
    ON A.RecTStamp = AbyB2.RecTStamp;
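Background
I have a telemetry table (aliased as A) with a compound primary key based on a DateTime stamp, a transmitter ID and a recording device ID. Due to circumstances beyond my control, my SQL dialect is the standard Jet DB in Microsoft Access (users will be on 2007 and later versions). Because of the transmitter ID, only about 200,000 of these entries are relevant to the query.
There is a second telemetry table (aliased as B) of approximately 50,000 entries with a single DateTime primary key.
As a first step, I focused on finding, for each stamp in the first table, the closest timestamps in the second table.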
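JOIN results
Quirks that I've discovered...
...along the way during debugging
It feels really odd to be writing the JOIN logic as FROM FirstTable as A INNER JOIN SecondTable as B ON (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp), which, as @byrdzeye pointed out in a comment (that has since disappeared), is a form of cross join. Note that substituting LEFT OUTER JOIN for INNER JOIN in the code above appears to make no difference to the number or identity of the rows returned. I also can't seem to leave off the ON clause or say ON (1=1). Just using a comma to join (rather than INNER or LEFT OUTER JOIN) results in Count(select * from A) * Count(select * from B) rows returned from this query, rather than just one row per table A record, as the explicit (A<>B OR A=B) JOIN returns. This is clearly not suitable. FIRST doesn't seem to be usable given a compound primary key type.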
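To make the "form of cross join" point concrete, here is a small Python sketch with made-up integer stand-ins for the timestamps. It assumes no Null values on either side. The condition (a <> b OR a = b) keeps every pair, and the one-row-per-A shape of the result comes from grouping with conditional MAX/MIN, not from the join condition itself.

```python
from itertools import product

first = [5, 12, 27]        # hypothetical stand-ins for RecTStamp values
second = [0, 10, 20, 30]   # hypothetical stand-ins for XTStamp values

# ON (a <> b OR a = b) is true for every non-null pair: a cross join.
joined = [(a, b) for a, b in product(first, second) if a != b or a == b]
assert len(joined) == len(first) * len(second)

# GROUP BY a with conditional MAX / MIN collapses it to one row per a.
flanks = {}
for a, b in joined:
    before, after = flanks.get(a, (None, None))
    if b <= a and (before is None or b > before):
        before = b          # MAX(IIF(b <= a, b, Null))
    if b > a and (after is None or b < after):
        after = b           # MIN(IIF(b > a, b, Null))
    flanks[a] = (before, after)

print(flanks)  # {5: (0, 10), 12: (10, 20), 27: (20, 30)}
```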
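The second JOIN style, although arguably more legible, suffers from being slower. This may be because it needs two additional inner JOINs against the larger table, on top of the two CROSS JOINs found in both options.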
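Aside: Replacing the IIF clause with MIN/MAX appears to return the same number of entries.
MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp)
works for the "before" (MAX) timestamp, but the same idea does not work directly for the "after" (MIN) as follows:
MIN(-(B.XTStamp>A.RecTStamp)*B.XTStamp)
because the minimum is always 0 for the FALSE condition. This 0 is less than any post-epoch DOUBLE (which is what a DateTime field is underneath in Access, and what this calculation converts the field into). The IIF approach and the MIN/MAX alternatives proposed for the AfterXTStamp value work because division by zero (FALSE) generates null values, which the aggregate functions MIN and MAX skip over.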
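The arithmetic behind this aside can be traced in a short Python sketch. It is only a model of the behavior described above: it assumes True is -1 and False is 0 as in Jet, uses None to stand in for Null, and follows the description above in treating division by zero as producing Null. The helper names and the sample date serials are made up.

```python
ACCESS_TRUE, ACCESS_FALSE = -1, 0   # numeric values assumed for Jet booleans

def as_access_bool(flag):
    return ACCESS_TRUE if flag else ACCESS_FALSE

def safe_div(num, den):
    """Division that yields None (standing in for Null) on a zero divisor."""
    return None if den == 0 else num / den

def agg(func, values):
    """MIN/MAX-style aggregate that skips None, as the aggregates skip Null."""
    kept = [v for v in values if v is not None]
    return func(kept) if kept else None

rec = 42000.5                                  # hypothetical date serial
gps = [42000.0, 42000.25, 42000.75, 42001.0]   # hypothetical GPS date serials

masked_before = [-as_access_bool(x <= rec) * x for x in gps]
masked_after = [-as_access_bool(x > rec) * x for x in gps]

before = agg(max, masked_before)       # 42000.25: zeros lose to real stamps
naive_after = agg(min, masked_after)   # 0.0: the FALSE rows always win
# Reciprocal trick: FALSE rows become 1/0 -> Null and are skipped.
after = safe_div(1.0, agg(max, [safe_div(1.0, m) for m in masked_after]))

print(before, naive_after, round(after, 6))
```

The largest reciprocal belongs to the smallest surviving stamp, so inverting it again recovers the "after" value (up to floating-point rounding).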
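Next steps
Taking this further, I want to find the timestamps in the second table that directly flank each timestamp in the first table, and linearly interpolate the data values from the second table based on the time distance to those points (i.e. if the timestamp from the first table is 25% of the way between the "before" and the "after", I would like 25% of the calculated value to come from the second-table data associated with the "after" point and 75% from the "before"). Using the revised join type as part of the inner guts, and following the suggested answers below, I produce...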
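The weighting described here is ordinary linear interpolation. As a minimal Python sketch (the function name, tuple layout and sample coordinates are invented for illustration), the "after" weight is the fraction of the before-to-after interval that has elapsed, and the "before" weight is its complement:

```python
from datetime import datetime

def interpolate_position(rec_stamp, before, after):
    """before/after are (stamp, lat, lon) tuples flanking rec_stamp."""
    (t0, lat0, lon0), (t1, lat1, lon1) = before, after
    after_weight = (rec_stamp - t0) / (t1 - t0)   # timedelta / timedelta -> float
    lat = lat0 * (1 - after_weight) + lat1 * after_weight
    lon = lon0 * (1 - after_weight) + lon1 * after_weight
    return lat, lon

# Hypothetical fixes 20 seconds apart; the observation is 5 seconds in (25%).
before = (datetime(2015, 10, 6, 12, 0, 0), 40.0, -105.0)
after = (datetime(2015, 10, 6, 12, 0, 20), 40.4, -105.2)
lat, lon = interpolate_position(datetime(2015, 10, 6, 12, 0, 5), before, after)
print(round(lat, 6), round(lon, 6))   # 40.1 -105.05
```

Treating latitude and longitude as plain numbers like this is the equirectangular simplification mentioned in the objective. It also assumes the two flanking stamps differ, since otherwise the weight divides by zero.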
SELECT
    AvgGPS.XmitID,
    StrDateIso8601Msec(AvgGPS.RecTStamp) AS RecTStamp_ms,
    -- StrDateIso8601MSec is a VBA function returning a TEXT string in yyyy-mm-dd hh:nn:ss.lll format
    AvgGPS.ReceivID,
    RD.Receiver_Location_Description,
    RD.Lat AS Receiver_Lat,
    RD.Lon AS Receiver_Lon,
    AvgGPS.Before_Lat * (1 - AvgGPS.AfterWeight) + AvgGPS.After_Lat * AvgGPS.AfterWeight AS Xmit_Lat,
    AvgGPS.Before_Lon * (1 - AvgGPS.AfterWeight) + AvgGPS.After_Lon * AvgGPS.AfterWeight AS Xmit_Lon,
    AvgGPS.RecTStamp AS RecTStamp_basic
FROM (
    SELECT
        AfterTimestampID.RecTStamp,
        AfterTimestampID.XmitID,
        AfterTimestampID.ReceivID,
        GPSBefore.XTStamp AS BeforeXTStamp,
        GPSBefore.Latitude AS Before_Lat,
        GPSBefore.Longitude AS Before_Lon,
        GPSAfter.XTStamp AS AfterXTStamp,
        GPSAfter.Latitude AS After_Lat,
        GPSAfter.Longitude AS After_Lon,
        ( (AfterTimestampID.RecTStamp - GPSBefore.XTStamp) / (GPSAfter.XTStamp - GPSBefore.XTStamp) ) AS AfterWeight
    FROM (
        (SELECT
            ReceiverRecord.RecTStamp,
            ReceiverRecord.ReceivID,
            ReceiverRecord.XmitID,
            (SELECT TOP 1 XmitGPS.X_ID
             FROM SecondTable as XmitGPS
             WHERE ReceiverRecord.RecTStamp < XmitGPS.XTStamp
             ORDER BY XmitGPS.X_ID) AS AfterXmit_ID
         FROM FirstTable AS ReceiverRecord
         -- WHERE ReceiverRecord.XmitID IN (select XmitID from ValidXmitters)
         GROUP BY RecTStamp, ReceivID, XmitID
        ) AS AfterTimestampID
        INNER JOIN SecondTable AS GPSAfter
            ON AfterTimestampID.AfterXmit_ID = GPSAfter.X_ID
    ) INNER JOIN SecondTable AS GPSBefore
        ON AfterTimestampID.AfterXmit_ID = GPSBefore.X_ID + 1
) AS AvgGPS
INNER JOIN ReceiverDetails AS RD
    ON (AvgGPS.ReceivID = RD.ReceivID)
    AND (AvgGPS.RecTStamp BETWEEN RD.Beginning AND RD.Ending)
ORDER BY AvgGPS.RecTStamp, AvgGPS.ReceivID;
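...which returns 152928 records, matching (at least approximately) the expected final record count. Run time is probably 5-10 minutes on my i7-4790, 16GB RAM, no SSD, Win 8.1 Pro system.
Reference 1: MS Access Can Handle Millisecond Time Values–Really, and accompanying source file [08080011.txt]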
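Answer
First JOIN
Your IIF field selections could probably benefit from using a Switch statement instead. It sometimes seems to be the case, especially with things SQL, that a SWITCH (more commonly known as CASE in typical SQL) is quite fast when just making simple comparisons in the body of a SELECT. The syntax in your case would be almost identical, although a switch can be expanded to cover a large chunk of comparisons in one field. Something to consider.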
SWITCH (
    expr1, val1,
    expr2, val2,
    True, val3    -- default value or "else"
)
A switch can also help readability in larger statements. In context:
MAX(SWITCH(B.XTStamp <= A.RecTStamp, B.XTStamp, True, Null)) as BeforeXTStamp,
--alternatively MAX(-(B.XTStamp<=A.RecTStamp)*B.XTStamp) as BeforeXTStamp,
MIN(SWITCH(B.XTStamp > A.RecTStamp, B.XTStamp, True, Null)) as AfterXTStamp
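As for the join itself, I think (A.RecTStamp<>B.XTStamp OR A.RecTStamp=B.XTStamp) is about as good as you're going to get, given what you are trying to do. It's not that fast, but I wouldn't expect it to be either.
Second JOIN
You said this one is slower. It's also less readable from a code standpoint. Given equally satisfactory result sets between 1 and 2, I'd say go for 1. At least that way it's obvious what you are trying to do. Subqueries are often not very fast (though often unavoidable), especially in this case where you are throwing an extra join into each one, which must certainly complicate the execution plan.
One remark: I saw that you used the old ANSI-89 join syntax. It's best to avoid that. Performance will be the same or better with the more modern join syntax, and it is less ambiguous, easier to read, and harder to get wrong.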
FROM (FirstTable AS A INNER JOIN
    (select top 1 B1.XTStamp, A1.RecTStamp
     from SecondTable as B1
     inner join FirstTable as A1
         on B1.XTStamp <= A1.RecTStamp
     order by B1.XTStamp DESC) AS AbyB1 --MAX (time points before)
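Naming things
I think the way your things are named is unhelpful at best, and cryptic at worst. A, B, A1, B1 etc. as table aliases could be better, I think. I also think the field names are not very good, but I realize you may not have control over that. I will just quickly quote The Codeless Code on the topic of naming things, and leave it at that...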
“Invective!” answered the priestess. “Verb your expletive nouns!”
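"Next steps" query
I couldn't make much sense of it as it was written, so I had to take it to a text editor and make some style changes to make it more readable. I know Access' SQL editor is beyond clunky, so I usually write my queries in a good editor like Notepad or Sublime Text. Some of the stylistic changes I applied to make it more readable:
> 4-space indents instead of 2 spaces
> Spaces around mathematical and comparison operators
> More natural placement of brackets and indentation (I went with Java-style brackets, but it could also be C-style, at your preference)
As it turns out, this is a very complicated query indeed. To make sense of it, I have to start from the innermost query, your ID data set, which I understand to be the same as your First JOIN. It returns the IDs and timestamps of the devices where the before/after timestamps are the closest, within the subset of devices you are interested in. So instead of ID, why not call it ClosestTimestampID.
Your Det join is used only once. The rest of the time, the query only joins on values you already have in ClosestTimestampID. So we should be able to just do this instead: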
    ) AS ClosestTimestampID
    INNER JOIN SecondTable AS TL1
        ON ClosestTimestampID.BeforeXTStamp = TL1.XTStamp)
    INNER JOIN SecondTable AS TL2
        ON ClosestTimestampID.AfterXTStamp = TL2.XTStamp
    WHERE ClosestTimestampID.XmitID IN (<limited subset S>)
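Maybe not a huge performance gain, but anything we can do to help the poor Jet DB optimizer will help!
I can't shake the feeling that the calculations/algorithm for BeforeWeight and AfterWeight, which you use to interpolate, could be done better, but unfortunately I'm not very good with those.
One suggestion to avoid crashing (although not ideal, depending on your application) would be to break your nested subqueries out into tables of their own and update those when needed. I'm not sure how often you need your source data refreshed, but if it is not too often, you might think about writing some VBA code to schedule an update of the tables and derived tables, and leave your outermost query to pull from those tables instead of from the original source. Just a thought. Like I said, not ideal, but given the tool you may not have a choice.
Everything together: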
SELECT
    InGPS.XmitID,
    StrDateIso8601Msec(InGPS.RecTStamp) AS RecTStamp_ms,
    -- StrDateIso8601MSec is a VBA function returning a TEXT string in yyyy-mm-dd hh:nn:ss.lll format
    InGPS.ReceivID,
    InGPS.Before_Lat * InGPS.BeforeWeight + InGPS.After_Lat * InGPS.AfterWeight AS Xmit_Lat,
    InGPS.Before_Lon * InGPS.BeforeWeight + InGPS.After_Lon * InGPS.AfterWeight AS Xmit_Lon,
    InGPS.RecTStamp AS RecTStamp_basic
FROM (
    SELECT
        ClosestTimestampID.RecTStamp,
        ClosestTimestampID.XmitID,
        ClosestTimestampID.ReceivID,
        ClosestTimestampID.BeforeXTStamp,
        TL1.Latitude AS Before_Lat,
        TL1.Longitude AS Before_Lon,
        (1 - ((ClosestTimestampID.RecTStamp - ClosestTimestampID.BeforeXTStamp)
            / (ClosestTimestampID.AfterXTStamp - ClosestTimestampID.BeforeXTStamp))) AS BeforeWeight,
        ClosestTimestampID.AfterXTStamp,
        TL2.Latitude AS After_Lat,
        TL2.Longitude AS After_Lon,
        ( (ClosestTimestampID.RecTStamp - ClosestTimestampID.BeforeXTStamp)
            / (ClosestTimestampID.AfterXTStamp - ClosestTimestampID.BeforeXTStamp)) AS AfterWeight
    FROM (((
        SELECT
            A.RecTStamp,
            A.ReceivID,
            A.XmitID,
            MAX(SWITCH(B.XTStamp <= A.RecTStamp, B.XTStamp, True, Null)) AS BeforeXTStamp,
            MIN(SWITCH(B.XTStamp > A.RecTStamp, B.XTStamp, True, Null)) AS AfterXTStamp
        FROM FirstTable AS A
        INNER JOIN SecondTable AS B
            ON (A.RecTStamp <> B.XTStamp OR A.RecTStamp = B.XTStamp)
        WHERE A.XmitID IN (<limited subset S>)
        GROUP BY A.RecTStamp, A.ReceivID, A.XmitID
    ) AS ClosestTimestampID
    INNER JOIN FirstTable AS Det
        ON (Det.XmitID = ClosestTimestampID.XmitID)
        AND (Det.ReceivID = ClosestTimestampID.ReceivID)
        AND (Det.RecTStamp = ClosestTimestampID.RecTStamp))
    INNER JOIN SecondTable AS TL1
        ON ClosestTimestampID.BeforeXTStamp = TL1.XTStamp)
    INNER JOIN SecondTable AS TL2
        ON ClosestTimestampID.AfterXTStamp = TL2.XTStamp
    WHERE Det.XmitID IN (<limited subset S>)
) AS InGPS
INNER JOIN ReceiverDetails AS RD
    ON (InGPS.ReceivID = RD.ReceivID)
    AND (InGPS.RecTStamp BETWEEN <valid parameters from another table>)
ORDER BY StrDateIso8601Msec(InGPS.RecTStamp), InGPS.ReceivID;