我已经将问题简化并解决了一个小的测试应用程序:
实体:
public class Tag { public virtual string Id { get; set; } }
图像,包括Id(int),Name(字符串)和标签(多对多,Tag实例集)
public class Image { private Iesi.Collections.Generic.ISet<Tag> tags = new HashedSet<Tag>(); public virtual int Id { get; set; } public virtual string Name { get; set; } public virtual IEnumerable<Tag> Tags { get { return tags; } } public virtual void AddTag(Tag tag) { tags.Add(tag); } }
我使用“按代码映射”使用以下映射:
public class TagMapping : ClassMapping<Tag> { public TagMapping() { Id(x => x.Id,map => map.Generator(Generators.Assigned)); } } public class ImageMapping : ClassMapping<Image> { public ImageMapping() { Id(x => x.Id,map => map.Generator(Generators.Native)); Property(x => x.Name); Set(x => x.Tags,map => map.Access(Accessor.Field),map => map.ManyToMany(m2m => { })); } }
NHibernate /数据库配置如下所示:
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2"> <session-factory> <property name="dialect">NHibernate.Dialect.Mssql2008Dialect</property> <property name="connection.connection_string_name">PrimaryDatabase</property> <property name="format_sql">true</property> </session-factory> </hibernate-configuration> <connectionStrings> <add name="PrimaryDatabase" providerName="System.Data.sqlClient" connectionString="Data Source=.\sqlEXPRESS;Initial Catalog=PerfTest;Integrated Security=True" /> </connectionStrings>
我想实现以下查询:给我所有图像,其中名称包含特定字符串或任何标记包含特定字符串.为了找到后者,我使用了一个子查询,它给了我带有匹配标签的所有图像的ID.因此,最终搜索条件是:图像具有包含特定字符串的名称,或者其ID是子查询返回的ID之一.
var term = "abc"; var mode = MatchMode.Anywhere; var imagesWithMatchingTag = QueryOver.Of<Image>() .JoinQueryOver<Tag>(x => x.Tags) .WhereRestrictionOn(x => x.Id).IsLike(term,mode) .Select(x => x.Id); var qry = session.QueryOver<Image>() .Where( Restrictions.On<Image>(x => x.Name).IsLike(term,mode) || Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag)) .List();
我运行此查询的测试数据库(DBMS:sql Server 2008 Express R2)是专门为此测试创建的,不包含任何其他内容.我用随机数据填充:10.000张图像(表格图像),4.000张标签(表格标签)和图像与标签之间大约200,000个关联(表格标签),即.每个图像有大约20个相关标签.数据库
sql NHibernate声称使用的是:
SELECT this_.Id as Id1_0_,this_.Name as Name1_0_ FROM Image this_ WHERE ( this_.Name like @p0 or this_.Id in ( SELECT this_0_.Id as y0_ FROM Image this_0_ inner join Tags tags3_ on this_0_.Id=tags3_.image_key inner join Tag tag1_ on tags3_.elt=tag1_.Id WHERE tag1_.Id like @p1 ) ); @p0 = '%abc%' [Type: String (4000)],@p1 = '%abc%' [Type: String (4000)]
鉴于我正在创建的查询,这看起来很合理.
如果我使用NHibernate运行此查询,则查询大约需要30秒(NHibernate.AdoNet.AbstractBatcher – ExecuteReader需要32964毫秒)并返回98个实体.
但是,如果我直接在sql Server Management studio中执行等效查询:
DECLARE @p0 nvarchar(4000) DECLARE @p1 nvarchar(4000) SET @p0 = '%abc%' SET @p1 = '%abc%' SELECT this_.Id as Id1_0_,this_.Name as Name1_0_ FROM Image this_ WHERE ( this_.Name like @p0 or this_.Id in ( SELECT this_0_.Id as y0_ FROM Image this_0_ inner join Tags tags3_ on this_0_.Id=tags3_.image_key inner join Tag tag1_ on tags3_.elt=tag1_.Id WHERE tag1_.Id like @p1 ) );
查询花费不到一秒钟(并返回98个结果).
进一步实验:
var qry = session.QueryOver<Image>() .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag)) .List();
要么
var qry = session.QueryOver<Image>() .Where(Restrictions.On<Image>(x => x.Name).IsLike(term,mode)) .List();
查询很快.
如果我在子查询中不使用like但完全匹配:
var imagesWithMatchingTag = QueryOver.Of<Image>() .JoinQueryOver<Tag>(x => x.Tags) .Where(x => x.Id == term) .Select(x => x.Id);
查询也很快.
当我调试程序并在查询执行时暂停时,托管调用堆栈的顶部如下所示:
[Managed to Native Transition] System.Data.dll!SNINativeMethodWrapper.SNIReadSync(System.Runtime.InteropServices.SafeHandle pConn,ref System.IntPtr packet,int timeout) + 0x53 bytes System.Data.dll!System.Data.sqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult asyncResult,System.Data.sqlClient.TdsParserStateObject stateObj) + 0xa3 bytes System.Data.dll!System.Data.sqlClient.TdsParserStateObject.ReadNetworkPacket() + 0x24 bytes System.Data.dll!System.Data.sqlClient.TdsParserStateObject.ReadBuffer() + 0x1f bytes System.Data.dll!System.Data.sqlClient.TdsParserStateObject.ReadByte() + 0x46 bytes System.Data.dll!System.Data.sqlClient.TdsParser.Run(System.Data.sqlClient.RunBehavior runBehavior,System.Data.sqlClient.sqlCommand cmdHandler,System.Data.sqlClient.sqlDataReader dataStream,System.Data.sqlClient.BulkCopySimpleResultSet bulkCopyHandler,System.Data.sqlClient.TdsParserStateObject stateObj) + 0x67 bytes System.Data.dll!System.Data.sqlClient.sqlDataReader.ConsumeMetaData() + 0x22 bytes System.Data.dll!System.Data.sqlClient.sqlDataReader.MetaData.get() + 0x57 bytes System.Data.dll!System.Data.sqlClient.sqlCommand.FinishExecuteReader(System.Data.sqlClient.sqlDataReader ds,System.Data.sqlClient.RunBehavior runBehavior,string resetOptionsString) + 0xe1 bytes ...
所以,我的问题是:
>为什么NHibernate执行查询的时间要长得多,即使使用的sql是相同的?
>我怎样才能摆脱这种差异?是否有可能导致此行为的设置?
我知道查询一般不是世界上最有效的东西,但是我在这里引人注目的是使用NHibernate和manualy查询之间的区别.这里确实存在一些奇怪的事情.
很抱歉这篇文章很长,但我希望尽可能多地包含这个问题.非常感谢您的帮助!
更新1:
我已经使用NHProf测试了应用程序而没有太多附加值:NHProf显示执行的sql是
SELECT this_.Id as Id1_0_,this_.Name as Name1_0_ FROM Image this_ WHERE (this_.Name like '%abc%' /* @p0 */ or this_.Id in (SELECT this_0_.Id as y0_ FROM Image this_0_ inner join Tags tags3_ on this_0_.Id = tags3_.image_key inner join Tag tag1_ on tags3_.elt = tag1_.Id WHERE tag1_.Id like '%abc%' /* @p1 */))
这正是我之前发布的内容(因为这是NHibernate首先写入日志的内容).
这是NHProf的截图
警告是可以理解的,但不解释行为.
更新2
@surfen首先将子查询的结果从数据库中提取出来并将它们重新放回主查询中:
var imagesWithMatchingTag = QueryOver.Of<Image>() .JoinQueryOver<Tag>(x => x.Tags) .WhereRestrictionOn(x => x.Id).IsLike(term,mode) .Select(x => x.Id); var ids = imagesWithMatchingTag.GetExecutableQueryOver(session).List<int>().ToArray(); var qry = session.QueryOver<Image>() .Where( Restrictions.On<Image>(x => x.Name).IsLike(term,mode) || Restrictions.On<Image>(x => x.Id).IsIn(ids)) .List();
虽然这确实使主查询再次快速,但我宁愿不采用这种方法,因为它不适合现实世界应用程序中的预期用法.有趣的是,这要快得多.我希望子查询方法同样快,因为它不依赖于外部查询.
更新3
这似乎与NHibernate无关.如果我使用普通的ADO.NET对象运行查询,我会得到相同的行为:
var cmdText = @"SELECT this_.Id as Id1_0_,this_.Name as Name1_0_ FROM Image this_ WHERE (this_.Name like @p0 or this_.Id in (SELECT this_0_.Id as y0_ FROM Image this_0_ inner join Tags tags3_ on this_0_.Id = tags3_.image_key inner join Tag tag1_ on tags3_.elt = tag1_.Id WHERE tag1_.Id like @p1 ));"; using (var con = new sqlConnection(ConfigurationManager.ConnectionStrings["PrimaryDatabase"].ConnectionString)) { con.Open(); using (var txn = con.BeginTransaction()) { using (var cmd = new sqlCommand(cmdText,con,txn)) { cmd.CommandTimeout = 120; cmd.Parameters.AddWithValue("p0","%abc%"); cmd.Parameters.AddWithValue("p1","%abc%"); using (var reader = cmd.ExecuteReader()) { while (reader.Read()) { Console.WriteLine("Match"); } } } txn.Commit(); } }
更新4
查询计划(点击放大):
慢查询
确切地说,计划有所不同.
更新5
因为sql Server确实认为子查询是相关的,所以我尝试了不同的东西:我将与名称相关的标准单独移动到子查询:
var term = "abc"; var mode = MatchMode.Anywhere; var imagesWithMatchingTag = QueryOver.Of<Image>() .JoinQueryOver<Tag>(x => x.Tags) .WhereRestrictionOn(x => x.Id).IsLike(term,mode) .Select(x => x.Id); var imagesWithMatchingName = QueryOver.Of<Image>() .WhereRestrictionOn(x => x.Name).IsLike(term,mode) .Select(x => x.Id); var qry = session.QueryOver<Image>() .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingName) || Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag) ).List();
SELECT this_.Id as Id1_0_,this_.Name as Name1_0_ FROM Image this_ WHERE ( this_.Id in ( SELECT this_0_.Id as y0_ FROM Image this_0_ inner join Tags tags3_ on this_0_.Id=tags3_.image_key inner join Tag tag1_ on tags3_.elt=tag1_.Id WHERE tag1_.Id like @p0 ) or this_.Id in ( SELECT this_0_.Id as y0_ FROM Image this_0_ WHERE this_0_.Name like @p1 ) ); @p0 = '%abc%' [Type: String (4000)],@p1 = '%abc%' [Type: String (4000)]
这似乎打破了相关性,结果查询再次“快速”(“快速”,如“当下可接受”).查询时间从30秒下降到~170ms.仍然不是轻量级查询,但至少可以让我从这里继续.我知道“喜欢’%foo%’”永远不会超级快.如果遇到最糟糕的情况,我仍然可以转移到专门的搜索服务器(Lucene,solr)或真正的全文搜索.
var qry = session.QueryOver(() => img) .Left.JoinQueryOver(x => x.Tags,() => tag) .Where( Restrictions.Like(Projections.Property(() => img.Name),term,mode) || Restrictions.Like(Projections.Property(() => tag.Id),mode)) .TransformUsing(Transformers.DistinctRootEntity) .List();
sql:
SELECT this_.Id as Id1_1_,this_.Name as Name1_1_,tags3_.image_key as image1_3_,tag1_.Id as elt3_,tag1_.Id as Id0_0_ FROM Image this_ left outer join Tags tags3_ on this_.Id=tags3_.image_key left outer join Tag tag1_ on tags3_.elt=tag1_.Id WHERE ( this_.Name like @p0 or tag1_.Id like @p1 ); @p0 = '%abc%' [Type: String (4000)],@p1 = '%abc%' [Type: String (4000)]
解决方法
var qry = session.QueryOver<Image>() .Where( Restrictions.On<Image>(x => x.Name).IsLike(term,mode) || Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag)) .List();
您仅为第一个查询提供了sql.那第二个怎么样?你在sql Management Studio下测试了吗?使用sql Server Profiler作为@JoachimIsaksson建议找出NHibernate执行服务器端的查询.
这看起来就像是将97个图像对象加载到内存中.它们各有多大?
编辑
另一个赌注是您的第一个查询执行第二个查询的广告内部查询.尝试在第一个查询上执行.List()以将标记加载到内存中.
编辑2
从查询计划看,您的查询看起来像是Correlated subquery.
您提到这些查询很快:
var qry = session.QueryOver<Image>() .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag)) .List();
要么
var qry = session.QueryOver<Image>() .Where(Restrictions.On<Image>(x => x.Name).IsLike(term,mode)) .List();
只需UNION它们就可以获得与单独运行它们相同的结果.
还要确保所有连接列都有索引.
这就是IS IN(查询)的问题 – 您无法确定数据库是如何执行它的(除非您以某种方式强迫它使用某个计划).也许你可以改变.In()它以某种方式变成JoinQueryOver()?