我的公司被共生的伙伴关系诅咒变成了寄生虫.要从寄生虫获取我们的数据,我们必须使用一个缓慢的odbc连接.我最近注意到,我可以通过并行运行查询来获得更多的吞吐量(即使在同一张表上).
有一个特别大的表,我想从中提取数据并将其移动到本地表中.并行运行查询可以更快地获取数据,但我也可以想象,这可能会导致尝试将数据从多个查询一次写入同一个表格的问题.
你可以告诉我如何最好地处理这种情况,以便我可以利用并行增加使用查询的速度?
编辑:我在这里得到了一些很好的反馈,但是我觉得我并不完全清楚,我通过链接服务器(使用odbc驱动程序)来提取数据.换句话说,这意味着我可以运行正常的INSERT语句,我相信这将提供比sqlBulkCopy或BULK INSERT更好的性能(实际上,我不认为BULK INSERT甚至是一个选项).
解决方法
你读过
Load 1TB in less than 1 hour吗
- Run as many load processes as you have available cpus. If you have
32 cpus,run 32 parallel loads. If you have 8 cpus,run 8 parallel
loads.- If you have control over the creation of your input files,make them
of a size that is evenly divisible by the number of load threads you
want to run in parallel. Also make sure all records belong to one
partition if you want to use the switch partition strategy.- Use BULK insert instead of BCP if you are running the process on the
sql Server machine.- Use table partitioning to gain another 8-10%,but only if your input
files are GUARANTEED to match your partitioning function,meaning
that all records in one file must be in the same partition.- Use TABLOCK to avoid row at a time locking.
- Use ROWS PER BATCH = 2500,or something near this if you are
importing multiple streams into one table.
对于sql Server 2008,某些情况下您可以使用minimal logging for a standard INSERT SELECT:
sql Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition,turning on trace flag 610 lets sql Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.