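I have a simple multithreaded C++11 program in which all threads lock the same mutex in a tight loop.
With 8 threads (equal to the number of logical CPUs), it reaches about 5 million locks per second. The problem shows up once the thread count goes above that, as with the 9 threads used in the code below.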
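Edit:
With g++ 4.8.2 (Ubuntu x64), there is no performance degradation at all, even with 100 threads! (Throughput is also more than twice as high, but that is another story.)
So this appears to be a problem specific to the VC mutex implementation.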
    #include <chrono>
    #include <thread>
    #include <memory>
    #include <mutex>
    #include <atomic>
    #include <sstream>
    #include <iostream>

    using namespace std::chrono;

    // Each worker locks the shared mutex in a tight loop and counts acquisitions.
    void thread_loop(std::mutex* mutex, std::atomic_uint64_t* counter)
    {
        while (true)
        {
            std::unique_lock<std::mutex> ul(*mutex);
            ++(*counter);
        }
    }

    int main()
    {
        int threads = 9;  // one more than the number of logical CPUs
        std::mutex mutex;
        std::atomic_uint64_t counter(0);

        std::cout << "Starting " << threads << " threads.." << std::endl;
        // The workers run forever, so they are deliberately never joined.
        for (int i = 0; i < threads; ++i)
            new std::thread(&thread_loop, &mutex, &counter);
        std::cout << "Started " << threads << " threads.." << std::endl;

        // Once per second, reset the counter and print the locks taken in that second.
        while (true)
        {
            counter = 0;
            std::this_thread::sleep_for(seconds(1));
            std::cout << "Counter = " << counter.load() << std::endl;
        }
    }
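To compare thread counts more easily, the measurement can be wrapped in a function that runs for a fixed time and then stops. This is only a sketch: `measure_locks_per_second` and its parameters are hypothetical names, not part of the original program, and it uses a stop flag and joined threads instead of the endless loop above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that contend on one mutex
// for roughly `duration`, then returns the observed lock acquisitions per second.
double measure_locks_per_second(int threads, std::chrono::milliseconds duration)
{
    assert(threads > 0);

    std::mutex mutex;
    std::uint64_t counter = 0;          // only modified while holding `mutex`
    std::atomic<bool> stop(false);      // tells the workers to exit

    std::vector<std::thread> workers;
    workers.reserve(threads);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < threads; ++i)
    {
        workers.emplace_back([&] {
            while (!stop.load(std::memory_order_relaxed))
            {
                std::lock_guard<std::mutex> guard(mutex);
                ++counter;
            }
        });
    }

    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& worker : workers)
        worker.join();                  // after this, reading `counter` is safe

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return counter / elapsed.count();
}
```

Calling it as `measure_locks_per_second(8, std::chrono::seconds(1))` and again with `9` should make any difference between the two thread counts directly visible. The absolute numbers will depend on the compiler, standard library, and machine.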
The VS 2013 profiler tells me that most of the time (95.7%) is wasted in a tight spin loop (line 697 of rtlocks.cpp):

    while (IsBlocked() && spinWait._SpinOnce())
    {
        // _YieldProcessor is called inside _SpinOnce
    }
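For readers unfamiliar with this kind of loop, here is a minimal sketch of the general spin-then-yield pattern. It is not the code from rtlocks.cpp: `SpinThenYieldLock`, `kSpinLimit`, and `contended_count` are hypothetical names, and the spin limit of 1000 is an arbitrary assumption. In locks of this style, every waiting thread burns CPU while it spins, which is one way a profile can end up dominated by the wait loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative only: a generic spin-then-yield lock, NOT the VC runtime's code.
class SpinThenYieldLock
{
public:
    void lock()
    {
        int spins = 0;
        // test_and_set returns the previous value: true means someone else holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            if (++spins >= kSpinLimit)
            {
                std::this_thread::yield();  // give up the time slice after spinning a while
                spins = 0;
            }
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    static const int kSpinLimit = 1000;     // assumed value, chosen arbitrarily
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: `threads` workers each increment a shared counter
// `iterations` times under the lock; returns the final count.
long contended_count(int threads, int iterations)
{
    assert(threads > 0 && iterations >= 0);

    SpinThenYieldLock lock;
    long total = 0;                         // only modified while holding `lock`

    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
    {
        workers.emplace_back([&] {
            for (int n = 0; n < iterations; ++n)
            {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& worker : workers)
        worker.join();
    return total;
}
```

If the lock provides mutual exclusion, `contended_count(4, 10000)` returns exactly 40000. How a real implementation balances spinning against blocking is a design choice of that implementation, and this sketch makes no claim about what the VC runtime does.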
What could be the reason? How can it be improved?
OS: Windows 7 x64
CPU: i7 3770, 4 cores (x2 with Hyper-Threading)