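I can easily reproduce this problem by having one thread call poll_one in a loop until it returns zero. This can leave the thread calling run stuck in pthread_cond_wait, while the thread calling poll_one breaks out of its loop. Presumably the io_service expects that thread to come back and block in epoll_wait, but it is under no obligation to do so, and that expectation appears to be fatal.
Is there a requirement that threads be statically associated with io_services?
Here is an example showing the deadlock. This is the only thread still handling this io_service, because the others have moved on. There are definitely socket operations pending: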
    #0  pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (...) at /usr/include/boost/asio/detail/posix_event.hpp:80
    #2  boost::asio::detail::task_io_service::do_run_one (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:405
    #3  boost::asio::detail::task_io_service::run (...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146
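I believe the bug is as follows: if the thread servicing the I/O queue is the one blocking on the socket readiness check, and it calls a dispatch function, then it must signal whenever any other thread is blocked on the io_service. It currently signals only if there are handlers ready to run at that moment. That leaves no thread checking for socket readiness.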
Solution
Here is a snippet of the modified task_io_service::do_poll_one() in boost/asio/detail/impl/task_io_service.ipp. The only line added is the sleep.
    std::size_t task_io_service::do_poll_one(mutex::scoped_lock& lock,
        task_io_service::thread_info& this_thread,
        const boost::system::error_code& ec)
    {
      if (stopped_)
        return 0;

      operation* o = op_queue_.front();
      if (o == &task_operation_)
      {
        op_queue_.pop();
        lock.unlock();

        {
          task_cleanup c = { this, &lock, &this_thread };
          (void)c;

          // Run the task. May throw an exception. Only block if the operation
          // queue is empty and we're not polling, otherwise we want to return
          // as soon as possible.
          task_->run(false, this_thread.private_op_queue);
          boost::this_thread::sleep_for(boost::chrono::seconds(3));
        }

        o = op_queue_.front();
        if (o == &task_operation_)
          return 0;
      }

      ...
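My test driver is fairly basic:
- Add an asynchronous work loop via a timer that prints "." every 3 seconds.
- Spawn a single thread that polls the io_service.
- Delay to give the new thread time to poll the io_service, then have main call io_service::run() while the poll thread is sleeping in task_io_service::do_poll_one().
Test code: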
    #include <iostream>

    #include <boost/asio/io_service.hpp>
    #include <boost/asio/steady_timer.hpp>
    #include <boost/bind.hpp>
    #include <boost/chrono.hpp>
    #include <boost/thread.hpp>

    boost::asio::io_service io_service;
    boost::asio::steady_timer timer(io_service);

    void arm_timer()
    {
      std::cout << ".";
      std::cout.flush();
      timer.expires_from_now(boost::chrono::seconds(3));
      timer.async_wait(boost::bind(&arm_timer));
    }

    int main()
    {
      // Add asynchronous work loop.
      arm_timer();

      // Spawn poll thread.
      boost::thread poll_thread(
        boost::bind(&boost::asio::io_service::poll, boost::ref(io_service)));

      // Give the poll thread time to service the reactor.
      boost::this_thread::sleep_for(boost::chrono::seconds(1));

      io_service.run();
    }
Debug:
    [twsansbury@localhost bug]$ gdb a.out
    ...
    (gdb) r
    Starting program: /home/twsansbury/dev/bug/a.out
    [Thread debugging using libthread_db enabled]
    .[New Thread 0xb7feeb90 (LWP 31892)]
    [Thread 0xb7feeb90 (LWP 31892) exited]
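At this point, arm_timer() has printed "." once (when it was initially armed). The poll thread serviced the reactor in a non-blocking manner and then slept for 3 seconds while op_queue_ was empty (task_operation_ is added back to op_queue_ when task_cleanup c goes out of scope). While op_queue_ was empty, the main thread called io_service::run(), saw that op_queue_ was empty, and made itself the first_idle_thread_, where it waits on its wakeup_event. The poll thread finished sleeping and returned 0, leaving the main thread waiting on wakeup_event.
After waiting about 10 seconds, plenty of time for arm_timer() to become ready, I interrupt the debugger: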
    Program received signal SIGINT, Interrupt.
    0x00919402 in __kernel_vsyscall ()
    (gdb) bt
    #0  0x00919402 in __kernel_vsyscall ()
    #1  0x0081bbc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #2  0x00763b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
    #3  0x08059dc2 in void boost::asio::detail::posix_event::wait >(boost::asio::detail::scoped_lock&) ()
    #4  0x0805a009 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) ()
    #5  0x0805a11c in boost::asio::detail::task_io_service::run(boost::system::error_code&) ()
    #6  0x0805a1e2 in boost::asio::io_service::run() ()
    #7  0x0804db78 in main ()
The side-by-side timeline is as follows:
                 poll thread               |          main thread
    ---------------------------------------+---------------------------------------
      lock()                               |
      do_poll_one()                        |
      |-- pop task_operation_ from         |
      |   queue_op_                        |
      |-- unlock()                         | lock()
      |-- create task_cleanup              | do_run_one()
      |-- service reactor (non-block)      | `-- queue_op_ is empty
      |-- ~task_cleanup()                  |     |-- set thread as idle
      |   |-- lock()                       |     `-- unlock()
      |   `-- queue_op_.push(              |
      |       task_operation_)             |
      `-- task_operation_ is               |
          queue_op_.front()                |
          `-- return 0                     | // still waiting on wakeup_event
      unlock()                             |
As best as I could tell, there are no side effects from patching:
    if (o == &task_operation_)
      return 0;
to:
    if (o == &task_operation_)
    {
      if (!one_thread_)
        wake_one_thread_and_unlock(lock);
      return 0;
    }
Regardless, I have submitted a bug and fix. Consider keeping an eye on the ticket for an official response.