我的Rails 4.1应用程序使用Sidekiq在赛璐id演员中运行测量.它连续用尽数据库连接.看来Sidekiq每个工作打开2个连接,旧的连接永远不会被正确关闭.
系统做什么
>每15分钟,我启动一个MeasurementWorker.调用MeasurementWorker.perform_async(measurement.id).这样做:
class MeasurementWorker include Sidekiq::Worker sidekiq_options retry: false,backtrace: true def perform(measurement_id,force = false) ActiveRecord::Base.connection_pool.with_connection do Measurement.find(measurement_id).run end end end
在这个测量中,当我调用.run的时候,这样做:
# varIoUs checks if measurement can be run at all,using AR # ... begin ActiveRecord::Base.connection_pool.with_connection do # (I used to have a Timeout.timeout here,but removed it for the # sake of simplification) @connection = MeasurementConnection.new do |event_info| event = Event.new event.assign_attributes(event_info) event.save end while @connection.measuring?; end end # with_connection rescue Exception => e # an exception happened,e.g. something during the measurement itself # log error (left out here for brevity) else # all went fine,# save this measurement via AR ensure # Close and terminate the actor to free up the websocket,# if it is still actively measuring something. if @connection if @connection.alive? and @connection.measuring? @connection.close end while @connection.alive? @connection.terminate sleep(0.01) end end end
> MeasurementConnection是一个简单的赛璐id演员.这个演员中没有AR相关的代码.
组态
> Unicorn:并发设置为3
> Sidekiq 3.3.0:并发设置为50,并在初始化器中:
Sidekiq.configure_server do |config| if defined?(ActiveRecord::Base) config = Rails.application.config.database_configuration[Rails.env] config['pool'] = Sidekiq.options[:concurrency] + 2 config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 5 ActiveRecord::Base.establish_connection(config) end end
> database.yml:pool设置为60,收获频率为5
> Postgresql 9.3:最大连接数为2000,无其他修改
问题:连接打开太多
当我检查SELECT * FROM pg_stat_activity;我看到有一些旧的连接开放,没有Sidekiq工作人员忙,和一些新的他们在哪里:
16661 measurement 6354 16384 measurement unicorn worker[0] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50775 2015-02-20 12:52:48.572551+01 2015-02-20 13:05:05.773178+01 2015-02-20 13:05:05.773565+01 f idle SELECT COUNT(*) FROM "measurements" 16661 measurement 6406 16384 measurement unicorn worker[2] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50776 2015-02-20 12:53:59.636414+01 2015-02-20 13:04:53.930305+01 2015-02-20 13:04:53.931+01 f idle SELECT COUNT(*) FROM "measurements" 16661 measurement 6687 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50801 2015-02-20 13:00:05.14621+01 2015-02-20 13:04:49.558589+01 2015-02-20 13:04:49.558835+01 f idle COMMIT 16661 measurement 7042 16384 measurement unicorn worker[1] -c /home/web...E production -D -l0.0.0.0:8080 127.0.0.1 50997 2015-02-20 13:00:34.874675+01 2015-02-20 13:00:35.376593+01 2015-02-20 13:00:35.376979+01 f idle SELECT COUNT(*) FROM "measurements" 16661 measurement 6668 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50781 2015-02-20 13:00:04.883553+01 2015-02-20 13:04:19.108365+01 2015-02-20 13:04:19.108567+01 f idle COMMIT 16661 measurement 6669 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50782 2015-02-20 13:00:04.908349+01 2015-02-20 13:03:57.683036+01 2015-02-20 13:03:57.683236+01 f idle COMMIT 16661 measurement 6672 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50786 2015-02-20 13:00:04.962251+01 2015-02-20 13:04:32.395137+01 2015-02-20 13:04:32.395344+01 f idle COMMIT 16661 measurement 6674 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50788 2015-02-20 13:00:04.98456+01 2015-02-20 13:04:32.396335+01 2015-02-20 13:04:32.39652+01 f idle COMMIT 16661 measurement 6676 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50790 2015-02-20 13:00:05.006847+01 2015-02-20 13:04:19.059628+01 2015-02-20 13:04:19.059831+01 f idle COMMIT 16661 measurement 6678 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50792 2015-02-20 13:00:05.029448+01 2015-02-20 13:04:23.730293+01 2015-02-20 13:04:23.730523+01 f idle COMMIT 16661 measurement 6680 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50794 2015-02-20 13:00:05.051932+01 2015-02-20 13:04:49.557435+01 2015-02-20 13:04:49.557633+01 f idle COMMIT 16661 measurement 6684 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50798 2015-02-20 13:00:05.124225+01 2015-02-20 13:03:51.693799+01 2015-02-20 13:03:51.694034+01 f idle COMMIT 16661 measurement 6690 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50804 2015-02-20 13:00:05.168099+01 2015-02-20 13:04:54.849239+01 2015-02-20 13:04:54.849459+01 f idle COMMIT 16661 measurement 6693 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50807 2015-02-20 13:00:05.189661+01 2015-02-20 13:04:18.688459+01 2015-02-20 13:04:18.688732+01 f idle COMMIT 16661 measurement 6696 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50810 2015-02-20 13:00:05.210659+01 2015-02-20 13:03:57.68424+01 2015-02-20 13:03:57.684483+01 f idle COMMIT 16661 measurement 6699 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50813 2015-02-20 13:00:05.231641+01 2015-02-20 13:04:04.962397+01 2015-02-20 13:04:04.96258+01 f idle COMMIT 16661 measurement 6701 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50815 2015-02-20 13:00:05.252357+01 2015-02-20 13:04:41.685372+01 2015-02-20 13:04:41.685594+01 f idle COMMIT 16661 measurement 6706 16384 measurement sidekiq 3.3.0 measurement [0 of 50 busy] 127.0.0.1 50820 2015-02-20 13:00:05.273301+01 2015-02-20 13:04:23.733488+01 2015-02-20 13:04:23.733681+01 f idle COMMIT 16661 measurement 7003 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50967 2015-02-20 13:00:09.004487+01 2015-02-20 13:02:02.036429+01 2015-02-20 13:02:02.036696+01 f idle COMMIT 16661 measurement 7005 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50969 2015-02-20 13:00:11.118961+01 2015-02-20 13:02:48.341078+01 2015-02-20 13:02:48.341294+01 f idle COMMIT 16661 measurement 7006 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50970 2015-02-20 13:00:12.245408+01 2015-02-20 13:03:04.300372+01 2015-02-20 13:03:04.300575+01 f idle COMMIT 16661 measurement 7007 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50971 2015-02-20 13:00:12.648636+01 2015-02-20 13:03:01.855616+01 2015-02-20 13:03:01.85588+01 f idle COMMIT 16661 measurement 7008 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50972 2015-02-20 13:00:12.956139+01 2015-02-20 13:03:13.840023+01 2015-02-20 13:03:13.840466+01 f idle COMMIT 16661 measurement 7009 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50973 2015-02-20 13:00:13.02424+01 2015-02-20 13:02:50.115996+01 2015-02-20 13:02:50.116259+01 f idle COMMIT 16661 measurement 7010 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50974 2015-02-20 13:00:13.0909+01 2015-02-20 13:03:09.968+01 2015-02-20 13:03:09.968284+01 f idle COMMIT 16661 measurement 7014 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50976 2015-02-20 13:00:14.929822+01 2015-02-20 13:03:20.183195+01 2015-02-20 13:03:20.183467+01 f idle COMMIT 16661 measurement 7020 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50980 2015-02-20 13:00:22.498892+01 2015-02-20 13:03:29.887257+01 2015-02-20 13:03:29.887599+01 f idle COMMIT 16661 measurement 7021 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50981 2015-02-20 13:00:22.898087+01 2015-02-20 13:03:39.689939+01 2015-02-20 13:03:39.69798+01 f idle COMMIT 16661 measurement 7022 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50982 2015-02-20 13:00:23.215846+01 2015-02-20 13:03:03.918339+01 2015-02-20 13:03:03.918613+01 f idle COMMIT 16661 measurement 7023 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50983 2015-02-20 13:00:23.930861+01 2015-02-20 13:03:51.504525+01 2015-02-20 13:03:51.512786+01 f idle COMMIT 16661 measurement 7025 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50985 2015-02-20 13:00:24.409999+01 2015-02-20 13:03:16.000375+01 2015-02-20 13:03:16.006178+01 f idle COMMIT 16661 measurement 7027 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50986 2015-02-20 13:00:25.786321+01 2015-02-20 13:03:22.631091+01 2015-02-20 13:03:22.631353+01 f idle COMMIT 16661 measurement 7045 16384 measurement sidekiq 3.3.0 measurement [15 of 50 busy] 127.0.0.1 50998 2015-02-20 13:00:48.004036+01 2015-02-20 13:03:39.717124+01 2015-02-20 13:03:39.722956+01 f idle COMMIT
总共有34个连接,但是我只运行了15个测量.
在sidekiq.log中,它们都显示为完成:
2015-02-20T12:00:04.879Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: start 2015-02-20T12:00:04.907Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: start 2015-02-20T12:00:04.939Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: start 2015-02-20T12:00:04.940Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: start 2015-02-20T12:00:04.959Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: start 2015-02-20T12:00:04.968Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: start 2015-02-20T12:00:04.984Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: start 2015-02-20T12:00:04.998Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: start 2015-02-20T12:00:05.014Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: start 2015-02-20T12:00:05.030Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: start 2015-02-20T12:00:05.042Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: start 2015-02-20T12:00:05.111Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: start 2015-02-20T12:00:05.113Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: start 2015-02-20T12:00:05.114Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: start 2015-02-20T12:00:05.115Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: start 2015-02-20T12:02:02.333Z 6235 TID-osgomw0g8 MeasurementWorker JID-cc03b717f81dd6fb0f58a946 INFO: done: 117.291 sec 2015-02-20T12:02:48.983Z 6235 TID-osgoox8hg MeasurementWorker JID-79f4d1ff6692248682ba93dd INFO: done: 164.077 sec 2015-02-20T12:02:50.688Z 6235 TID-osgomlno8 MeasurementWorker JID-10eebcd76113f3565d8265ca INFO: done: 165.577 sec 2015-02-20T12:03:02.429Z 6235 TID-osgolgzoc MeasurementWorker JID-24e87960d7e7fbd871037dd3 INFO: done: 177.314 sec 2015-02-20T12:03:04.581Z 6235 TID-osgoorlmo MeasurementWorker JID-be1a57871f26146e9884107e INFO: done: 179.641 sec 2015-02-20T12:03:09.453Z 6235 TID-osgop8md0 MeasurementWorker JID-1cdcd44bf41fefe9ddca21ac INFO: done: 184.573 sec 2015-02-20T12:03:10.362Z 6235 TID-osgoouytg MeasurementWorker JID-c6a57700872b7fe427e33664 INFO: done: 185.394 sec 2015-02-20T12:03:14.232Z 6235 TID-osgomjzzw MeasurementWorker JID-b0536d9a029faed0ba8eb5d3 INFO: done: 189.118 sec 2015-02-20T12:03:16.347Z 6235 TID-osgoos38k MeasurementWorker JID-09f95fbccd2438d17916d425 INFO: done: 191.408 sec 2015-02-20T12:03:20.398Z 6235 TID-osgoow16k MeasurementWorker JID-e296efec897c23629b96e99f INFO: done: 195.439 sec 2015-02-20T12:03:22.947Z 6235 TID-osgomhyms MeasurementWorker JID-1f6624314afd8e0ae611599f INFO: done: 197.833 sec 2015-02-20T12:03:30.212Z 6235 TID-osgooz63k MeasurementWorker JID-f7448eaffe109882130497ca INFO: done: 205.228 sec 2015-02-20T12:03:39.931Z 6235 TID-osgooyr6c MeasurementWorker JID-01bd303e953fd2998fe3f8d1 INFO: done: 214.918 sec 2015-02-20T12:03:39.936Z 6235 TID-osgon0ums MeasurementWorker JID-6949c5c81b4c254046f0c585 INFO: done: 214.906 sec 2015-02-20T12:03:51.694Z 6235 TID-osgoozvzs MeasurementWorker JID-c55c04f3424268fba50ec048 INFO: done: 226.696 sec
所以,对于每次测量,Sidekiq似乎打开了2个连接,但它并没有关闭它们.我该怎么办?
解决方法
问题是您的赛璐id演员泄漏连接:当您在新线程中执行需要活动记录连接的事情(或更明显地)时,AR会自动检出连接,但不会自动检查.
ActiveRecord::Base.connection_pool.with_connection do # do stuff end
这应该清除在其中创建的任何连接(如同你在sidekiq工作人员).
或者,您可以尝试显式检出连接
@connection = ActiveRecord::Base.connection_pool.checkout
然后在你的终结者中检查它
ActiveRecord::Base.connection_pool.checkin @connection
如果你的演员很久很久,你可能会比连接的时间要长一些(另一方面,在一个严密的循环中检查一个连接是无效的)