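I manage a Rails application for a client, and it recently crashed. The site had been down for 9 hours before I noticed. I checked the logs, and every request during those 9 hours was logged with the following error: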
at=error code=H10 desc="App crashed"
Just before that, I see the following logs:
2012-11-16T00:55:46+00:00 heroku[web.1]: Idling
2012-11-16T00:55:50+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]: [2012-11-16 00:55:51] ERROR SignalException: SIGTERM
2012-11-16T00:55:51+00:00 app[web.1]: /usr/local/lib/ruby/1.9.1/webrick/server.rb:90:in `select'
2012-11-16T00:56:00+00:00 heroku[web.1]: Error R12 (Exit timeout) -> At least one process Failed to exit within 10 seconds of SIGTERM
2012-11-16T00:56:00+00:00 heroku[web.1]: Stopping remaining processes with SIGKILL
2012-11-16T00:56:02+00:00 heroku[web.1]: State changed from up to down
2012-11-16T00:56:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:03:55+00:00 heroku[web.1]: Unidling
2012-11-16T01:03:55+00:00 heroku[web.1]: State changed from down to starting
2012-11-16T01:03:59+00:00 heroku[web.1]: Starting process with command `bundle exec rails server -p 4303`
2012-11-16T01:04:00+00:00 heroku[Nginx]: 98.139.241.251 - - [16/Nov/2012:01:04:00 +0000] "GET / HTTP/1.1" 499 0 "-" "YahooCacheSystem" domain.com
2012-11-16T01:04:22+00:00 app[web.1]: => Ctrl-C to shutdown server
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Dispatcher: webrick
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Application: acsolar
2012-11-16T01:04:22+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:21 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : New Relic Ruby Agent 3.4.0.1 Initialized: pid = 2
2012-11-16T01:04:22+00:00 app[web.1]: => Booting WEBrick
2012-11-16T01:04:22+00:00 app[web.1]: => Rails 3.1.1 application starting in production on http://0.0.0.0:4303
2012-11-16T01:04:22+00:00 app[web.1]: => Call with -d to detach
2012-11-16T01:04:25+00:00 app[web.1]: [DEPRECATION] Your applications public directory contains an assets/products and/or assets/taxons subdirectory.
2012-11-16T01:04:25+00:00 app[web.1]: Run `rake spree:assets:relocate_images` to relocate the images.
2012-11-16T01:04:34+00:00 app[web.1]: ** [NewRelic][11/16/12 01:04:32 +0000 b8af98a1-2246-4b34-9dfe-61b9d4b747bc (2)] INFO : Reporting performance data every 60 seconds.
2012-11-16T01:04:34+00:00 app[web.1]: Connected to NewRelic Service at collector-5.newrelic.com
2012-11-16T01:05:00+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process Failed to bind to $PORT within 60 seconds of launch
2012-11-16T01:05:00+00:00 heroku[web.1]: Stopping process with SIGKILL
2012-11-16T01:05:02+00:00 heroku[web.1]: Process exited with status 137
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from crashed to down
2012-11-16T01:05:02+00:00 heroku[web.1]: State changed from starting to crashed
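My guess is that the dyno spun down (idled) and then hit an error while starting back up, but why did it stay in the crashed state instead of being restarted? If this happens again, is there any way to have it restart automatically?
I also have NewRelic running on this app and it didn't notify me at all, but that's a separate issue I need to look into.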
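Solution
Heroku support replied and suggested restarting the app manually with heroku restart. They are currently working on a fix for the underlying problem.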
Hi,

A process management error on our side caused some crashed apps only running 1 web dyno to be reported as “idle” even though they were actually crashed. This means that the crashed dyno was never restarted, causing subsequent requests to fail. We’ve identified this problem and are implementing a fix. If your app is still unresponsive, please try restarting it with the heroku restart command. Please let us know if you need more help.

Thanks,
Heroku Support
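
Until a fix like that is in place, one way to approximate an automatic restart is a small external watchdog that probes the site and runs heroku restart when it stops answering. The following is only a rough sketch under several assumptions, not something Heroku support suggested: the method names (site_up?, restart_app, check_and_restart) are made up for illustration, the heroku command-line client is assumed to be installed and logged in on the machine running the script, and any 5xx response (or no response at all) is treated as "down".

```ruby
require "net/http"
require "uri"

# Returns true when the URL answers with a non-5xx status within the timeout.
# Connection errors and timeouts are treated as "down".
def site_up?(url, timeout: 10)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.code.to_i < 500
rescue StandardError
  false
end

# Shells out to the Heroku command-line client (assumed to be installed
# and authenticated) to restart the app's dynos.
def restart_app(app)
  system("heroku", "restart", "--app", app)
end

# Probes the site up to `attempts` times, pausing between failed probes.
# Restarts the app only if every probe fails, so a single slow response
# does not trigger a restart. `probe` and `restart` are injectable so the
# logic can be exercised without network access.
def check_and_restart(app, url, attempts: 3, pause: 5,
                      probe: method(:site_up?), restart: method(:restart_app))
  attempts.times do |i|
    return :up if probe.call(url)
    sleep pause if i < attempts - 1
  end
  restart.call(app)
  :restarted
end
```

It could be run every few minutes from cron or another scheduler on a machine outside Heroku, for example with check_and_restart("my-app", "http://www.example.com/"), where both the app name and the URL are placeholders. A restart will not help if the app itself cannot boot within the 60-second limit shown in the R10 error above, so this is a stopgap rather than a cure.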