Test for dynamic_reconfiguration hangs. #7
Comments
If I
|
Hmm, I can't reproduce this locally. |
@michaelklishin - is there anything I can do locally to give you more useful information for troubleshooting? I spent some time trying to connect to the running Erlang node and get a backtrace, but didn't get that working. If you have a procedure for that, or for retrieving some other debugging info, I'd be happy to get you whatever you need. I'm not afraid to get my hands dirty here. Also, I'm on freenode as |
You can attach a shell to a running node and run the observer app from there, for example. |
Here's the result of |
Does this correlate with the log? Namely, can I assume that the node was not restarted between when you took this log and when you attached a shell? |
@michaelklishin - This is not true of the RabbitMQ log output above, but shortly after posting the gist I attached a second file with RabbitMQ log output correlated to the Erlang shell output, without restarting the RabbitMQ node. So those two files in the gist are correlated as you describe. |
So I thought I'd try to reproduce this on Circle CI, which has the nice feature that the owner of a given build can SSH into the CI VM to troubleshoot what went wrong if the build fails. When I did this, I hit a different problem before reaching the test that this issue is about, but I wanted to mention it here first (if you want me to file a separate ticket, though, I can).
If you like you can make a personal fork of
Alternatively, I don't know what OS you run on your dev machine (I use Linux, Fedora 22), but you might run into some of these same problems if you run the tests in some flavor of Linux in a VM on your dev machine (it looks like Circle CI uses some version of Ubuntu).
Update: When run again, the Circle CI test hung in a different place. |
I used OS X, our CI machines use Debian. |
I can reproduce the same problem using a Debian Docker image. Using OS X I don't have the problem. If you need: 1 - |
The test gets blocked during cleanup, when it tries to delete the exchanges it created. The supervisor is blocked in stop child. This is the trace of what the exchange was doing and where it got blocked, thus blocking the supervisor:
Thus, at this point RabbitMQ is trying to declare the exchange again while it is being deleted by the supervisor.
Update 29/12/15
That calls:
Thus, the supervisor is waiting on a call that itself calls the supervisor, which is a deadlock. The proposed solution sets a status |
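To illustrate the shape of the deadlock described above, here is a minimal sketch in Python rather than Erlang. It is not RabbitMQ or federation plugin code: the `Server` class, the handler names (`stop_child`, `terminate`, `declare`) and the timeout values are all hypothetical, chosen only to mimic a single-threaded server that handles one synchronous call at a time. The sketch uses finite timeouts so that it terminates; with an infinite timeout on the nested call, both sides would presumably wait forever, which matches the hang reported here.

```python
import queue
import threading


class Server:
    """A tiny single-threaded request/reply server (hypothetical, gen_server-like)."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.handlers = {}
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def call(self, request, timeout):
        """Synchronous call: enqueue the request, then wait for the reply."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((request, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            return "timeout"

    def _loop(self):
        # Requests are handled strictly one at a time, so a handler that
        # blocks keeps every later request waiting in the inbox.
        while True:
            request, reply = self.inbox.get()
            reply.put(self.handlers[request]())


def run_scenario(timeout=0.2):
    supervisor = Server("supervisor")
    child = Server("child")
    results = {}

    def child_terminate():
        # While shutting down, the child calls back into the supervisor,
        # which is still busy handling stop_child.
        results["child_to_supervisor"] = supervisor.call("declare", timeout)
        return "stopped"

    def supervisor_stop_child():
        # The supervisor waits for the child to finish terminating.
        return child.call("terminate", timeout * 3)

    supervisor.handlers["stop_child"] = supervisor_stop_child
    supervisor.handlers["declare"] = lambda: "declared"
    child.handlers["terminate"] = child_terminate

    results["stop_child"] = supervisor.call("stop_child", timeout * 5)
    return results


if __name__ == "__main__":
    print(run_scenario())
```

In this sketch the child's nested call can only end by timing out, because the supervisor cannot serve it until `stop_child` returns. The outer call completes only because that nested timeout is finite and shorter than the supervisor's own wait.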
Thank you, @Ayanda-D and @dcorbacho. We will QA the issue before Monday noon and produce a build for the user to try. |
We believe this is fixed in rabbitmq/rabbitmq-server#533. Happy to provide a one-off build from |
@dcorbacho also: can we switch related supervisors to use a more sensible timeout, e.g. 30 seconds? |
I'll look at it @michaelklishin |
I've been having some issues with the federation plugin (that, incidentally, may have to do with dynamic reconfiguration), so I thought I would clone the repository, get the test suite running locally, and try to make a failing test for my issue (which I'm still trying to pinpoint to a minimal reproduction).
However, when running the test suite locally, the tests hang indefinitely at rabbit_federation_exchange_test:dynamic_reconfiguration