Blocked channels and queues using HA #581

Closed
dcorbacho opened this issue Jan 26, 2016 · 5 comments

@dcorbacho
Contributor

When using HA and ‘auto_delete’ queues, the system eventually reaches a state where some channels have no associated connection (shown in the management UI as ‘unknown’) and calls to ‘rabbitmqctl list_channels’ do not return.

This happens because the queues block the channels during termination when rabbit_mirror_queue_master does not return. The blocking comes from the receive clause in: https://github.com/rabbitmq/rabbitmq-server/blob/stable/src/rabbit_mirror_queue_master.erl#L215

@michaelklishin
Member

FTR, this was discovered in an escalation and the investigation is still in progress. We have a patch candidate but it needs testing. We also have a couple of potentially related issues that we haven't gotten to the bottom of yet and that may or may not actually be related.

@Gsantomaggio
Member

Here are the steps to reproduce the issue:

1 - RabbitMQ cluster with 3 nodes

  • Node 1 - 10.100.0.101
  • Node 2 - 10.100.0.102
  • Node 3 - 10.100.0.103

Set up this policy:

rabbitmqctl set_policy all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

2 - Run the following script on one node, e.g. Node 3:

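while true; do
    echo "START"
    sleep 5
    # Check channels and mirror sync state before the node is reset
    rabbitmqctl list_channels
    echo "sync 1 "
    rabbitmqctl list_queues -q synchronised_slave_pids
    sleep 2
    # Stop the app, reset the node and rejoin the cluster via node 1
    rabbitmqctl stop_app
    sleep 2
    echo "stop 3"
    rabbitmqctl reset
    rabbitmqctl join_cluster rabbit@node1
    sleep 3
    rabbitmqctl start_app
    sleep 5
    echo "sync 2 "
    rabbitmqctl list_queues -q synchronised_slave_pids

    rabbitmqctl list_channels
    # Drop outbound traffic to node 1 for 25 seconds, then restore it
    iptables -A OUTPUT -d 10.100.0.101 -j DROP
    echo "blocked"
    sleep 25
    iptables -F   # note: flushes all rules in the filter table
    echo "unblocked"
done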

3 - Run this Python and/or this Java script

4 - After a few minutes, the command rabbitmqctl list_channels will block.

In the management UI you will see channels with an unknown connection.
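
Because the symptom in step 4 is a command that never returns, it can help to run the check under a deadline so the reproduction loop itself does not hang. The following is a minimal sketch, not part of the original report: the helper name `run_with_deadline` and the 30-second value are assumptions, and it relies on the coreutils `timeout` command, which exits with status 124 when the deadline expires.

```shell
#!/bin/sh
# run_with_deadline SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under coreutils `timeout` and reports on
# stderr when CMD had to be stopped because it did not return in time.
run_with_deadline() {
    rwd_secs=$1
    shift
    timeout "$rwd_secs" "$@"
    rwd_status=$?
    # coreutils `timeout` exits with 124 when the deadline was reached
    if [ "$rwd_status" -eq 124 ]; then
        echo "blocked: '$*' did not return within ${rwd_secs}s" >&2
    fi
    return "$rwd_status"
}

# Example use on a node (30 is an arbitrary, assumed deadline):
#   run_with_deadline 30 rabbitmqctl list_channels
```

A return status of 124 then marks the point at which list_channels stopped responding, while any other status is passed through unchanged from the command.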

@michaelklishin added this to the 3.6.1 milestone Jan 26, 2016
@michaelklishin
Member

To make it clear: this issue is about (one more) missing/unreasonable timeout value. There can be more issues reproduced by the sequence posted by @Gsantomaggio. Our candidate patch introduces a timeout. The rest is still under investigation.

@falfaro

falfaro commented Mar 22, 2016

Does this bug affect RabbitMQ 3.6.1?

@dcorbacho
Contributor Author

@falfaro this issue is solved in 3.6.1.

#675, referenced above, is a different issue which might be triggered in similar circumstances.
