Atom table exhaustion due to rabbitmqctl node names #549

Closed
binarin opened this issue Jan 14, 2016 · 10 comments

Comments

@binarin
Contributor

binarin commented Jan 14, 2016

rabbitmqctl generates node names like 'rabbit-cli-<pid>@<host>'. The problem is that node names are stored in the atom table, which is never garbage collected.

Some numbers:

  • The maximum pid on Linux is 32768 by default
  • The maximum pid can be raised to ~4 million on 64-bit systems
  • The Erlang atom table has a default limit of 1048576 atoms

So on Linux, any long-running RabbitMQ broker can be crashed by running rabbitmqctl constantly:

  • in a cluster of more than 32 nodes (with the default pid limit)
  • or even on a single node, after raising that limit
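
The arithmetic behind both cases can be checked with a short sketch. It is written in Python purely for illustration, and the constant and function names are made up here. The figures are the ones listed above, except that "~4 million" is assumed to mean 2^22. It models the worst case, where every host eventually cycles through all of its pids and each distinct node name costs one atom, and it ignores the atoms the broker already uses for other purposes.

```python
# Worst-case count of distinct 'rabbit-cli-<pid>@<host>' names a broker can see.
ATOM_LIMIT = 1_048_576        # default Erlang atom table size (from the issue)
DEFAULT_PID_MAX = 32_768      # default Linux pid limit (from the issue)
RAISED_PID_MAX = 4_194_304    # assumed value for "~4 million" (2**22)


def distinct_cli_names(pid_max: int, hosts: int) -> int:
    """Each (pid, host) pair yields one node name, hence one atom."""
    return pid_max * hosts


def hosts_to_fill_table(pid_max: int, atom_limit: int = ATOM_LIMIT) -> int:
    """Smallest number of hosts whose CLI names alone fill the atom table."""
    return -(-atom_limit // pid_max)  # ceiling division


if __name__ == "__main__":
    print(hosts_to_fill_table(DEFAULT_PID_MAX))
    print(distinct_cli_names(RAISED_PID_MAX, 1) >= ATOM_LIMIT)
```

With the default pid limit, the names from 32 hosts already equal the size of the whole table, which matches the "more than 32 nodes" figure. With the raised limit, a single host is enough.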

While it sounds ridiculous, we've observed such a crash on a production server )

There should be some way to generate rabbitmqctl node names without introducing large quantities of uncollected garbage.

@michaelklishin
Member

I am not 100% sure but suspect that CLI tools use generated node names so that they can run concurrently. How can we accomplish that while also using a limited number of names?

@hairyhum
Contributor

We can catch a failing net_kernel:start and change the name if it's already taken. We should also look at rabbit_nodes:ensure_epmd, because it also generates big random node names.
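
A minimal sketch of that idea, in Python for illustration only (it is not the Erlang code from any patch). Names are drawn from a small fixed pool, and a new candidate is tried whenever the current one is already in use. `POOL_SIZE`, `MAX_TRIES`, `candidate_name` and `start_with_bounded_name` are hypothetical, and the `try_start` callback merely stands in for net_kernel:start, assumed here to report whether the name could be registered.

```python
import random
from typing import Callable, Optional

POOL_SIZE = 100   # hypothetical cap on distinct CLI node names per host
MAX_TRIES = 10    # hypothetical retry budget


def candidate_name(host: str, rng: random.Random) -> str:
    """Pick a name from a small fixed pool instead of embedding the pid."""
    return f"rabbit-cli-{rng.randrange(POOL_SIZE):02d}@{host}"


def start_with_bounded_name(
    try_start: Callable[[str], bool],
    host: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Retry with a fresh candidate whenever the name is already taken.

    try_start is a stand-in for net_kernel:start: it returns True on
    success and False when the name is already registered.
    """
    rng = rng or random.Random()
    for _ in range(MAX_TRIES):
        name = candidate_name(host, rng)
        if try_start(name):
            return name
    raise RuntimeError(f"no free CLI node name after {MAX_TRIES} tries")


if __name__ == "__main__":
    taken = set()

    def fake_start(name: str) -> bool:
        # Simulated registry: refuse names that are already in use.
        if name in taken:
            return False
        taken.add(name)
        return True

    print(start_with_bounded_name(fake_start, "broker1"))
```

The trade-off in this sketch is that a broker sees at most `POOL_SIZE` distinct CLI names per host, no matter how many times the tools run, while concurrent runs still work as long as fewer than `POOL_SIZE` of them are active on one host at a time.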

@michaelklishin
Member

@hairyhum 👍. Feel free to work with @binarin on a proof of concept when you have a chance.

@binarin
Contributor Author

binarin commented Jan 14, 2016

Actually, rabbit_nodes:ensure_epmd is not a problem: the starter node doesn't connect to anything, so it causes no atom table pollution.

hairyhum self-assigned this Jan 14, 2016
@binarin
Contributor Author

binarin commented Jan 14, 2016

I think I'll provide a preliminary patch today.

binarin added a commit to binarin/rabbitmq-server that referenced this issue Jan 14, 2016
It prevents atom table overflow in a long running broker.

Fixes rabbitmq#549
michaelklishin modified the milestones: 3.6.1, 3.6.x Jan 14, 2016
@michaelklishin
Member

The PR provided by @binarin looks reasonable, I will test it later today. Thank you for your ongoing contributions.

@michaelklishin
Member

I've been running two instances of rabbitmqctl and one of rabbitmq-plugins in tight loops with this patch for half an hour now. epmd -names reports that the CLI node names are what we expect. Looking good so far.

@videlalvaro
Collaborator

> While it sounds ridiculous, we've observed such a crash on a production server )

Thanks for reporting this @binarin

@michaelklishin
Member

Fixed by #552 (and rabbitmq/rabbitmq-common#48). I like the solution we now have, thank you @binarin!

@michaelklishin
Member

This problem has re-appeared in the current generation of CLI tools (rabbitmq/rabbitmq-cli) and had to be addressed the same way in rabbitmq/rabbitmq-cli#461.
