
The file handle cache ignores the memory high watermark #134

Closed
michaelklishin opened this issue Apr 29, 2015 · 11 comments

@michaelklishin
Member

From an escalation (VESC-420). Triggering GC manually does not change anything, so the issue is most likely an unbounded buffer, e.g. the message store overwhelming the queue master with messages it cannot relay to the mirror fast enough.

@michaelklishin
Member Author

To reproduce:

  • Create a 2 node cluster
  • Fill up a durable queue with persistent messages (say, 10+ GB)
  • Stop and reset the mirror
  • Add it back
  • Monitor RAM use on the master node (see the sketch below)
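
For the last step, a minimal sketch of how one might watch memory from an Erlang shell attached to the master node (erlang:memory/1 is documented; the loop below is purely illustrative, rabbitmqctl status or the management UI report the same figures):

%% Print the total VM memory once per second; stop it with exit(Pid, kill).
Pid = spawn(fun Loop() ->
          io:format("total: ~p MiB~n", [erlang:memory(total) div (1024 * 1024)]),
          timer:sleep(1000),
          Loop()
      end).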

dumbbell changed the title from "During eager sync of large queues, master node RAM usage can spike" to "The file handle cache ignores the memory high watermark" on May 4, 2015
@dumbbell
Member

dumbbell commented May 4, 2015

It appears the memory is consumed by the file handle cache: the master queue's process dictionary is full of {Ref, fhc_handle} => {handle, ...} key/value pairs which include the read buffer.

During my test, the queue contains 1000 messages of 10 MiB each. With the memory high watermark set at around 2 GiB, nearly all (if not all) messages are paged to disk. However, the queue process is happily consuming 5 GiB of memory, which matches what erlang:process_info(Queue_Process, binary) returns (in my case, 500 referenced binaries of 10 MiB each).

Neither the file handle cache nor the message store asks the memory monitor how much memory it is allowed to consume; both only pay attention to the number of open file handles.
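
For reference, this can be checked from a remote shell on the master node along these lines; erlang:process_info/2 is documented, but the exact shape of the fhc_handle keys and of the binary info tuples is an implementation detail, so treat this as a sketch (QPid is assumed to be bound to the pid of the master queue process):

%% Count file handle cache entries in the queue's process dictionary and
%% sum the sizes of the binaries the process still references.
{dictionary, Dict} = erlang:process_info(QPid, dictionary),
Handles = [KV || {{_Ref, fhc_handle}, _} = KV <- Dict],
{binary, Bins} = erlang:process_info(QPid, binary),
%% Bins elements are roughly {Id, Size, RefCount} (undocumented format).
BufferedMiB = lists:sum([Size || {_Id, Size, _RefCount} <- Bins]) div (1024 * 1024),
io:format("~p fhc handles, ~p MiB of referenced binaries~n",
          [length(Handles), BufferedMiB]).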

dumbbell assigned dumbbell and unassigned michaelklishin on May 4, 2015
@michaelklishin
Member Author

@dumbbell worth delaying 3.5.2 for if you think we can fix this in under a week.

@dumbbell
Member

dumbbell commented May 5, 2015

I agree, let's delay 3.5.2. If I don't have a good solution in a week, we can do the release.

dumbbell added a commit that referenced this issue May 5, 2015
... and clear read cache if necessary.

This solves an issue where sync'ing a mirrored queue could take all
available memory (way above the high watermark) and possibly crash the
node.

Fixes #134.
@dumbbell
Member

dumbbell commented May 5, 2015

This is a preliminary commit; for instance, there is a debug log message left in on purpose, which I will remove after further testing.

@dumbbell
Member

dumbbell commented May 6, 2015

Here are the results of today's testing and benchmark:

I measured the time taken to sync one mirrored queue containing 1000 messages of 10 MiB each (10 GiB worth of data). The two nodes ran on the same host; the disk is an SSD formatted with ZFS (which does heavy caching). Both nodes had their high watermark set to 2.2 GiB.

I did four runs for each of the following versions:

  • stable, which exhibits the bug described here;
  • rabbitmq-server-134 with the explicit garbage collection;
  • rabbitmq-server-134 without the explicit garbage collection.

Results:

$ ministat -s -w 88 stable \#134-gc \#134-nogc 
x stable
+ #134-gc
* #134-nogc
+----------------------------------------------------------------------------------------+
|       *                                                                                |
|x  +   *** x            +  x                    x         +                            +|
||____________________A_____M______________|                                             |
|      |____________________________________A______________M_____________________|       |
|       |A|                                                                              |
+----------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4        29.628         32.52        31.263       30.9195     1.2613909
+   4        29.783        34.909        33.127      32.22575      2.257041
No difference proven at 95.0% confidence
*   4        30.034        30.165         30.11      30.08825   0.061277375
No difference proven at 95.0% confidence

The explicit garbage collection gives very unstable performance. However, the proposed fix without the explicit garbage collection gives stable timings, even more stable than the stable branch, resulting in a slightly faster sync.

During the tests, memory consumption with both versions of the fix could go up to 3 GiB (800 MiB above the high watermark) for a few seconds. I believe it is the garbage collector that temporarily allocates a lot of memory.

The commit I force-pushed removes the call to the garbage collector and the debug log message.
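
(For reference, "explicit garbage collection" here means forcing a collection of the queue process; erlang:garbage_collect/1 is the documented BIF for that, though exactly how the preliminary commit invoked it is not shown here, so treat the line below as an illustration.)

%% Force a garbage collection of another process (QueueMasterPid is illustrative).
erlang:garbage_collect(QueueMasterPid).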

@dumbbell
Member

dumbbell commented May 6, 2015

Because this change could modify the behaviour in a stable branch, what do you think about hiding it behind a configuration knob?
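
If we go that route, it would presumably be a flag in the Erlang-terms rabbitmq.config, something along these lines (the knob name below is purely hypothetical; only vm_memory_high_watermark exists today):

%% rabbitmq.config sketch; fhc_read_buffering is a hypothetical knob name.
[
  {rabbit, [
    {vm_memory_high_watermark, 0.4},
    {fhc_read_buffering,       true}
  ]}
].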

@michaelklishin
Member Author

It makes little sense to me to have a config value for this: it is behaviour users probably expect to be "always on". What kind of breaking changes are you worried about?

@dumbbell
Member

dumbbell commented May 6, 2015

As a side note, this file handle cache, which reimplements what the underlying OS already does, can lead to unexpectedly lower performance. During the test, the high watermark was set to 2.2 GiB. I then tried with the default high watermark (which came to 6.2 GiB on this machine): the average sync time went from 30 s to 40 s, a 33% increase!

The problem was that with that much memory, the file handle cache was competing with the ZFS ARC (ZFS' cache). As a result, part of RabbitMQ was paged out to disk... Admittedly, ZFS may not be the best filesystem for RabbitMQ, but we may want to keep that in mind and, one day, revisit how we do file caching.

@dumbbell
Member

dumbbell commented May 6, 2015

What kind of breaking changes are you worried about?

The current fix arbitrarily drops an amount of data equal to "2 * memory above the limit" (taken from the least recently used read buffers) whenever memory use goes above the limit. It seems to work for the tested scenario, but this cache is also used by queue paging, for instance, and I have no idea whether it could be counter-productive in other production workloads.
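
In pseudo-Erlang, the trimming logic amounts to something like the sketch below (simplified, not the actual file_handle_cache code; Buffers is assumed to be the read buffers ordered from least to most recently used, all sizes in bytes):

-module(fhc_trim_sketch).
-export([maybe_reduce_read_cache/3]).

%% Drop least recently used read buffers until twice the overage is reclaimed.
%% Buffers is [{Ref, BufferSize}], least recently used first.
maybe_reduce_read_cache(Buffers, MemUsed, MemLimit) when MemUsed =< MemLimit ->
    Buffers;
maybe_reduce_read_cache(Buffers, MemUsed, MemLimit) ->
    drop_lru(Buffers, 2 * (MemUsed - MemLimit)).

drop_lru(Buffers, ToReclaim) when ToReclaim =< 0 ->
    Buffers;
drop_lru([], _ToReclaim) ->
    [];
drop_lru([{_Ref, BufSize} | Rest], ToReclaim) ->
    %% The real cache would also clear this handle's read buffer here.
    drop_lru(Rest, ToReclaim - BufSize).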

@dumbbell
Member

Since RabbitMQ 2.0.0.

michaelklishin added a commit that referenced this issue May 19, 2015
Since the FHC now uses memory monitor, we need to make sure it is
running early on. Nothing in rabbit_alarm really depends on the
steps that used to run earlier.

`full` test suite passes.

It may be worth extracting memory and disk monitors into separate
boot steps down the road.

While here, remove a comment that's misleading. Originally introduced as
part of bug24998, doesn't seem to have any significant reasoning behind
it other than "complete separation of FHC".

References #134.
Fixes #157.
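
For context, boot ordering in rabbit is declared with -rabbit_boot_step module attributes, which list what a step requires and enables; the attribute below only illustrates the general shape inside a module such as rabbit.erl (step names and MFA are examples, not the actual change):

%% Illustrative boot step declaration; names and MFA are examples only.
-rabbit_boot_step({memory_monitor,
                   [{description, "memory monitor"},
                    {mfa,         {rabbit_sup, start_restartable_child,
                                   [vm_memory_monitor]}},
                    {requires,    pre_boot},
                    {enables,     file_handle_cache}]}).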
dcorbacho pushed a commit that referenced this issue Nov 18, 2020