Pathological performance of file_handle_cache read buffer when synchronising queues #69

Closed
simonmacmullen opened this issue Mar 13, 2015 · 7 comments

@simonmacmullen
Contributor

Noticed while working on #65.

The queue eager-sync process ends up reading messages from the queue in the opposite order to that in which they were published. If the messages are being read from the message store this causes very poor performance since 3.5.0.

If we have 10kB messages then we read 1MB from near the end of the file, then seek back 10kB, decide that's outside the buffer, read another 1MB, seek back 10kB, and so on. On my machine we sync ~250 msg/s like that. Oops!
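
Back-of-the-envelope, that access pattern means almost every byte read from disk is thrown away. A minimal model of it (illustrative only, not the real file_handle_cache code; the module and function names are made up):

    -module(backwards_read_model).
    -export([waste/3]).

    %% waste(NumMsgs, MsgSize, BufSize) -> {BytesUsed, BytesReadFromDisk}.
    %% Every backwards seek lands outside the cached range, so each message
    %% costs a full BufSize read-ahead instead of a buffer hit.
    waste(NumMsgs, MsgSize, BufSize) when MsgSize =< BufSize ->
        {NumMsgs * MsgSize, NumMsgs * BufSize}.

At 250 reads/s with a 1MB buffer that is ~2.5MB/s of useful message data for ~250MB/s pulled from disk, which matches the I/O stats mentioned below.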

Ways we could fix this, in ascending order of cleverness:

  • Hard code a smaller buffer size
  • Dynamically shrink the buffer size if we determine it is not working
  • Read the buffer backwards from our seek point if we detect we are seeking backwards
@simonmacmullen simonmacmullen self-assigned this Mar 13, 2015
simonmacmullen pushed a commit that referenced this issue Mar 13, 2015
...such as when we are reading backwards through a file. References #69.
@simonmacmullen
Contributor Author

Hard code a smaller buffer size

Seems rather sad.

Dynamically shrink the buffer size if we determine it is not working

This is the approach I've gone for, primarily because it should also catch other pathological behaviour.

Read the buffer backwards from our seek point if we detect we are seeking backwards

This might be a nice option to get syncing going still faster, but it's also fiddly and only solves this exact problem. I'll settle for having it no worse than it was in 3.4.4.
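
For reference, a minimal sketch of the shrink-the-buffer-when-it-is-not-helping idea. This is not the actual file_handle_cache.erl code: the module, the outcome atoms and the 1 MiB cap are made up for illustration. (The real change, per the commits referenced on this issue, also adds an extra byte when shrinking so that the next read can tell how the smaller buffer did.)

    -module(fhc_buffer_sketch).
    -export([adjust/3]).

    -define(MAX_BUF, 1048576).  %% assumed 1 MiB cap

    %% adjust(CurrentBufSize, LastReadSize, Outcome) -> NewBufSize
    adjust(_Cur, ReadSize, buffer_wasted) ->
        %% The read-ahead we just did was discarded unused (e.g. a backwards
        %% seek): drop straight back to the size of the actual read, plus one
        %% byte so that next time we can tell whether even that was fully used.
        ReadSize + 1;
    adjust(Cur, _ReadSize, buffer_exhausted) ->
        %% Sequential reads consumed the whole buffer: double it, up to the cap.
        min(Cur * 2, ?MAX_BUF);
    adjust(Cur, _ReadSize, buffer_hit) ->
        %% The read was served from the buffer: leave the size alone.
        Cur.

With 10 kB reads, shrinking to roughly the read size and then doubling again is consistent with the buffer sizes flickering between about 10 kB and 20 kB reported further down.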

@simonmacmullen
Contributor Author

Just for clarity, this bug:

  • Only affects 3.5.0
  • Requires messages to be larger than the queue index embedding threshold (by default 4kB)
  • Requires messages to be paged out before synchronisation starts

You can see in the I/O stats on the master that if (say) 250 messages are read from disk per second, we also read 250MB/s even if the messages are much smaller than that.
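
For reference, the embedding threshold above should be the queue_index_embed_msgs_below setting; a rabbitmq.config sketch with its default value (classic Erlang-terms format):

    [
      {rabbit, [
        {queue_index_embed_msgs_below, 4096}
      ]}
    ].

Messages at or above this size go to the message store rather than being embedded in the queue index.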

@dumbbell
Member

Here is what I did to test the fix:

  1. I started two nodes, A and B, with a very low vm_memory_high_watermark to make them page messages out early (see the config sketch after this list), clustered them, and added the following HA policy on node B:

    rabbitmqctl -n B set_policy ha-all "." '{"ha-mode":"all"}'
    
  2. I stopped node B using:

    rabbitmqctl -n B stop_app
    
  3. I used PerfTest to publish 10 kB messages with a rate-limited consumer so messages stay in RabbitMQ:

    PerfTest -s 10240 -R 100
    
  4. The producer could publish around 40,000 messages before being throttled.

  5. I started node B again and forced synchronisation from the management UI.
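
For step 1, a sketch of the kind of rabbitmq.config entry used (classic Erlang-terms format; 0.05 is an illustrative low value only, not necessarily the one used here):

    [
      {rabbit, [
        {vm_memory_high_watermark, 0.05}
      ]}
    ].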

With the stable branch, the management UI reports I/O read rates of:

  • 150 messages/s
  • 150 MB/s

With the rabbitmq-server-69 branch (this fix), it reports:

  • 1000 messages/s
  • 15 MB/s

I logged the size of the read buffer in file_handle_cache.erl at the same time. With stable, the buffer remains at an expected 1MB size. With the fix, the size continuously switches between 10468 and 20936, with an occasional jump to 4 MB.

@simonmacmullen
Contributor Author

Note that you don't need the -R 100; you can use -y0 -u test -p to get PerfTest to publish to a queue with no consumers, which might be easier to work with.

The 4MB sizes probably refer to other files (queue index files?)

Not sure whether the flickering between 10468 and 20936 is worth fixing; what do you think?

@simonmacmullen
Contributor Author

Oh, also: you can set a very low vm_memory_high_watermark_paging_ratio rather than vm_memory_high_watermark; that way you can publish indefinitely but messages get paged out rapidly.
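
A sketch of the corresponding rabbitmq.config entry (0.0001 is an arbitrary, aggressively low value chosen only for illustration):

    [
      {rabbit, [
        {vm_memory_high_watermark_paging_ratio, 0.0001}
      ]}
    ].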

@dumbbell
Member

One correction to my previous comment:

  • 150 messages/s
  • 1000 messages/s

Those should read:

  • 150 reads/s
  • 1000 reads/s

The 4MB sizes probably refer to other files (queue index files?)

You're right, the file handle differs for those reads.

Not sure whether the flickering between 10468 and 20936 is worth fixing; what do you think?

After a test:

  • With the flickering buffer:
    • 1000 reads/s
    • 15 MB/s from disc
    • 10-12 MB/s sent to node B
  • With a constant buffer:
    • 750 reads/s
    • 7 MB/s from disc
    • 7 MB/s sent to node B

In the first case, we read 20 kB but only use 10 kB of it, then we read 10 kB, then we double again, and so on: roughly 30 kB read from disk for every 20 kB of messages delivered. We don't do this in the second case (we always read exactly 10 kB). Comparing the number of reads to the disk throughput, the second case reads about 33% fewer bytes per read, which corresponds to no longer wasting 10 kB on every other read. However, I can't explain why it is slower...

dumbbell added a commit that referenced this issue Mar 20, 2015
@dumbbell
Member

Here are new, more meaningful numbers comparing stable and 8faf4ee.

The protocol is:

  1. Start nodes A and B, cluster them, add a HA policy.
  2. Create a queue from the management UI.
  3. Stop node B.
  4. Use PerfTest to queue 300,000 messages, which is enough to page them out (the filesystem is tmpfs). No clients are connected after that.
  5. Start B and force synchronization. While this happens, look at the time the full sync takes, as well as I/O and network statistics.

Results with stable:

  • Synchronization finished in 1'55".
  • Reads: 1600/s (1.6 GiB/s)
  • Network (from A to B): 18 MiB/s while messages are paged in, then 58 MiB/s

Results with 8faf4ee:

  • Synchronization finished in 1'10".
  • Reads: 4500/s (57 MiB/s)
  • Network (from A to B): 58 MiB/s

simonmacmullen pushed a commit that referenced this issue Mar 23, 2015
...such as when we are reading backwards through a file. References #69.
dumbbell added a commit that referenced this issue Mar 23, 2015
simonmacmullen pushed a commit that referenced this issue Mar 23, 2015
Take into account the size of the current read if appropriate, add an
extra byte when shrinking so next time we know how we did, and fix
comments all round.

References #69.
@dumbbell dumbbell added this to the 3.5.1 milestone Mar 23, 2015
dcorbacho pushed a commit that referenced this issue May 27, 2016
Updates queue_master_location/1 parameter spec from pid() to #amqqueue{}
dcorbacho pushed a commit that referenced this issue Nov 18, 2020
dcorbacho pushed a commit that referenced this issue Jul 5, 2023
…ffset-epoch-atom

Make init_reader and init_data_reader use a consistent return value