allow change feed notifications for raft related data #4902

Closed
mbroadst opened this issue Sep 30, 2015 · 19 comments

Comments

@mbroadst
Contributor

My immediate use case is to be informed when a given RethinkDB instance becomes the leader, though this could be useful in other cases too. As a temporary workaround I can set up a changefeed on the logs table and filter for the term "Raft", but a more direct mechanism would be ideal.

Though this was discussed briefly on the mailing list, no actual proposal was put forth. If this is as simple as exposing a key like "raft_role", perhaps on the server_status table (it looks like it would go here: https://github.com/rethinkdb/rethinkdb/blob/next/src/clustering/administration/servers/server_status.cc#L40), then I could put together a PR for that. However, I wanted to open the issue here first to discuss potentially better alternatives (or whether there is interest in the first place!).

@coffeemug
Contributor

/cc @danielmewes

Daniel, what do you think about this?

@danielmewes
Member

A quick clarification first: RethinkDB uses one Raft cluster per table. That means a different server can be the Raft leader for each table.

How about exposing a raft_leader field in table_status, which gets set to the server that's currently the Raft leader (or null if there is no known leader)?
I quite like that idea.

Also pinging @VeXocide for any thoughts on this.

@mbroadst
Contributor Author

mbroadst commented Oct 1, 2015

@danielmewes ah right, I did fail to notice that this is per-table! Other than that, raft_leader sounds like a great idea +1

@mbroadst
Contributor Author

mbroadst commented Oct 1, 2015

In the meantime, I've ended up with this ReQL to watch for leadership changes on a given table (r.db('test').table('changing_thing') in this case). The query currently takes ~350ms on my localhost; can you think of a way to speed it up? Unfortunately the logs table doesn't seem to provide any relevant index for me to key on, and the table ID is a string I need to match against.

r.db('test').table('changing_thing').config()
  .do(function(config) {
    return r.db('rethinkdb').table('logs')
      .filter(function(log) {
        return log('message').match(config('id')).and(log('message').match('Raft leader'));
      });
  })
  .changes();

P.S. what penalty do I incur by attaching a change feed to such a slow query?

@mbroadst
Contributor Author

mbroadst commented Oct 1, 2015

Strangely enough, if I change the match line from:

return log('message').match(config('id')).and(log('message').match('Raft leader'));

to

return log('message').match(config('id').add(': This server is Raft leader'));

I get a slightly slower query (but this is all based on the Data Explorer, so I imagine the numbers are close enough). The query profiles report 44ms and 54ms server time, respectively.

@danielmewes
Member

I think reading the full log table might be what's slow here. It's not parsed in the most efficient way at the moment. You can make it a bit faster by stopping once you've found the first match:

r.db('rethinkdb').table('logs')
      .orderBy(r.desc({index: "id"}))
      .filter(function(log) {
        return log('message').match(config('id')).and(log('message').match('Raft leader'));
      }).limit(1)

I don't think we actually support changefeeds on such queries yet, since it involves fetching data from multiple tables.

You can however get a changefeed on the logs table, and that should be efficient. It's probably easiest if you first retrieve the table ID manually in your client, and then do something like this:

r.db('rethinkdb').table('logs')
  .filter(function(log) {
    return log('message').match(tableId).and(log('message').match('Raft leader'));
  })
  .changes()
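
For unit-testing the matching logic without a running cluster, the same condition can be sketched on the client. This is only an illustration: `isRaftLeaderLog` is a hypothetical helper, and it assumes log rows carry a string `message` field as in the queries above.

```javascript
// Hypothetical client-side mirror of the server-side filter above.
// Assumes each log row has a string `message` field.
function isRaftLeaderLog(log, tableId) {
  const message = log && typeof log.message === 'string' ? log.message : '';
  // ReQL's match() takes a regular expression; includes() is a literal
  // substring check, which behaves the same for a plain UUID and this phrase.
  return message.includes(tableId) && message.includes('Raft leader');
}
```

The server-side filter is still what keeps the changefeed efficient; a helper like this is only useful for testing or for double-checking rows that arrive on the feed.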

@mbroadst
Contributor Author

mbroadst commented Oct 1, 2015

@danielmewes thanks for the quick response. It looks like the plain filter on logs takes about 43ms server time (with a cached id), but adding the orderBy and limit reduced that by half. You had a typo with r.desc, so here's the filter for future readers:

r.db('rethinkdb').table('logs')
  .orderBy({ index: r.desc("id") })
  .filter(function(log) {
    return log('message').match(cachedId).and(log('message').match('Raft leader'));
  })
  .limit(1);

@mbroadst
Contributor Author

mbroadst commented Oct 2, 2015

@danielmewes so are you saying that even if raft_leader were available on a given table's configuration I wouldn't be able to set up a changefeed for it?

@mbroadst
Contributor Author

mbroadst commented Oct 2, 2015

Additionally, it seems that orderBy.limit doesn't work on system tables, bummer!

@danielmewes
Member

@mbroadst No, that's not what I meant. I just meant that since your query uses both the table configuration and the logs table, it doesn't work with changefeeds at the moment.

The changefeed query I mentioned above on the logs table works fine for me.
Similarly, once we have the raft_leader in the table configuration, you can listen to changes by simply doing:

r.db('rethinkdb').table('table_status').filter({id: tableId})('raft_leader').changes()

You just need to retrieve the table ID beforehand.

Eventually you'll also be able to use the more efficient

r.table(...).status()('raft_leader').changes()

but this doesn't yet work because our changefeeds on single documents (rather than sequences) don't allow transformations (here: selecting the 'raft_leader' field) yet. We hope to add that functionality soon.
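
On the client, each notification from such a feed would need to be interpreted. The sketch below assumes the usual `{old_val, new_val}` change shape and that the proposed `raft_leader` field holds a server identifier or null; `leaderTransition` is a hypothetical helper, and the field itself is only a proposal at this point in the thread.

```javascript
// Hypothetical helper: interpret one change object from a changefeed on the
// proposed `raft_leader` field. Assumes the { old_val, new_val } shape, where
// each value is a server identifier or null (no known leader).
function leaderTransition(change) {
  const previous = change.old_val === undefined ? null : change.old_val;
  const next = change.new_val === undefined ? null : change.new_val;
  if (previous === next) {
    return null; // nothing changed, so there is no transition to report
  }
  return { previous, next };
}
```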

@danielmewes
Member

@mbroadst Regarding orderBy.limit on system tables: Could you post the exact query you're running?

Something like the mentioned

r.db('rethinkdb').table('logs')
  .orderBy({ index: r.desc("id") })
  .filter(function(log) {
    return log('message').match(cachedId).and(log('message').match('Raft leader'));
  })
  .limit(1);

works fine for me.

@mbroadst
Contributor Author

mbroadst commented Oct 2, 2015

It's changefeed-specific; I get this error:

Unhandled rejection ReqlOpFailedError: System tables don't support changefeeds on `.limit()` in:
r.db("rethinkdb").table("logs").orderBy({
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    index: r.desc("id")
    ^^^^^^^^^^^^^^^^^^^
}).filter(function(var_2) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^
    return var_2("message").match("02aa72e5-cfbd-407b-86ff-870461a4c82b").and(var_2("message").match("Raft leader"))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}).limit(1).changes()
^^^^^^^^^^^^^^^^^^^^^

@danielmewes
Member

Ah yes, that's true for changefeeds. You can just register a changefeed without the orderBy and limit though, and that should give you the updates whenever a new message matching the filter appears in the logs.

What I recommend doing is something like this:

  1. Register a changefeed with the correct filter on the logs table (without any orderBy or limit).
  2. Run the query with the orderBy and limit(1) to get the current Raft leader.
  3. If you have seen any notifications on the changefeed, use the raft leader from the changefeed. If not, use the one from the manual one-time query.
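
The bookkeeping for these three steps can be sketched as a small client-side class. `LeaderTracker` and its method names are hypothetical; the wiring to the actual changefeed and one-time query is left out, and the tracked value is whatever your code extracts from a matching log row.

```javascript
// Hypothetical helper for the three steps above: prefer anything seen on the
// changefeed, and fall back to the one-time query result otherwise.
class LeaderTracker {
  constructor() {
    this.feedSeen = false;      // has the changefeed delivered anything yet?
    this.fromFeed = undefined;  // latest value delivered by the changefeed
    this.fromQuery = undefined; // result of the orderBy + limit(1) query
  }

  // Step 1: call this for every notification from the changefeed.
  onFeedNotification(value) {
    this.feedSeen = true;
    this.fromFeed = value;
  }

  // Step 2: call this once with the result of the one-time query.
  onInitialQuery(value) {
    this.fromQuery = value;
  }

  // Step 3: a changefeed value wins over the one-time query result.
  current() {
    return this.feedSeen ? this.fromFeed : this.fromQuery;
  }
}
```

Registering the changefeed before running the one-time query is what makes this safe: a leadership change that happens between the two is still delivered on the feed and takes precedence.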

RethinkDB 2.2 will have an include_initial option that allows you to get both the initial value and a changefeed together. We might be able to simplify this a bit once that option is available.

@mbroadst
Contributor Author

mbroadst commented Oct 2, 2015

Yep that's what I've got now! I am definitely looking forward to all these new features though 😄

@mbroadst
Contributor Author

mbroadst commented Oct 5, 2015

@danielmewes is there possibly a way to forcibly change the Raft leader for a table, e.g. from ReQL? This would be incredibly useful for testing on our side.

@danielmewes
Member

@mbroadst Right now the only way is to take the server down or to interrupt the network connection (using iptables or similar).

@coffeemug
Contributor

We had a proposal for a debug interface to forcibly change the Raft leader, but it hasn't been implemented yet. I think we should eventually expose a system table API for all this (changing the Raft leader, subscribing to changes on it, etc.)

@mbroadst
Contributor Author

mbroadst commented Oct 9, 2015

@danielmewes and I spoke on IRC earlier and discussed the possibility of also adding a raft_status alongside raft_leader so someone could theoretically make decisions based on whether e.g. an election is currently in process, or failed, etc.

@VeXocide
Member

Merged into next via commit 0f71d4b.

@danielmewes modified the milestones: 2.2-polish, 2.2 (Oct 29, 2015)