Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return_initial #3579

Closed
mlucy opened this issue Jan 15, 2015 · 45 comments
Closed

return_initial #3579

mlucy opened this issue Jan 15, 2015 · 45 comments
Assignees
Milestone

Comments

@mlucy
Copy link
Member

mlucy commented Jan 15, 2015

We never implemented a return_initial optarg. Currently point changefeeds and limit changefeeds always do return_initial, and other feeds never do. I think that's probably the desired default behavior.

It would be too much work to support return_initial on range changefeeds before the release, but we could offer the option to disable it on changefeeds where it's enabled by default. @coffeemug, @danielmewes, do you think this is worth doing?

@mlucy mlucy self-assigned this Jan 15, 2015
@mlucy mlucy added this to the 1.16-polish milestone Jan 15, 2015
@danielmewes
Copy link
Member

I wouldn't worry about this for 1.16. Moving to subsequent.

@danielmewes danielmewes modified the milestones: subsequent, 1.16-polish Jan 15, 2015
@coffeemug
Copy link
Contributor

Agree with @danielmewes. I do think that once 1.16 is out, return_initial and/or persistent/restartable feeds will be the next order of business for changefeeds.

@Timer
Copy link

Timer commented Mar 17, 2015

I would be interested in seeing this implemented.

@danielmewes
Copy link
Member

@Timer @mlucy is working on the full version of this (including return_initial for range changefeeds). Will ship in 2.1 (after 2.0).

@Timer
Copy link

Timer commented Mar 18, 2015

Sounds good, thanks for the update @danielmewes!

@mlucy
Copy link
Member Author

mlucy commented May 7, 2015

This is in next, but may not make it into 2.1. When we're sure where we're scheduling it we should open a docs issue. (Re-assigning to @danielmewes to handle scheduling.)

@mlucy mlucy assigned danielmewes and unassigned mlucy May 7, 2015
@Timer
Copy link

Timer commented May 8, 2015

Thanks for the update, @mlucy!

@jasonkuhrt
Copy link

If I understand this issue correctly it will allow the results of a table be gotten initially and watched ala point changefeeds? I would/will use this feature.

-- Edit VeXocide on IRC confirmed the answer to my question is yes.

@larkost
Copy link
Collaborator

larkost commented Jun 3, 2015

I think that all types of changefeeds should either default to returning initial changes, or not. The mixed behavior that we seem to be adopting is confusing. I see the logic in where the defaults are, and especially in having table changefeeds not default to sending the entire table, but I don't think that the convenience of trying to match the defaults to the normal use case is worth having the (potentially) seemingly random differences in behavior between types of changefeeds.

@coffeemug
Copy link
Contributor

I can definitely see where you're coming from, but I disagree with the conclusion (that we should standardize on one default). It feels like:

  • Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios
  • Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios
  • Having different defaults in different cases is obviously bad because inconsistency is confusing

Given that we're picking between different evils, I think we should go the pragmatic route (different defaults in different cases, accept the inconsistency) and document it really well. But I can definitely see why people are unhappy with that behavior. I'm unhappy with it too, just less unhappy than the alternatives.

@deontologician
Copy link
Contributor

Defaulting to true on queries like r.table('foo').changes() is obviously bad because it makes this query useless by default in 95% of scenarios

Defaulting to false on queries like r.table('foo').orderBy(index='bar').limit(5) is obviously bad because it makes this query useless by default in 95% of scenarios

I think there's a difference though because the first is bad because you'll accidentally suck down your entire table. The second one is bad because it's a bit annoying to need to tack on "return_initial" when you almost always need to for that kind of query.

But! That's only because we support changefeeds on a limited subset of queries right now, and mentally we're translating r.table('foo').orderBy(index='bar').limit(5) into "top five list". If you think about a hypothetical (and hopefully real) future where changefeeds work on everything, suddenly there are all sorts of combinations and it's a lot less obvious that some type of query has a natural default for return_initial. (In other words, assuming it really is 95% for one query and 95% for the other, in the future it's very likely we'll need to make almost arbitrary decisions because there is no easy way to map a chain of terms to a developer's intent).

If you still aren't convinced, think about the costs of the second in terms of annoyance to the user. They're going to reason out naturally "Ok, I want to get these users, and order them... then I guess I'll take the top 3. Ok changefeed that." then a bit later "Oh whoops, I need the initial values. Do I need to do a separate query to get the initial values? [reads docs a bit] Ah, neat, there's an optarg that will give them to me".

My contention being that instead of looking up in a big list of changefeedable queries "Which one gets me a leaderboard?", they're likely to reason it out for themselves, and in that case, they'll appreciate the regularity of return_initial s behavior

@coffeemug
Copy link
Contributor

That's a pretty compelling argument to turn it off by default.

@danielmewes
Copy link
Member

That's a pretty compelling argument to turn it off by default.

I think I agree with that. ...Unless there's some very clear and generic distinction between the two cases that I'm not seeing.

@jasonkuhrt
Copy link

That's a pretty compelling argument to turn it off by default.

I agree. The consistency is actually a big win here. Furthermore this problem could still be solved via root level defaults configuration (e.g. let users customize/toggle what the defaults are etc.) if we really wanted to.

@mlucy
Copy link
Member Author

mlucy commented Jun 6, 2015

Alright, let's turn it off by default for everything in 2.2.

We may want to pick a shorter name. The current name is include_initial_vals in the branch, which is a pain in the ass to have to type all the time. But on the other hand include_initial_vals is nice because it's consistent with include_states.

@jasonkuhrt
Copy link

include_initial?

@coffeemug
Copy link
Contributor

Shouldn't this be include_initial_vals for consistency? I'm fine with either choice -- @mlucy could you just settle on one?

@mlucy
Copy link
Member Author

mlucy commented Jun 16, 2015

Sorry to drag on the discussion for forever, but what do people think about a really short one like init? So r.table('test').changes(init: true) instead of r.table('test').changes(include_initial_vals: true).

If people don't like a short one, I think include_initial_vals is the best of the long ones.

@deontologician
Copy link
Contributor

I prefer short names in general, but init sounds like we're initializing something. I think include_initial is a good middle ground. I don't think _vals adds a lot and does make it more visually noisy

@coffeemug
Copy link
Contributor

I agree with @deontologician.

@VeXocide
Copy link
Member

I agree with @deontologician as well.

@mlucy
Copy link
Member Author

mlucy commented Jun 16, 2015

Alright, include_initial works for me too. Let's re-settle this issue with include_initial as the name, and have it always default to false.

@coffeemug
Copy link
Contributor

👏

@danielmewes
Copy link
Member

Sounds great

@jasonkuhrt
Copy link

👍

@Slava
Copy link

Slava commented Jun 30, 2015

commenting to register my interest in this issue publicly, as well as subscribe for further updates

@ccorcos
Copy link

ccorcos commented Jul 1, 2015

+1

@danielcompton
Copy link
Contributor

When this feature is added, will it dissolve the differences between 'point' changefeeds and other changefeeds?

@mlucy
Copy link
Member Author

mlucy commented Aug 11, 2015

@danielcompton -- yes, the interface should be the same.

@stellanhaglund
Copy link

This seems very promising, I guess the 2.2 release is months away?

@mlucy
Copy link
Member Author

mlucy commented Sep 7, 2015

@stellanhaglund -- we don't have a release date yet, but we just released 2.1 and historically releases have been 1-3 months apart.

@neonrust
Copy link

Question: would the initial state stream be possible to sort?
In my system I want to return the documents in the same order they arrived, so for the consumer it will be received as if it was live (just faster!).
Currently I do a current-value query (sorted by modify time) and a changes() query in a thread each and serialize the output from those threads so they're emitted externally in "initial-then-changes"-order. This is a somewhat cumbersome solution (and notably slower). Having changes() natively return initial state first is a great win for this use case. However, there is the sorting issue...

@danielmewes
Copy link
Member

@tatsujin1 If you just do an orderBy({index: ...}).changes({includeInitial: true}), changes will not necessarily arrive after all initial results, but you can start seeing some changes before you have received all initial results.

There are two work-arounds:

  • If you also specify {includeStates: true}, you will receive a special document {state: "ready"} once all initial results (plus possibly a few changes) have been transmitted. If you delay rendering in your application until you see that document, you get essentially the desired behavior where the first view that's rendered represents the initial results at the time of the ready document, and from that point on you receive changes to it. Note that at no point you will see a change for a document that you haven't seen the initial result for yet.
  • If you attach a limit to the query, i.e. table.orderBy({index: ...}).limit(...), I believe you will get the behavior you describe where all initial results will be transmitted first, followed by any updates.

The reason for why unlimited changefeeds with include_initial can send changes before they've sent all initial results is for memory consumption. If we had to somehow keep track of all changes until we're done sending all initial results, that would mean that we would need to use an unbounded amount of memory (depending on the number of changes happening during the time where we stream the initial results).

@neonrust
Copy link

Thanks for the thorough response!

That changes arrive amongst initial state documents is not a problem; the two sets are simple to serialize in application code. Hmm... come to think of it, I'm not sure that is strictly required. :) (especially as the initial is sent first, as you mentioned)

Sorting the initial state is something I need, however; the inter-document arrival order has meaning in my case, e.g. A then B means X; B then A means Y.

But hey, I hadn't thought of using an indexed .order_by()! As it is required to be first in the query chain I thought it would be too slow to "sort the entire data set", but I tried it now and it is surprisingly fast! :)

Yeah, I know of the state documents, but since I'm doing the initial+changes queries manually it's not currently used, but they of course will be once this feature is in use. If the initial+changes sets need to be serialized that is...

@mlucy
Copy link
Member Author

mlucy commented Oct 20, 2015

This is in next, but it still needs more testing (I'll create another issue for that shortly). Should be released in 2.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests