
Server crash after upgrading from 2.0.4 to 2.2.0 #5085

Closed
jilen opened this issue Nov 14, 2015 · 30 comments

@jilen

jilen commented Nov 14, 2015

2015-11-14T12:05:44.374679457 48.354909s error: Error in src/rdb_protocol/datum_stream.cc at line 855:
2015-11-14T12:05:44.374800028 48.355002s error: Guarantee failed: [found_hash_pair] 
2015-11-14T12:05:44.374825615 48.355027s error: Backtrace:
2015-11-14T12:05:44.647105090 48.627311s error: Sat Nov 14 12:05:44 2015

1 [0xacbba0]: backtrace_t::backtrace_t() at 0xacbba0 (/usr/bin/rethinkdb)
2 [0xacbf00]: format_backtrace(bool) at 0xacbf00 (/usr/bin/rethinkdb)
3 [0xcdc8d0]: report_fatal_error(char const*, int, char const*, ...) at 0xcdc8d0 (/usr/bin/rethinkdb)
4 [0x8a607b]: ql::primary_readgen_t::restrict_active_ranges(sorting_t, ql::active_ranges_t*) const at 0x8a607b (/usr/bin/rethinkdb)
5 [0x8a696b]: ql::rget_response_reader_t::unshard(sorting_t, rget_read_response_t&&) at 0x8a696b (/usr/bin/rethinkdb)
6 [0x8acc89]: ql::rget_reader_t::do_range_read(ql::env_t*, read_t const&) at 0x8acc89 (/usr/bin/rethinkdb)
7 [0x8ad0a0]: ql::rget_reader_t::load_items(ql::env_t*, ql::batchspec_t const&) at 0x8ad0a0 (/usr/bin/rethinkdb)
8 [0x8a1137]: ql::rget_response_reader_t::next_batch(ql::env_t*, ql::batchspec_t const&) at 0x8a1137 (/usr/bin/rethinkdb)
9 [0x89ee23]: ql::lazy_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at 0x89ee23 (/usr/bin/rethinkdb)
10 [0x89fe27]: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at 0x89fe27 (/usr/bin/rethinkdb)
11 [0x7ce04f]: ql::query_cache_t::ref_t::serve(ql::env_t*, ql::response_t*) at 0x7ce04f (/usr/bin/rethinkdb)
12 [0x7cef3d]: ql::query_cache_t::ref_t::fill_response(ql::response_t*) at 0x7cef3d (/usr/bin/rethinkdb)
13 [0x889b83]: rdb_query_server_t::run_query(ql::query_params_t*, ql::response_t*, signal_t*) at 0x889b83 (/usr/bin/rethinkdb)
14 [0xac7ff7]: _Z14save_exceptionIZZN14query_server_t15connection_loopI15json_protocol_tEEvP16linux_tcp_conn_tmPN2ql13query_cache_tEP8signal_tENKUlvE_clEvEUlvE_EvPNSt15__exception_ptr13exception_ptrEPSsP6cond_tOT_+0x37 at 0xac7ff7 (/usr/bin/rethinkdb)
15 [0xac85a0]: _ZZN14query_server_t15connection_loopI15json_protocol_tEEvP16linux_tcp_conn_tmPN2ql13query_cache_tEP8signal_tENKUlvE_clEv+0x130 at 0xac85a0 (/usr/bin/rethinkdb)
16 [0x9f44f8]: coro_t::run() at 0x9f44f8 (/usr/bin/rethinkdb)
mlucy added this to the 2.2.x milestone Nov 14, 2015
mlucy self-assigned this Nov 14, 2015
@danielmewes
Member

Do you know if there was a particular query that triggered this? Does it happen repeatedly?

@jilen
Author

jilen commented Nov 14, 2015

@danielmewes Yes, it occurs repeatedly, but I don't know which query causes this.

@mlucy
Member

mlucy commented Nov 14, 2015

@jilen --

Thanks for the bug report! Would you mind answering a few more questions to help us track this down?

  • What platform are you running on?
  • Is there anything unusual about your tables, such as outdated secondary indexes or unavailable servers?
  • Are you issuing any get_all queries? If so, what do they look like, roughly?

@jilen
Author

jilen commented Nov 14, 2015

@mlucy

  1. CentOS 6.2
  2. Yes, I just upgraded and the index rebuild failed, so there are outdated secondary indexes (#5083).
  3. I am using get_all on the outdated index, e.g. r.getAll([xxx], {index: xxx}).group().ungroup(), and I am also calling between.

@mlucy
Member

mlucy commented Nov 14, 2015

Ah, sorry, I didn't see #5083 before.

Is it possible that you have an index which hasn't been upgraded since before RethinkDB 1.16? Looking at the code, that's the only thing that I think could cause this. (Although that would be very strange, because we should be refusing to do reads on pre-1.16 indexes in this version. Right @danielmewes?)

Sorry you've run into two problems with the new release! The good news is that hopefully they're related.

@danielmewes
Member

Pre 1.16 indexes would fail very differently (on startup), and I think @jilen was running 2.0 before?

Could this happen if some hash shards aren't ready when a read is performed?

@mlucy
Member

mlucy commented Nov 14, 2015

@danielmewes -- I don't think so, I think that should produce an error in the unshard visitor and we should never get to the guarantee. We should test it though.

We could also get to that error if the clustering code is in a weird state and silently routes the read to only a subset of the shards it needs to go to, rather than producing an error because some of those shards are unavailable.

jilen changed the title from "Server crash after upgrading to 2.2.0" to "Server crash after upgrading from 2.0.4 to 2.2.0" Nov 14, 2015
@jilen
Author

jilen commented Nov 14, 2015

@danielmewes I am upgrading from 2.0.4

@jilen
Author

jilen commented Nov 14, 2015

The crash still occurs after I migrated the secondary indexes.

@hueniverse

I upgraded my test environment from 2.1.5-2 to 2.2.0 and can easily reproduce the server crash by running npm test on https://github.com/hueniverse/penseur, which is my thin RethinkDB module. The tests fail hard on 2.2.0, with the server crashing halfway through. I have downgraded back to 2.1.5-2 and it works as expected.

Hope this helps.
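
For anyone else trying to reproduce this, the steps would roughly be as follows (a sketch only; it assumes Node.js/npm are installed and a local RethinkDB 2.2.0 instance is listening on the default driver port, which is presumably what the test suite connects to):

```sh
# Hypothetical reproduction steps; adjust ports/paths to your environment.
git clone https://github.com/hueniverse/penseur.git
cd penseur
npm install
npm test   # on 2.2.0 the server reportedly crashes partway through the suite
```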

@mlucy
Member

mlucy commented Nov 14, 2015

Hi @hueniverse !

Thanks, that should help. What platform are you on?

@hueniverse

I'm on Ubuntu 14.04.3. If you click on the Travis badge you can see the issue as well.

@jilen
Author

jilen commented Nov 15, 2015

@hueniverse How can I downgrade to 2.1.5-2 on a production server?

@mlucy
Member

mlucy commented Nov 15, 2015

Alright, a fix for this problem is up in CR 3338 by @danielmewes. We should put out 2.2.1 soon with the fix. Thanks for helping us track this down!

@mlucy
Member

mlucy commented Nov 15, 2015

@jilen -- you should be able to use the package manager on your system to uninstall RethinkDB 2.2 and install the old package. I've never used CentOS before, so I don't know the exact command.

@danielmewes
Member

@jilen The problem with downgrading to 2.1.5 is that you won't be able to use the data files anymore, since RethinkDB 2.2 will have migrated them to a new format.
I'm currently preparing a hot fix package for CentOS 6 with the bug fix for this. I'll post the link here once the build is done.

@danielmewes
Member

We'll try to release an official update (2.2.1) with this on Monday or Tuesday.

@danielmewes
Member

@jilen You can download the hot fix build for CentOS 6 from here:
https://www.dropbox.com/s/c29pa75to312nda/rethinkdb-2.2.0_10_ge1f7cf.x86_64.rpm?dl=1
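
Installation is the usual RPM procedure, along these lines (a sketch; the exact service handling depends on how your instances are configured):

```sh
# Assumed steps for installing the hot fix package on CentOS 6.
wget -O rethinkdb-hotfix.rpm \
  "https://www.dropbox.com/s/c29pa75to312nda/rethinkdb-2.2.0_10_ge1f7cf.x86_64.rpm?dl=1"
sudo yum localinstall rethinkdb-hotfix.rpm   # or: sudo rpm -Uvh rethinkdb-hotfix.rpm
sudo service rethinkdb restart               # assuming the packaged init script
```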

@jilen
Author

jilen commented Nov 15, 2015

@danielmewes Thanks very much.

@williamstein

Is there an Ubuntu hotfix?

@williamstein

What is CR 3338? ("a fix for this problem is up in CR 3338 by @danielmewes") I want to build from source, but I don't know what source to build.

@danielmewes
Member

The branch to build from is mlucy_5085. @williamstein Do you need an Ubuntu 15.04 package or a different version?

@williamstein

mglukhov in Slack pointed me to the branch just now too, and I've launched a build from source. It's about halfway done. If that works, I'll use it; otherwise, I'll really need an Ubuntu 15.04 package. I'll post back here either way in a few minutes.
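
For reference, a from-source build of that branch looks roughly like this (a sketch following RethinkDB's general build instructions; dependency setup and exact flags may differ per distribution):

```sh
# Assumed build steps for the fix branch mlucy_5085.
git clone https://github.com/rethinkdb/rethinkdb.git
cd rethinkdb
git checkout mlucy_5085
./configure --allow-fetch    # --allow-fetch lets the build download bundled dependencies
make -j"$(nproc)"            # the resulting binary is build/release/rethinkdb
```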

@williamstein

I successfully built RethinkDB from source on my Ubuntu 15.04 system. Of course, I also have the official RethinkDB 2.2 system-wide package installed. The result of the build from source seems to be a new rethinkdb binary. Would it be fine for me to just run sudo cp build/release/rethinkdb /usr/bin/ and restart rethinkdb? Then, when the 2.2.1 release comes out, I'll switch to that.

@danielmewes
Member

@williamstein

> Would it be fine for me to just run sudo cp build/release/rethinkdb /usr/bin/ and restart rethinkdb?

Yes, that should work. Once the new package comes out, it will automatically overwrite the file.
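
Concretely, the swap would look something like this (a sketch; the service commands assume the packaged init script is in use):

```sh
# Replace the packaged binary with the freshly built one, then restart the server.
sudo service rethinkdb stop
sudo cp build/release/rethinkdb /usr/bin/
sudo service rethinkdb start
```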

@williamstein

OK, everything seems to be working perfectly now. I'll report any other problems I see. Incidentally, the site and web servers worked fine with the RethinkDB servers randomly failing every 30 minutes, i.e., the clients properly recovered in all cases. So this bug was also a good test of automatic failover.

@mlucy
Member

mlucy commented Nov 16, 2015

This fix is in next and 2.2.x.

mlucy closed this as completed Nov 16, 2015
@danielmewes
Member

@jilen @williamstein @hueniverse Packages for RethinkDB 2.2.1 are now available with this fix.
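
Upgrading should just be a normal package update, along these lines (a sketch; it assumes the RethinkDB repository is already configured on the machine):

```sh
# CentOS / RHEL
sudo yum update rethinkdb

# Ubuntu / Debian
sudo apt-get update && sudo apt-get install rethinkdb

sudo service rethinkdb restart
```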

@hueniverse

Confirmed. Thanks.

@jilen
Author

jilen commented Nov 18, 2015

Thanks.
