Crash during table fuzzing: Assertion failed: [ptr_] #4871

Closed
Tryneus opened this issue Sep 21, 2015 · 7 comments

Tryneus commented Sep 21, 2015

On the next branch, I ran the table fuzzer with table_fuzzer.py --servers 8 --threads 8 --serve-flags "--cache-size 0" and observed this crash:

2015-09-21T15:17:35.833374831 198.702407s error: Error in ../src/containers/scoped.hpp at line 102:
2015-09-21T15:17:35.840732388 198.709765s error: Assertion failed: [ptr_]
2015-09-21T15:17:35.843603375 198.712635s error: Backtrace:
2015-09-21T15:17:36.650718008 199.519752s error: Mon Sep 21 15:17:35 2015

1: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
2: backtrace_t::backtrace_t() at backtrace.cc:203
3: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:283
4: format_backtrace(bool) at backtrace.cc:198
5: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
6: scoped_ptr_t<remote_replicator_client_t::timestamp_range_tracker_t>::operator->() const at scoped.hpp:102
7: remote_replicator_client_t::next_write_can_proceed(mutex_assertion_t::acq_t*) at remote_replicator_client.cc:491
8: remote_replicator_client_t::on_write_async(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&) at remote_replicator_client.cc:381
9: std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)>::operator()(remote_replicator_client_t*, signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&) const at functional:551
10: void std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>::__call<void, signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&, 0, 1, 2, 3, 4, 5>(std::tuple<signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&>&&, std::_Index_tuple<0, 1, 2, 3, 4, 5>) at functional:1146
11: void std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>::operator()<signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>, void>(signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&) at functional:1206
12: std::_Function_handler<void (signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>), std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)> >::_M_invoke(std::_Any_data const&, signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>) at functional:1780
13: std::function<void (signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>)>::operator()(signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>) const at functional:2161
14: mailbox_t<void (write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>)>::read_impl_t::read(read_stream_t*, signal_t*) at typed.hpp:414
15: mailbox_manager_t::mailbox_read_coroutine(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, threadnum_t, unsigned long, std::vector<char, std::allocator<char> >*, long, mailbox_manager_t::force_yield_t) at mailbox.cc:277
16: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1872733] at 0x1872733 ()
17: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x187326e] at 0x187326e ()
18: callable_action_wrapper_t::run() at runtime_utils.cc:43
19: coro_t::run() at coroutines.cc:207
20: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1872c8c] at 0x1872c8c ()
21: mailbox_manager_t::on_message(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, read_stream_t*) at mailbox.cc:245
22: connectivity_cluster_t::run_t::handle(keepalive_tcp_conn_stream_t*, boost::optional<peer_id_t>, boost::optional<peer_address_t>, auto_drainer_t::lock_t, bool*) at cluster.cc:1194
23: connectivity_cluster_t::run_t::on_new_connection(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) at cluster.cc:294
24: std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)>::operator()(connectivity_cluster_t::run_t*, scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) const at functional:551
25: void std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::__call<void, scoped_ptr_t<linux_tcp_conn_descriptor_t>&, 0, 1, 2>(std::tuple<scoped_ptr_t<linux_tcp_conn_descriptor_t>&>&&, std::_Index_tuple<0, 1, 2>) at functional:1146
26: void std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::operator()<scoped_ptr_t<linux_tcp_conn_descriptor_t>&, void>(scoped_ptr_t<linux_tcp_conn_descriptor_t>&) at functional:1206
27: std::_Function_handler<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&), std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)> >::_M_invoke(std::_Any_data const&, scoped_ptr_t<linux_tcp_conn_descriptor_t>&) at functional:1780
28: std::function<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&)>::operator()(scoped_ptr_t<linux_tcp_conn_descriptor_t>&) const at functional:2162
29: linux_nonthrowing_tcp_listener_t::handle(int) at network.cc:924
30: std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)>::operator()(linux_nonthrowing_tcp_listener_t*, int) const at functional:551
31: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::__call<void, , 0, 1>(std::tuple<>&&, std::_Index_tuple<0, 1>) at functional:1147
32: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::operator()<, void>() at functional:1206
33: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)> >::run_action() at callable_action.hpp:31
34: callable_action_wrapper_t::run() at runtime_utils.cc:43
35: coro_t::run() at coroutines.cc:207
...

It appears that the remote_replicator_client_t invariant, that tracker_ always exists while mode_ == BACKFILLING, does not always hold: frame 6 of the backtrace shows tracker_ being dereferenced while it is empty.
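
To make the suspected invariant concrete, here is a minimal sketch. It is not the RethinkDB source: the class replicator_client_sketch_t, its methods, and the body of timestamp_range_tracker_t are hypothetical stand-ins, and std::unique_ptr is swapped in for scoped_ptr_t. Only the names mode_, tracker_, BACKFILLING and next_write_can_proceed come from the report above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; only the identifiers mode_, tracker_ and
// BACKFILLING are taken from the issue. Everything else is assumed.
enum class backfill_mode_t { PAUSED, BACKFILLING, STREAMING };

struct timestamp_range_tracker_t {
    // Placeholder logic: the real tracker decides this from timestamps.
    bool can_process_next_write() const { return true; }
};

class replicator_client_sketch_t {
public:
    void start_backfill() {
        // Create the tracker before entering BACKFILLING so the
        // invariant is never observably false.
        tracker_.reset(new timestamp_range_tracker_t());
        mode_ = backfill_mode_t::BACKFILLING;
    }

    // The invariant from the report: BACKFILLING implies a live tracker.
    bool invariant_holds() const {
        return mode_ != backfill_mode_t::BACKFILLING || tracker_ != nullptr;
    }

    bool next_write_can_proceed() const {
        if (mode_ != backfill_mode_t::BACKFILLING) {
            return true;
        }
        // This check plays the role of the failed "[ptr_]" assertion.
        assert(tracker_ != nullptr);
        return tracker_->can_process_next_write();
    }

    // Simulates the suspected bad state: tracker gone, mode unchanged.
    void drop_tracker_only() { tracker_.reset(); }

private:
    backfill_mode_t mode_ = backfill_mode_t::PAUSED;
    std::unique_ptr<timestamp_range_tracker_t> tracker_;
};
```

In this sketch, calling drop_tracker_only() after start_backfill() makes invariant_holds() return false, and a subsequent next_write_can_proceed() would trip the assertion, which is the same shape of failure as the crash above.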

@Tryneus Tryneus added the tp:bug label Sep 21, 2015

Tryneus commented Sep 21, 2015

I added an initializer for mode_ (backfill_mode_t::PAUSED) and moved the tracker_.reset() call down to the line that sets mode_ = backfill_mode_t::STREAMING, and I can no longer reproduce the crash. Unfortunately, this shouldn't make a difference: the code that resets the tracker should hold a write lock on cleanup_rwlock_, and the code path in the backtrace above should hold a read lock, so the two should never overlap.
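
A rough sketch of the two changes described above, under stated assumptions: replicator_fix_sketch_t, begin_backfill, finish_backfill and the accessors are hypothetical names, std::unique_ptr stands in for scoped_ptr_t, and the locking on cleanup_rwlock_ is left out entirely. The relative order of the mode_ assignment and tracker_.reset() is also a guess; the comment only says they ended up next to each other.

```cpp
#include <cassert>
#include <memory>

enum class backfill_mode_t { PAUSED, BACKFILLING, STREAMING };

// Empty placeholder; the real tracker's contents are not shown in the issue.
struct timestamp_range_tracker_t {};

class replicator_fix_sketch_t {
public:
    // Change 1: mode_ gets a defined initial value in the constructor.
    replicator_fix_sketch_t() : mode_(backfill_mode_t::PAUSED) {}

    void begin_backfill() {
        tracker_.reset(new timestamp_range_tracker_t());
        mode_ = backfill_mode_t::BACKFILLING;
    }

    void finish_backfill() {
        // (Assumed) any work that can block or yield happens before this
        // point, while the tracker is still alive.

        // Change 2: the tracker is dropped right where the mode leaves
        // BACKFILLING, so no code in between can see BACKFILLING with
        // an empty tracker_.
        mode_ = backfill_mode_t::STREAMING;
        tracker_.reset();
    }

    backfill_mode_t mode() const { return mode_; }
    bool has_tracker() const { return tracker_ != nullptr; }

    // BACKFILLING implies a live tracker.
    bool invariant_holds() const {
        return mode_ != backfill_mode_t::BACKFILLING || has_tracker();
    }

private:
    backfill_mode_t mode_;
    std::unique_ptr<timestamp_range_tracker_t> tracker_;
};
```

Keeping the two statements adjacent means the invariant holds at every point where another coroutine could run, independent of which lock is held. That would be consistent with the crash disappearing, but it does not explain why the locking alone was insufficient.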

@danielmewes danielmewes added this to the 2.1.x milestone Sep 21, 2015

danielmewes commented

@Tryneus With those changes, did it crash in the same call to next_write_can_proceed from line 381?
There's a second call site for next_write_can_proceed at line 280, which is not as obviously protected by an rwlock acquisition.

Tryneus commented Sep 22, 2015

@danielmewes With those changes, I could not reproduce any crash at all.

danielmewes commented

Ahm, sorry, I wasn't thinking properly. Ignore my question; it made absolutely no sense.

danielmewes commented

This turns out to be a problem in debug mode only.

Tryneus commented Sep 24, 2015

Fix is up in review 3247.

Tryneus pushed a commit that referenced this issue Sep 24, 2015
Tryneus pushed a commit that referenced this issue Sep 24, 2015

Tryneus commented Sep 24, 2015

The fix has been approved and merged to next in commit 92a4f32, and cherry-picked into v2.1.x in commit aefd760. It will be in release 2.1.5.

@Tryneus Tryneus closed this as completed Sep 24, 2015
@danielmewes danielmewes modified the milestones: 2.1.x, 2.1.5 Oct 1, 2015