Crash during table fuzzing: `Assertion failed: [ptr_]` #4871

Tryneus · 2015-09-21T22:42:07Z

On `next`, I ran the table fuzzer with `table_fuzzer.py --servers 8 --threads 8 --serve-flags "--cache-size 0"` and observed this crash:

2015-09-21T15:17:35.833374831 198.702407s error: Error in ../src/containers/scoped.hpp at line 102:
2015-09-21T15:17:35.840732388 198.709765s error: Assertion failed: [ptr_]
2015-09-21T15:17:35.843603375 198.712635s error: Backtrace:
2015-09-21T15:17:36.650718008 199.519752s error: Mon Sep 21 15:17:35 2015

1: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
2: backtrace_t::backtrace_t() at backtrace.cc:203
3: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:283
4: format_backtrace(bool) at backtrace.cc:198
5: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
6: scoped_ptr_t<remote_replicator_client_t::timestamp_range_tracker_t>::operator->() const at scoped.hpp:102
7: remote_replicator_client_t::next_write_can_proceed(mutex_assertion_t::acq_t*) at remote_replicator_client.cc:491
8: remote_replicator_client_t::on_write_async(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&) at remote_replicator_client.cc:381
9: std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)>::operator()(remote_replicator_client_t*, signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&) const at functional:551
10: void std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>::__call<void, signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&, 0, 1, 2, 3, 4, 5>(std::tuple<signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&>&&, std::_Index_tuple<0, 1, 2, 3, 4, 5>) at functional:1146
11: void std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>::operator()<signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>, void>(signal_t*&&, write_t&&, state_timestamp_t&&, order_token_t&&, mailbox_addr_t<void ()>&&) at functional:1206
12: std::_Function_handler<void (signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>), std::_Bind<std::_Mem_fn<void (remote_replicator_client_t::*)(signal_t*, write_t&&, state_timestamp_t, order_token_t, mailbox_addr_t<void ()> const&)> (remote_replicator_client_t*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)> >::_M_invoke(std::_Any_data const&, signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>) at functional:1780
13: std::function<void (signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>)>::operator()(signal_t*, write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>) const at functional:2161
14: mailbox_t<void (write_t, state_timestamp_t, order_token_t, mailbox_addr_t<void ()>)>::read_impl_t::read(read_stream_t*, signal_t*) at typed.hpp:414
15: mailbox_manager_t::mailbox_read_coroutine(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, threadnum_t, unsigned long, std::vector<char, std::allocator<char> >*, long, mailbox_manager_t::force_yield_t) at mailbox.cc:277
16: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1872733] at 0x1872733 ()
17: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x187326e] at 0x187326e ()
18: callable_action_wrapper_t::run() at runtime_utils.cc:43
19: coro_t::run() at coroutines.cc:207
20: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1872c8c] at 0x1872c8c ()
21: mailbox_manager_t::on_message(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, read_stream_t*) at mailbox.cc:245
22: connectivity_cluster_t::run_t::handle(keepalive_tcp_conn_stream_t*, boost::optional<peer_id_t>, boost::optional<peer_address_t>, auto_drainer_t::lock_t, bool*) at cluster.cc:1194
23: connectivity_cluster_t::run_t::on_new_connection(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) at cluster.cc:294
24: std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)>::operator()(connectivity_cluster_t::run_t*, scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) const at functional:551
25: void std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::__call<void, scoped_ptr_t<linux_tcp_conn_descriptor_t>&, 0, 1, 2>(std::tuple<scoped_ptr_t<linux_tcp_conn_descriptor_t>&>&&, std::_Index_tuple<0, 1, 2>) at functional:1146
26: void std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::operator()<scoped_ptr_t<linux_tcp_conn_descriptor_t>&, void>(scoped_ptr_t<linux_tcp_conn_descriptor_t>&) at functional:1206
27: std::_Function_handler<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&), std::_Bind<std::_Mem_fn<void (connectivity_cluster_t::run_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (connectivity_cluster_t::run_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)> >::_M_invoke(std::_Any_data const&, scoped_ptr_t<linux_tcp_conn_descriptor_t>&) at functional:1780
28: std::function<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&)>::operator()(scoped_ptr_t<linux_tcp_conn_descriptor_t>&) const at functional:2162
29: linux_nonthrowing_tcp_listener_t::handle(int) at network.cc:924
30: std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)>::operator()(linux_nonthrowing_tcp_listener_t*, int) const at functional:551
31: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::__call<void, , 0, 1>(std::tuple<>&&, std::_Index_tuple<0, 1>) at functional:1147
32: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::operator()<, void>() at functional:1206
33: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)> >::run_action() at callable_action.hpp:31
34: callable_action_wrapper_t::run() at runtime_utils.cc:43
35: coro_t::run() at coroutines.cc:207
...

It appears that the remote_replicator_client_t invariant that tracker_ always exists while mode_ == BACKFILLING is not always true.
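To make the suspected invariant concrete, here is a minimal sketch of it as a checkable predicate. Everything in it is a simplified stand-in: `replicator_sketch_t`, `tracker_sketch_t`, and the method names are hypothetical, and only `tracker_`, `mode_`, and the mode names come from this report. It is not the real `remote_replicator_client_t`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; only tracker_, mode_ and the mode names are
// taken from the report. The real class is considerably more involved.
enum class backfill_mode_t { PAUSED, BACKFILLING, STREAMING };

struct tracker_sketch_t {};

class replicator_sketch_t {
public:
    // Entering the backfill creates the tracker before switching the mode.
    void start_backfill() {
        tracker_.reset(new tracker_sketch_t());
        mode_ = backfill_mode_t::BACKFILLING;
    }

    // The invariant under discussion: BACKFILLING implies tracker_ is set.
    bool invariant_holds() const {
        return mode_ != backfill_mode_t::BACKFILLING || tracker_ != nullptr;
    }

    // Simulates the suspected bad state: tracker destroyed, mode unchanged.
    void reset_tracker_only() { tracker_.reset(); }

private:
    backfill_mode_t mode_ = backfill_mode_t::PAUSED;
    std::unique_ptr<tracker_sketch_t> tracker_;
};
```

In this sketch the predicate holds before and during the backfill, and stops holding as soon as the tracker is destroyed while the mode is still `BACKFILLING`. Dereferencing the tracker in that state corresponds to the `[ptr_]` assertion in frame 6 of the backtrace.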


Tryneus · 2015-09-21T22:44:41Z

I added an initializer `mode_(backfill_mode_t::PAUSED)` and moved the `tracker_.reset()` call down to the line that sets `mode_ = backfill_mode_t::STREAMING`, and I can no longer reproduce the crash. Unfortunately, this shouldn't make a difference: this code should hold a write lock on `cleanup_rwlock_`, and the code path in the backtrace above should hold a read lock.
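The shape of that local change can be sketched roughly as follows. This is an illustration of the two edits described in the comment, not the merged fix. The class `client_sketch_t`, the type `timestamp_tracker_sketch_t`, and the methods `begin_backfill` and `finish_backfill` are hypothetical names, and the thread does not establish why the reordering made the crash disappear.

```cpp
#include <cassert>
#include <memory>

// Illustrative stand-ins only; not the real RethinkDB declarations.
enum class backfill_mode_t { PAUSED, BACKFILLING, STREAMING };

struct timestamp_tracker_sketch_t {};

class client_sketch_t {
public:
    // Change 1: give mode_ an explicit initial value.
    client_sketch_t() : mode_(backfill_mode_t::PAUSED) {}

    void begin_backfill() {
        tracker_.reset(new timestamp_tracker_sketch_t());
        mode_ = backfill_mode_t::BACKFILLING;
    }

    void finish_backfill() {
        assert(mode_ == backfill_mode_t::BACKFILLING);
        // ... any remaining backfill work would happen here (assumption) ...

        // Change 2: destroy the tracker on the line directly before the
        // mode switch, so the two state changes stay adjacent.
        tracker_.reset();
        mode_ = backfill_mode_t::STREAMING;
    }

    backfill_mode_t mode() const { return mode_; }
    bool has_tracker() const { return tracker_ != nullptr; }

private:
    backfill_mode_t mode_;
    std::unique_ptr<timestamp_tracker_sketch_t> tracker_;
};
```

With the two statements adjacent, the sketch never exposes a state where `mode_` is `BACKFILLING` and `tracker_` is empty between calls.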

danielmewes · 2015-09-22T00:51:43Z

@Tryneus With those changes, did it crash in the same call to next_write_can_proceed from line 381?
There's a second call site for next_write_can_proceed in line 280 which is not as obviously protected by an rwlock acquisition.

Tryneus · 2015-09-22T01:04:29Z

@danielmewes - with those changes I could not produce any crash.

danielmewes · 2015-09-22T01:07:59Z

Ahm sorry I wasn't thinking properly. Ignore my question, it made absolutely no sense.

danielmewes · 2015-09-22T22:41:53Z

This turns out to be a problem in debug mode only.

Tryneus · 2015-09-24T00:06:12Z

Fix is up in review 3247.

See github issue #4871

Tryneus · 2015-09-24T00:50:26Z

The fix has been approved and merged to next in commit 92a4f32, and cherry-picked into v2.1.x in commit aefd760. Will be in release 2.1.5.

Tryneus added the tp:bug label Sep 21, 2015

danielmewes added the pr:high label Sep 21, 2015

danielmewes added this to the 2.1.x milestone Sep 21, 2015

danielmewes removed the pr:high label Sep 22, 2015

Tryneus pushed a commit that referenced this issue Sep 24, 2015

Fix debug crash due to interrupting a backfill

92a4f32

See github issue #4871

Tryneus pushed a commit that referenced this issue Sep 24, 2015

Fix debug crash due to interrupting a backfill

aefd760

See github issue #4871

Tryneus closed this as completed Sep 24, 2015

danielmewes modified the milestones: 2.1.x, 2.1.5 Oct 1, 2015
