
Error: Guarantee failed: [ts->tv_nsec >= 0 && ts->tv_nsec < (1000LL * (1000LL * 1000LL))] #4931

Closed
mbroadst opened this issue Oct 8, 2015 · 9 comments

@mbroadst
Contributor

mbroadst commented Oct 8, 2015

I'm sorry I don't have time to research the cause of this atm, but wanted to put this here for reference. I've just experienced a crash when switching nodes around in a cluster, resulting in the following:

2015-10-08T13:15:44.659193487 42377.176343s error: Error in src/time.cc at line 71:
2015-10-08T13:15:44.659304993 42377.176454s error: Guarantee failed: [ts->tv_nsec >= 0 && ts->tv_nsec < (1000LL * (1000LL * 1000LL))]
2015-10-08T13:15:44.659334099 42377.176483s error: Backtrace:
2015-10-08T13:15:44.994025935 42377.511179s error: Thu Oct  8 13:15:44 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: add_to_timespec(timespec*, int) at ??:?\n5: logs_artificial_table_backend_t::cfeed_machinery_t::run(peer_id_t const&, uuid_u const&, log_server_business_card_t const&, bool, auto_drainer_t::lock_t) at ??:?\n6: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (logs_artificial_table_backend_t::cfeed_machinery_t::*)(peer_id_t const&, uuid_u const&, log_server_business_card_t const&, bool, auto_drainer_t::lock_t)> (logs_artificial_table_backend_t::cfeed_machinery_t*, peer_id_t, uuid_u, log_server_business_card_t, bool, auto_drainer_t::lock_t)> >::run_action() at ??:?\n7: coro_t::run() at ??:?
2015-10-08T13:15:44.994200269 42377.511350s error: Exiting.

Hope that helps

@mbroadst mbroadst changed the title crash in 2.1.4 Guarantee failed: [ts->tv_nsec >= 0 && ts->tv_nsec < (1000LL * (1000LL * 1000LL))] Oct 8, 2015
@mbroadst mbroadst changed the title Guarantee failed: [ts->tv_nsec >= 0 && ts->tv_nsec < (1000LL * (1000LL * 1000LL))] Error: Guarantee failed: [ts->tv_nsec >= 0 && ts->tv_nsec < (1000LL * (1000LL * 1000LL))] Oct 8, 2015
@mbroadst
Contributor Author

mbroadst commented Oct 8, 2015

Other relevant information:

2015-10-08T01:29:27.763831874 0.280985s notice: Running rethinkdb 2.1.4~0trusty (GCC 4.8.2)...
2015-10-08T01:29:27.769366330 0.286519s notice: Running on Linux 3.16.0-50-generic x86_64

@danielmewes
Member

Thanks for reporting, @mbroadst. Does this go away when you restart the server, or is it a persistent issue?
If it's persistent, you can probably recover by deleting the log file from the rethinkdb data directory on the affected server.

Assigning @VeXocide. This appears to be originating from the logs table calling add_to_timespec with an invalid value.

The backtrace with newlines:

1: backtrace_t::backtrace_t() at ??:?
2: format_backtrace(bool) at ??:?
3: report_fatal_error(char const*, int, char const*, ...) at ??:?
4: add_to_timespec(timespec*, int) at ??:?
5: logs_artificial_table_backend_t::cfeed_machinery_t::run(peer_id_t const&, uuid_u const&, log_server_business_card_t const&, bool, auto_drainer_t::lock_t) at ??:?
6: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (logs_artificial_table_backend_t::cfeed_machinery_t::*)(peer_id_t const&, uuid_u const&, log_server_business_card_t const&, bool, auto_drainer_t::lock_t)> (logs_artificial_table_backend_t::cfeed_machinery_t*, peer_id_t, uuid_u, log_server_business_card_t, bool, auto_drainer_t::lock_t)> >::run_action() at ??:?
7: coro_t::run() at ??:?
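
For reference, the guarantee that fires asserts that the timespec is normalized, i.e. tv_nsec stays within [0, 1e9). Below is a minimal standalone C++ sketch of what a nanosecond add that preserves this invariant could look like; the function name and the assumption that the delta is a nanosecond count are illustrative, not the actual add_to_timespec from src/time.cc.

#include <cassert>
#include <cstdint>
#include <ctime>

// Illustrative sketch only, not RethinkDB's implementation: add a nanosecond
// delta to a timespec and carry any overflow into tv_sec, so that tv_nsec
// stays within [0, 1e9) -- the invariant the failed guarantee checks.
void add_nanos_normalized(timespec *ts, int64_t nanos) {
    const int64_t BILLION = 1000LL * 1000LL * 1000LL;
    int64_t total = ts->tv_nsec + nanos;
    ts->tv_sec += total / BILLION;
    ts->tv_nsec = total % BILLION;
    if (ts->tv_nsec < 0) {  // a negative delta can leave a negative remainder
        ts->tv_sec -= 1;
        ts->tv_nsec += BILLION;
    }
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < BILLION);
}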

@danielmewes danielmewes added this to the 2.1.x milestone Oct 8, 2015
@mbroadst
Contributor Author

mbroadst commented Oct 8, 2015

@danielmewes it was not persistent; simply restarting the service resumed normal operation.

@mbroadst
Contributor Author

@danielmewes this is now happening consistently. In most cases we can just reboot; however, we have also run into cases where this crash occurs before replicas are established, and then we run into all sorts of trouble.

The bug seems to be related to the infinite loop in logs_artificial_table_backend_t::cfeed_machinery_t::run continuously adding 1 to the min_time timespec. I'm willing to do the work to fix it; could someone from your team spare a second to discuss the issue on IRC first, to make sure I'm headed in the right direction?
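
To make the suspected failure mode concrete, here is a small standalone C++ illustration; the names bump_unchecked and bump_normalized are hypothetical and this is not the actual cfeed machinery code. Repeatedly bumping a timespec's tv_nsec by 1 without carrying into tv_sec eventually produces tv_nsec == 1,000,000,000, which is exactly what the guarantee rejects; carrying the overflow keeps the invariant intact.

#include <cstdio>
#include <ctime>

static const long BILLION = 1000L * 1000L * 1000L;

// Buggy pattern (hypothetical): bump by one nanosecond with no carry, so
// tv_nsec can reach 1e9 and violate the guarantee from src/time.cc line 71.
void bump_unchecked(timespec *ts) {
    ts->tv_nsec += 1;
}

// Carrying variant: overflow rolls into tv_sec, keeping tv_nsec < 1e9.
void bump_normalized(timespec *ts) {
    if (++ts->tv_nsec == BILLION) {
        ts->tv_nsec = 0;
        ts->tv_sec += 1;
    }
}

int main() {
    timespec min_time;
    min_time.tv_sec = 0;
    min_time.tv_nsec = BILLION - 1;  // one bump away from overflowing

    bump_unchecked(&min_time);
    std::printf("unchecked:  tv_nsec=%ld (invariant %s)\n", min_time.tv_nsec,
                min_time.tv_nsec < BILLION ? "holds" : "violated");

    min_time.tv_sec = 0;
    min_time.tv_nsec = BILLION - 1;
    bump_normalized(&min_time);
    std::printf("normalized: tv_sec=%ld tv_nsec=%ld\n",
                static_cast<long>(min_time.tv_sec), min_time.tv_nsec);
    return 0;
}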

@danielmewes
Member

danielmewes commented Oct 12, 2015 via email

@mbroadst
Contributor Author

@danielmewes well, it's not a persistent issue, so no, I haven't tried removing the log files. As it stands we can configure the unit file to respawn on crash; however, I've been able to reproduce the following scenario at least twice:

  • 3 nodes are joined together in a cluster with a primary replica/raft leader

  • data is inserted into a few tables on the first node

  • all tables are reconfigured to have 3 replicas as soon as the third node joins

  • the first node goes down because of this crash (it is presently the only one holding actual data)

  • nodes 2 and 3 seem to have received information about the tables but maybe not all the data, so they elect a new leader but no replicas are available for operations

  • node 1 comes back online after a restart and joins the cluster, but it is in some sort of completely inconsistent state, repeatedly timing out on a raft election and starting a new one (ad infinitum)

    TL;DR: once this crash happens under roughly these circumstances, the entire cluster must be brought down because nothing works

@VeXocide
Member

In CR 3285 by @danielmewes.

@VeXocide
Member

Merged into next and v2.1.x via commits fb9a2b9 and a007331, respectively.

@mbroadst
Contributor Author

@VeXocide just confirmed on our side that the crash is totally resolved, huzzah!

@danielmewes danielmewes modified the milestones: 2.1.x, 2.2 Nov 10, 2015