InnoDB: Error: semaphore wait has lasted > 600 seconds. InnoDB: We intentionally crash the server, because it appears to be hung. 2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread <nr> in file srv0srv.cc line 2124 (II)

Bug #1236696 reported by Roel Van de Paar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Laurynas Biveinis
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Fix Released
High
Laurynas Biveinis

Bug Description

Thread 1 (LWP 15538):
+bt
#0 0x000000363720c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000687d8e in handle_fatal_signal (sig=6) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/sql/signal_handler.cc:254
#2 <signal handler called>
#3 0x0000003636e328a5 in raise () from /lib64/libc.so.6
#4 0x0000003636e34085 in abort () from /lib64/libc.so.6
#5 0x0000000000b58c9d in srv_error_monitor_thread (arg=<optimized out>) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/srv/srv0srv.cc:2124
#6 0x0000003637207851 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003636ee890d in clone () from /lib64/libc.so.6

===========

Version: '5.6.13-rc60.6-debug-log' socket: '/sde//024793/current1_4/tmp/master.sock' port: 13060 Percona Server with XtraDB (GPL), Release rc60.6, Revision 462-debug
2013-10-07 16:11:56 15516 [Warning] InnoDB: changed page bitmap file './ib_modified_log_1_0.xdb' does not contain a complete run at the end.
2013-10-07 16:13:06 15516 [Warning] InnoDB: changed page bitmap file './ib_modified_log_1_0.xdb' does not contain a complete run at the end.
InnoDB: Warning: a long semaphore wait:
--Thread 140479423301376 has waited at srv0srv.cc line 2589 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x1562560 '&dict_operation_lock'
a writer (thread id 140479530440448) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 720
Last time write locked in file /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/handler/ha_innodb.cc line 10034
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2013-10-07 16:17:43 7fc3eabfd700 INNODB MONITOR OUTPUT
=====================================
[...]
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread 140479454770944 in file srv0srv.cc line 2124

Tags: qa ahi

Related branches

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

<laurynas> unless further debugging changes something, it happens if AHI is enabled and some of thread priorities (which are experimental) are set

Revision history for this message
Roel Van de Paar (roel11) wrote :
tags: added: ahi
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
Download full text (9.1 KiB)

The interesting thread is 37:

Thread 37 (LWP 15806):
+bt
#0 0x000000363720b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000ac4143 in os_cond_wait (fast_mutex=0x27e6c30, cond=0x27e6c70) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/os/os0sync.cc:214
#2 os_event_wait_low (event=0x27e6c30, reset_sig_count=264) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/os/os0sync.cc:610
#3 0x0000000000b63601 in sync_array_wait_event (arr=0x20e3460, index=0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/sync/sync0arr.cc:433
#4 0x0000000000b653a6 in rw_lock_s_lock_spin (_lock=0x27e6a38, pass=0, priority_lock=true, high_priority=false, file_name=0xe1eb90 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc", line=954) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/sync/sync0rw.cc:537
#5 0x0000000000bc60ae in pfs_rw_lock_s_lock_func (line=954, file_name=0xe1eb90 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc", lock=0x27e6a38, pass=<optimized out>) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/include/sync0rw.ic:1090
#6 btr_search_guess_on_hash (index=0x280c2b8, info=0x280c670, tuple=0x7fc2e40595d8, mode=4, latch_mode=2, cursor=0x7fc3f08230b0, has_search_latch=0, mtr=0x7fc3f08228c0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc:954
#7 0x0000000000bb7680 in btr_cur_search_to_nth_level (index=0x280c2b8, level=0, tuple=0x7fc2e40595d8, mode=4, latch_mode=2, cursor=0x7fc3f08230b0, has_search_latch=0, file=0xe04590 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc", line=2370, mtr=0x7fc3f08228c0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0cur.cc:539
#8 0x0000000000b09b8e in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x280c2b8, n_uniq=2, entry=0x7fc2e40595d8, n_ext=0, thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:2370
#9 0x0000000000b0e825 in row_ins_clust_index_entry (index=0x280c2b8, entry=0x7fc2e40595d8, thr=0x7fc2e4050370, n_ext=0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:2909
#10 0x0000000000b0ee07 in row_ins_index_entry (thr=0x7fc2e4050370, entry=0x7fc2e40595d8, index=0x280c2b8) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3003
#11 row_ins_index_entry_step (thr=0x7fc2e4050370, node=0x7fc2e40d3d20) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3080
#12 row_ins (thr=0x7fc2e4050370, node=0x7fc2e40d3d20) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3220
#13 row_ins_step (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3345
#14 0x0000000000aec6c8 in que_thr_step (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/que/que0que.cc:1095
#15 que_run_threads_low (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/que/que0que.cc:1177
#16 0x0000000000aece50 in que_run_threads (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Serve...

Read more...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Lock word and high_priority_wait_ex_waiter values are incompatible. This might have have happened if rw_lock_x_lock_wait() succeeds in its last locking attempt, after having the set flag already. As there can be only wait wait_ex waiter, it is safe to reset the flag if locking succeeded there.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

    Race conditions between successful locks after having set the
    waiters flag and unlocks exist. If a lock (both priority mutex or
    priority RW lock not as a next-waiter) is successfully locked this
    way, the corresponding waiter flag is set even though other
    waiters on the corresponding event do not necessarily exist.
    Then, on unlock, this higher-priority event with no waiters might
    be signaled while a lower-priority event with waiters will not
    be signaled.

    This is fixable by adjusting priority mutex high_priority_waiters
    and rw lock high_priority_x_waiters and high_priority_s_waiters
    flags to be atomically incremented instead of setting to 1 and
    atomically decremented instead of doing nothing if locking
    succeeds after setting the flag. high_priority_wait_ex_waiter
    flag can be maintained as 0/1 as described in the previous
    comment.

    The already-existing mutex::waiters and lock::waiters flag need
    not to be adjusted this way as there are no lower-priority events
    below them. Although such adjustment would result in a bit less
    spurious wakeup events under load and might be done as needed in
    the future.

summary: InnoDB: Error: semaphore wait has lasted > 600 seconds. InnoDB: We
intentionally crash the server, because it appears to be hung.
2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread
- <nr> in file srv0srv.cc line 2124
+ <nr> in file srv0srv.cc line 2124 (II)
Revision history for this message
Roel Van de Paar (roel11) wrote :

Re-testing with patch in place (on revid 5.6.13-rc60.6-462-debug) shows bug 1239062 (III)

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-735

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.