Actions
Bug #5488
closedlibrbd: deadlock in image refresh
Status:
Resolved
Priority:
High
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
hammer,firefly
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
(gdb) thr app all bt Thread 26 (Thread 0x7f3a1491f700 (LWP 29993)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a1668d59b in ceph::log::Log::entry (this=0x830030) at log/Log.cc:323 #2 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1491f700) at pthread_create.c:308 #3 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 25 (Thread 0x7f3a1411e700 (LWP 29994)): #0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102 #1 0x00007f3a1677c508 in CephContextServiceThread::entry (this=0x82eeb0) at common/ceph_context.cc:57 #2 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1411e700) at pthread_create.c:308 #3 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 24 (Thread 0x7f3a1391d700 (LWP 29995)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a1660ebdb in AdminSocket::entry (this=0x830580) at common/admin_socket.cc:230 #2 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1391d700) at pthread_create.c:308 #3 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 23 (Thread 0x7f3a1311c700 (LWP 29996)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a166bbbf8 in Wait (mutex=..., this=0x8370f0) at ./common/Cond.h:55 #2 DispatchQueue::entry (this=0x837088) at msg/DispatchQueue.cc:137 #3 0x00007f3a1672581d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1311c700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 22 (Thread 0x7f3a1291b700 (LWP 29997)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16720e09 in Wait (mutex=..., this=0x837480) at ./common/Cond.h:55 #2 SimpleMessenger::reaper_entry (this=0x836fa0) at msg/SimpleMessenger.cc:205 #3 0x00007f3a16725fad in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:422 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1291b700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 21 (Thread 0x7f3a1211a700 (LWP 29998)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f3a1662743b in WaitUntil (when=..., mutex=..., this=0x832998) at common/Cond.h:71 #2 SafeTimer::timer_thread (this=0x832988) at common/Timer.cc:114 #3 0x00007f3a16627ded in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a1211a700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 20 (Thread 0x7f3a11919700 (LWP 29999)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16628be0 in Wait (mutex=..., this=0x832ab0) at ./common/Cond.h:55 #2 Finisher::finisher_thread_entry (this=0x832a50) at common/Finisher.cc:80 #3 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a11919700) at pthread_create.c:308 #4 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 19 (Thread 0x7f3a11017700 (LWP 30002)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f3a1662743b in WaitUntil (when=..., mutex=..., this=0x832dd8) at common/Cond.h:71 #2 SafeTimer::timer_thread (this=0x832dc8) at common/Timer.cc:114 #3 0x00007f3a16627ded in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a11017700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 18 (Thread 0x7f3a10816700 (LWP 30003)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16628be0 in Wait (mutex=..., this=0x832f18) at ./common/Cond.h:55 #2 Finisher::finisher_thread_entry (this=0x832eb8) at common/Finisher.cc:80 #3 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a10816700) at pthread_create.c:308 #4 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 17 (Thread 0x7f3a039f9700 (LWP 32628)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16628be0 in Wait (mutex=..., this=0x84b6b0) at ./common/Cond.h:55 #2 Finisher::finisher_thread_entry (this=0x84b650) at common/Finisher.cc:80 #3 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a039f9700) at pthread_create.c:308 #4 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 16 (Thread 0x7f3a031f8700 (LWP 32629)): #0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:85 #1 0x00007f3a172b309b in get_read (this=0x87bb80) at ./common/RWLock.h:51 #2 librbd::LibrbdWriteback::write (this=0x84d6d0, oid=..., oloc=..., off=1503261, len=<optimized out>, snapc=..., bl=..., mtime=..., trunc_size=0, trunc_seq=0, oncommit=0x7f39f40008c0) at librbd/LibrbdWriteback.cc:143 #3 0x00007f3a172baa56 in ObjectCacher::bh_write (this=0x84b420, bh=0x863ce0) at osdc/ObjectCacher.cc:797 ---Type <return> to continue, or q <return> to quit--- #4 0x00007f3a172bbc80 in ObjectCacher::flusher_entry (this=0x84b420) at osdc/ObjectCacher.cc:1434 #5 0x00007f3a172cc0bd in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:353 #6 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a031f8700) at pthread_create.c:308 #7 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 15 (Thread 0x7f3a024f2700 (LWP 32630)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16628be0 in Wait (mutex=..., this=0x8a8760) at ./common/Cond.h:55 #2 Finisher::finisher_thread_entry (this=0x8a8700) at common/Finisher.cc:80 #3 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a024f2700) at pthread_create.c:308 #4 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 14 (Thread 0x7f3a01cf1700 (LWP 32631)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f3a172bb809 in WaitUntil (when=..., mutex=..., this=0x8a86a0) at ./common/Cond.h:71 #2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x8a86a0) at ./common/Cond.h:79 #3 ObjectCacher::flusher_entry (this=0x8a84d0) at osdc/ObjectCacher.cc:1439 #4 0x00007f3a172cc0bd in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:353 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a01cf1700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 13 (Thread 0x7f3a004ee700 (LWP 939)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc006b88) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc0069a0) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a004ee700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 12 (Thread 0x7f3a11118700 (LWP 940)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc0069a0, buf=0x7f3a11117dbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc0069a0) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a11118700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 11 (Thread 0x7f3a03efe700 (LWP 2933)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc043418) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc043230) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03efe700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 10 (Thread 0x7f3a03bfb700 (LWP 2934)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc043230, buf=0x7f3a03bfadbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc043230) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03bfb700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 9 (Thread 0x7f3a03dfd700 (LWP 3060)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc020b98) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc0209b0) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03dfd700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 8 (Thread 0x7f3a03afa700 (LWP 3061)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc01fad8) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc01f8f0) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03afa700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f3a027f5700 (LWP 3062)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc0209b0, buf=0x7f3a027f4dbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc0209b0) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a027f5700) at pthread_create.c:308 ---Type <return> to continue, or q <return> to quit--- #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 6 (Thread 0x7f3a176fe700 (LWP 3063)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc01f8f0, buf=0x7f3a176fddbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc01f8f0) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a176fe700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 5 (Thread 0x7f3a028f6700 (LWP 3064)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc0089b8) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc0087d0) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a028f6700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 4 (Thread 0x7f3a03fff700 (LWP 3065)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc0087d0, buf=0x7f3a03ffedbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc0087d0) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03fff700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 3 (Thread 0x7f3a029f7700 (LWP 3066)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f3a16710f8b in Wait (mutex=..., this=0x7f39fc002298) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7f39fc0020b0) at msg/Pipe.cc:1573 #3 0x00007f3a1671bccd in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a029f7700) at pthread_create.c:308 #5 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 2 (Thread 0x7f3a03cfc700 (LWP 3067)): #0 0x00007f3a15e4c313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f3a16704e32 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2107 #2 0x00007f3a16705150 in Pipe::tcp_read (this=0x7f39fc0020b0, buf=0x7f3a03cfbdbf "\377", len=1) at msg/Pipe.cc:2080 #3 0x00007f3a16718461 in Pipe::reader (this=0x7f39fc0020b0) at msg/Pipe.cc:1322 #4 0x00007f3a1671bced in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007f3a15b4ee9a in start_thread (arg=0x7f3a03cfc700) at pthread_create.c:308 #6 0x00007f3a15e57ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 1 (Thread 0x7f3a17706780 (LWP 29992)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007f3a15b51065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f3a15b50eba in __pthread_mutex_lock (mutex=0x87bb38) at pthread_mutex_lock.c:61 #3 0x00007f3a16607fb3 in Mutex::Lock (this=0x87bb28, no_lockdep=<optimized out>) at common/Mutex.cc:89 #4 0x00007f3a17290cdc in librbd::ImageCtx::clear_nonexistence_cache (this=0x87b970) at librbd/ImageCtx.cc:588 #5 0x00007f3a172a8440 in librbd::refresh_parent (ictx=0x87b970) at librbd/internal.cc:1614 #6 0x00007f3a1729b9c9 in librbd::ictx_refresh (ictx=0x87b970) at librbd/internal.cc:1760 #7 0x00007f3a1729c9f8 in librbd::ictx_check (ictx=0x87b970) at librbd/internal.cc:1590 #8 0x00007f3a172a4a74 in librbd::info (ictx=0x87b970, info=..., infosize=160) at librbd/internal.cc:1175 #9 0x000000000040412e in check_size () at test/librbd/fsx.c:464 #10 0x000000000040597d in test () at test/librbd/fsx.c:1097 #11 0x00000000004031cf in main (argc=<optimized out>, argv=<optimized out>) at test/librbd/fsx.c:1560
run was
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-07-01_01:00:30-rbd-master-testing-basic/51544$ cat orig.config.yaml kernel: kdb: true sha1: 9e41c3a278d0c9a62eb96eae6c758504ce168e76 machine_type: plana nuke-on-error: true overrides: admin_socket: branch: master ceph: conf: global: ms inject socket failures: 5000 mon: debug mon: 20 debug ms: 20 debug paxos: 20 osd: osd op thread timeout: 60 fs: btrfs log-whitelist: - slow request sha1: e235958fb13e654aa1ad5e1c5a6ccf76a95196ca install: ceph: sha1: e235958fb13e654aa1ad5e1c5a6ccf76a95196ca s3tests: branch: master workunit: sha1: e235958fb13e654aa1ad5e1c5a6ccf76a95196ca roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock.check: null - install: null - ceph: conf: client: rbd cache: true - rbd_fsx: clients: - client.0 ops: 5000
Files
Updated by Sage Weil almost 11 years ago
- Status changed from New to 12
looks like bh_write is holding cache_lock (== ObjectCacher::lock) and trying ot take snap_read, while ictx_refresh() is holding snap_read (for write) and trying to take cache_lock.
Updated by Sage Weil almost 11 years ago
- Project changed from Ceph to rbd
- Category deleted (
librbd)
Updated by Sage Weil almost 11 years ago
- Assignee changed from Sage Weil to Josh Durgin
Updated by Josh Durgin almost 11 years ago
Another potential deadlock popped up: md_lock vs librados' lock ordering. Still trying to find which callback takes the md_lock.
Updated by Ian Colle over 10 years ago
- Status changed from In Progress to Need More Info
- Priority changed from High to Normal
Updated by Josh Durgin about 9 years ago
- Status changed from Need More Info to 12
- Assignee deleted (
Josh Durgin) - Priority changed from Normal to High
Encountered this during a very long fsx run, deadlock after a flatten due to ictx_refresh()->refresh_parent()->clear_nonexistence_cache() holding snap_lock and trying to take cache_lock, vs writeback from the cache holding cache_lock and waiting on snap_lock.
Thread 23 (Thread 0x7fc21f7fe710 (LWP 25808)): #0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:85 #1 0x00007fc22f8a92d7 in RWLock::get_read (this=0x2adf380) at ./common/RWLock.h:70 #2 0x00007fc22f925563 in librbd::LibrbdWriteback::write (this=0x2ad4b80, oid=..., oloc=..., off=3072926, len=7266, snapc=..., bl=..., mtime=..., trunc_size=0, trunc_seq=0, oncommit=0x7fc1b800dab0) at librbd/LibrbdWriteback.cc:160 #3 0x00007fc22fdcc493 in ObjectCacher::bh_write (this=0x2ad7e70, bh=0x2aed950) at osdc/ObjectCacher.cc:847 #4 0x00007fc22fdd3298 in ObjectCacher::flusher_entry (this=0x2ad7e70) at osdc/ObjectCacher.cc:1535 ---Type <return> to continue, or q <return> to quit--- #5 0x00007fc22fdd9bae in ObjectCacher::FlusherThread::entry() () from .libs/librbd.so.1 #6 0x00007fc22f994f1a in Thread::entry_wrapper (this=0x2ad80a0) at common/Thread.cc:61 #7 0x00007fc22f994e8c in Thread::_entry_func (arg=0x2ad80a0) at common/Thread.cc:45 #8 0x00007fc22c5d08ba in start_thread (arg=<value optimized out>) at pthread_create.c:300 #9 0x00007fc22b98802d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #10 0x0000000000000000 in ?? () ... Thread 1 (Thread 0x7fc232ee3760 (LWP 3225)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007fc22c5d30e9 in _L_lock_953 () from /lib/libpthread.so.0 #2 0x00007fc22c5d2f0b in __pthread_mutex_lock (mutex=0x2adf338) at pthread_mutex_lock.c:61 #3 0x00007fc22fae759e in Mutex::Lock (this=0x2adf328, no_lockdep=false) at common/Mutex.cc:91 #4 0x00007fc22f8d4db7 in librbd::ImageCtx::clear_nonexistence_cache (this=0x2adf0d0) at librbd/ImageCtx.cc:681 #5 0x00007fc22f90aabc in librbd::refresh_parent (ictx=0x2adf0d0) at librbd/internal.cc:1851 #6 0x00007fc22f90c7e6 in librbd::ictx_refresh (ictx=0x2adf0d0) at librbd/internal.cc:2021 #7 0x00007fc22f90a782 in librbd::ictx_check (ictx=0x2adf0d0) at librbd/internal.cc:1827 #8 0x00007fc22f9067db in librbd::get_flags (ictx=0x2adf0d0, flags=0x7fff0c527170) at librbd/internal.cc:1482 #9 0x00007fc22f8a7348 in rbd_get_flags (image=0x2adf0d0, flags=0x7fff0c527170) at librbd/librbd.cc:1191 #10 0x000000000075a9ed in librbd_verify_object_map (ctx=0xbe0840) at test/librbd/fsx.c:379 #11 0x000000000075af33 in __librbd_flatten (ctx=0xbe0840) at test/librbd/fsx.c:559 #12 0x000000000075af4d in librbd_flatten (ctx=0xbe0840) at test/librbd/fsx.c:565 #13 0x000000000075d933 in do_flatten () at test/librbd/fsx.c:1622 #14 0x000000000075dfa8 in test () at test/librbd/fsx.c:1774 #15 0x000000000075edbc in main (argc=2, argv=0x7fff0c527d38) at test/librbd/fsx.c:2300
fsx command (which ran into the deadlock after op 57807) was:
ceph_test_librbd_fsx -d -W -R -p 100 -r 1 -w 1 -t 1 -h 1 -l 250000000 -S 773 -N 100000 rbd foo7
Updated by Jason Dillaman about 9 years ago
- Status changed from 12 to In Progress
Updated by Jason Dillaman about 9 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to hammer,firefly
Updated by Jason Dillaman about 9 years ago
- Backport changed from hammer,firefly to hammer,firefly,dumpling
Updated by Josh Durgin about 9 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from hammer,firefly,dumpling to hammer,firefly
Updated by Loïc Dachary about 9 years ago
- invalid and closed firefly backport https://github.com/ceph/ceph/pull/4176
Updated by Jason Dillaman about 9 years ago
Firefly backport: https://github.com/ceph/ceph/pull/4206
Updated by Jean-Charles Lopez about 9 years ago
- File PR4013-CherryPick.txt PR4013-CherryPick.txt added
Updated by Loïc Dachary about 9 years ago
- Severity changed from 3 - minor to 2 - major
Updated by Loïc Dachary about 9 years ago
- hammer backport https://github.com/ceph/ceph/pull/4496
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved
- Regression set to No
Actions