New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librbd: Possible crash while concurrently writing and shrinking an image #5318
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ghost
changed the title
Possible crash while concurrently writing and shrinking an image
DNM: Possible crash while concurrently writing and shrinking an image
Jul 28, 2015
ghost
self-assigned this
Jul 28, 2015
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit e5ffae5)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 21f990e)
Conditionally support helgrind annotations if valgrind support is enabled during the build. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 650ad32)
librbd use of an image hierarchy resulted in lock names being re-used and incorrectly analyzed. librbd now uses unique lock names per instance, but to prevent an unbounded growth of tracked locks, we now remove lock tracking once a lock is destructed. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 7c7df2c)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 6e400b9)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit b65ae4b)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit c1e1445)
This is needed to allow an atomic compare and update operation from the rebuild object map utility. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 2db758c)
The ObjectCacher complete the read callback while still holding the cache lock. This introduces lock ordering issues which are resolved by queuing the completion to execute in a clean (unlocked) context. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 0024677)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit c474ee4)
The callback routine most likely will attempt to retrieve the result code, which will result in a recursive lock attempt. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 3ad19ae)
It is only used by clients and it causes a large slowdown in performance due to the rate at which the lock is constructed/ destructed for each IO request. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 66e7464)
In order to support the invariant that all state machine callbacks occur without holding locks, transitions that don't always involve a librados call should queue their callback. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 218bc2d)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 5f157f2)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit c352bcd)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Fix for: CID 1274319: Uninitialized scalar field (UNINIT_CTOR) uninit_member: Non-static class member m_object_state is not initialized in this constructor nor in any functions that it calls. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de> (cherry picked from commit 48f18ea)
Moved all parent overlap computation to within AioRequest so that callers don't need to independently compute the overlap. Also removed the need to pass the snap_id for write operations since it can only be CEPH_NOSNAP. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 4651597)
Move AbstractWrite's invocation of copyup to the CopyupRequest class. The AioRequest write path will now always create a CopyupRequest, which will now append the actual write ops to the copyup. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 7be3df6)
Detected during an fsx run where a refresh and CoR were occurring concurrently. The refresh held the snap_lock and was waiting on the object_map_lock, while the CoR held object_map_lock and was waiting for snap_lock. Fixes: #11577 Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 8cbd92b)
It is expected that all IO is flushed and all async ops are cancelled prior to releasing the exclusive lock. Therefore, replace handling of lost exclusive locks in state machines with an assertion. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit d6b733d)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 1b57cc1)
librbd requires the ObjectCacher flusher thread to acquire an additional lock in order to maintain lock ordering constraints. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit a38f9e5)
Flush might result in the cache writing out dirty objects, which would require that the owner_lock be held. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit c9142fe)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 45cb9cb)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 742a85d)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 3d5cef3)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 43e0e3c)
Cache writeback operations will expect the owner lock to be held. Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit adfa2e0)
It is expensive to collect backtraces every time a lock is checked in order to provide cycle backtraces. The backtraces can be forced on for specific locks or globally via the new config option "lockdep_force_backtrace". Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 7354d25)
ghost
changed the title
DNM: Possible crash while concurrently writing and shrinking an image
Possible crash while concurrently writing and shrinking an image
Jul 29, 2015
ghost
changed the title
Possible crash while concurrently writing and shrinking an image
DNM: Possible crash while concurrently writing and shrinking an image
Jul 29, 2015
dillaman
force-pushed
the
wip-12236-hammer
branch
from
July 29, 2015 14:41
45c513a
to
e1e8b20
Compare
dillaman
changed the title
DNM: Possible crash while concurrently writing and shrinking an image
Possible crash while concurrently writing and shrinking an image
Jul 29, 2015
When shrinking an image, it's possible that the op flush callback will be from within the cache callback context. This would result in a deadlock when attempting to re-lock the cache lock in order to invalidate the cache. Fixes: #11743 Backport: hammer Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 726d699)
Signed-off-by: Jason Dillaman <dillaman@redhat.com> (cherry picked from commit 8dfcc72)
dillaman
force-pushed
the
wip-12236-hammer
branch
from
July 29, 2015 17:42
e1e8b20
to
b575ecc
Compare
ghost
pushed a commit
that referenced
this pull request
Jul 29, 2015
…nd shrinking an image Reviewed-by:
ghost
mentioned this pull request
Aug 30, 2015
Passes the rbd suite http://tracker.ceph.com/issues/12701#rbd |
ghost
pushed a commit
that referenced
this pull request
Aug 30, 2015
Possible crash while concurrently writing and shrinking an image Reviewed-by: Loic Dachary <ldachary@redhat.com>
ghost
changed the title
Possible crash while concurrently writing and shrinking an image
librbd: Possible crash while concurrently writing and shrinking an image
Sep 13, 2015
This pull request was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
http://tracker.ceph.com/issues/12236
Note: contains commits included in PR #5296 that can be rebased away once they merge.