osd: shut down if we flap too many times in a short period #6708
Reopening #6615, which was closed automatically when its branch was deleted.
A few cosmetic issues; otherwise this looks good! Thanks-
d223c45 to 9a14ff7
@liewegas, done with the cosmetic issues :)
OSD status may flap due to hardware or network issues. Although we try our best to self-heal, in some cases the OSD keeps flapping and requires an admin to intervene. This patch tries another approach: shut down the OSD after it has been marked down a certain number of times (flapping), thus speeding up the convergence of the cluster.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
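The idea in the commit message can be sketched as a sliding-window counter. This is only an illustration in Python, not the patch itself: the class name `MarkdownTracker` and the parameters `max_count` and `period` are hypothetical, and the choice to give up when the count goes strictly above the limit is an assumption taken from the test request in this thread (5 restarts, then stay down on the 6th mark-down).

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical sketch: decide whether a daemon should restart after a mark-down."""

    def __init__(self, max_count=5, period=600.0):
        self.max_count = max_count  # assumed limit on mark-downs per window
        self.period = period        # assumed window length, in seconds
        self._log = deque()         # timestamps of recent mark-downs, oldest first

    def record_markdown(self, now):
        """Record a mark-down at time `now`.

        Returns True if the daemon should restart, False if it should shut down.
        """
        self._log.append(now)
        # Drop entries that have aged out of the trailing window.
        while self._log and self._log[0] + self.period < now:
            self._log.popleft()
        # Give up only once the window holds more than max_count mark-downs.
        return len(self._log) <= self.max_count


tracker = MarkdownTracker(max_count=5, period=600.0)
decisions = [tracker.record_markdown(float(t)) for t in range(6)]
print(decisions)  # [True, True, True, True, True, False]
```

Because old timestamps are pruned on every call, a daemon that is marked down only occasionally never reaches the limit; only a burst inside one window trips the shutdown.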
Sorry, one last request: can you construct a simple test for this? One that runs during 'make check' is probably simplest (see test//.sh). I would start an OSD, mark it down explicitly 6 times (ceph osd down N), and verify that it comes back up 5 times and then doesn't on the 6th time. Thanks!
@liewegas, sorry for the late reply, test added :)
TEST_markdown_exceed_maxdown_count: marked down N+1 times within the period; should be UP after each of the first N times but DOWN after the last.
TEST_markdown_boot: marked down exactly N times within the period; should be UP after all N times.
TEST_markdown_boot_exceed_time: marked down N+1 times, but spread over more than the period; should be UP after the test.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
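As a rough model of what these three tests check, the expected outcomes can be reproduced with a small pure function. This is a sketch under stated assumptions, not the shell tests from the commit: `final_state`, `N`, `PERIOD`, and the timestamps are all made up for illustration, and the daemon is assumed to stay down once more than N mark-downs fall inside one trailing window.

```python
def final_state(markdown_times, max_count, period):
    """Return "UP" if the daemon restarts every time, "DOWN" once it gives up."""
    recent = []
    for now in markdown_times:
        recent.append(now)
        # Keep only the mark-downs inside the trailing window ending at `now`.
        recent = [t for t in recent if t + period >= now]
        if len(recent) > max_count:
            return "DOWN"
    return "UP"


N, PERIOD = 3, 40  # illustrative values, not taken from the tests

scenarios = {
    # N+1 mark-downs inside one period: the last one is one too many.
    "exceed_maxdown_count": final_state([0, 1, 2, 3], N, PERIOD),
    # Exactly N mark-downs inside one period: still allowed to restart.
    "boot": final_state([0, 1, 2], N, PERIOD),
    # N+1 mark-downs spread over more than one period: the oldest has aged out.
    "boot_exceed_time": final_state([0, 20, 40, 60], N, PERIOD),
}
print(scenarios)
```

The third scenario is the one that distinguishes a windowed count from a simple lifetime counter: the total number of mark-downs exceeds N, but never within a single period.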
osd: shut down if we flap too many times in a short period Reviewed-by: Sage Weil <sage@redhat.com>
OSD status may flap due to hardware or network issues.
Although we try our best to self-heal, in some cases
the OSD keeps flapping and requires an admin to intervene.
This patch tries another approach: shut down the OSD after a certain
number of reboots (flapping), thus speeding up the convergence of the cluster.
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>