osd: shut down if we flap too many times in a short period #6708

Merged
merged 2 commits into from Dec 19, 2015

Conversation

xiaoxichen
Contributor

OSD status may flap due to hardware or network issues.
Although we do our best to self-heal, in some cases
the OSD keeps flapping and requires an admin to intervene.

This patch tries another approach: shut down the OSD after a certain
number of reboots (flaps), which speeds up convergence of the cluster.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
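
One way to picture the idea in this description is a sliding-window counter: remember when the OSD was recently marked down, forget entries older than the period, and refuse to restart once too many remain. The following is a minimal Python sketch of that idea, not the C++ change in osd/OSD.cc. The class name `MarkdownTracker`, the parameters `max_count` and `period`, and the 600-second window are assumptions made for illustration.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical model of a flap limiter; not Ceph's actual code."""

    def __init__(self, max_count, period):
        self.max_count = max_count  # markdowns tolerated within one period
        self.period = period        # window length in seconds
        self.log = deque()          # timestamps of recent markdowns

    def should_restart(self, now):
        """Record a markdown at time `now`; return False once the limit is exceeded."""
        self.log.append(now)
        # Forget markdowns that have fallen out of the window.
        while self.log and self.log[0] + self.period < now:
            self.log.popleft()
        return len(self.log) <= self.max_count


# Six markdowns one second apart, with an assumed limit of 5 per 600 s.
tracker = MarkdownTracker(max_count=5, period=600)
print([tracker.should_restart(now) for now in range(6)])
# prints [True, True, True, True, True, False]
```

Keeping timestamps rather than a bare counter means that an OSD which flaps occasionally over a long time is never penalized; only a burst inside one window trips the limit.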

@xiaoxichen
Contributor Author

Reopening #6615, which was closed automatically when the branch was deleted.

@liewegas
Member

A few cosmetic issues; otherwise this looks good! Thanks-

@xiaoxichen
Contributor Author

@liewegas, done with the cosmetic issues :)

OSD status may flap due to hardware or network issues.
Although we do our best to self-heal, in some cases
the OSD keeps flapping and requires an admin to intervene.

This patch tries another approach: shut down the OSD after it has been
marked down a certain number of times (flapping), which speeds up
convergence of the cluster.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
@liewegas
Member

Sorry, one last request: can you construct a simple test for this? One that runs during 'make check' is probably simplest (see test//.sh). I would start an osd, mark it down explicitly 6 times (ceph osd down N), and verify that it comes back up 5 times and then doesn't on the 6th time. Thanks!

@xiaoxichen
Contributor Author

@liewegas, sorry for the late reply, test added :)

TEST_markdown_exceed_maxdown_count:
  mark down N+1 times within the period; the OSD should be UP after each of the first N times but DOWN after the last.
TEST_markdown_boot:
  mark down exactly N times within the period; the OSD should be UP after all N times.
TEST_markdown_boot_exceed_time:
  mark down N+1 times over a span longer than the period; the OSD should be UP after the test.

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
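
The three cases above can be checked against a small model before touching a real cluster. This is a rough Python sketch of the expected outcomes only; it does not reproduce the actual shell tests, and the helper `survives_markdowns`, the limit `N = 5`, the 600-second period, and the 150-second spacing are all assumptions chosen for illustration.

```python
def survives_markdowns(times, max_count, period):
    """Return, per markdown, whether the modelled OSD comes back up afterwards."""
    recent = []
    outcomes = []
    for now in times:
        # Keep only markdowns still inside the window, then add this one.
        recent = [t for t in recent if t + period >= now] + [now]
        up = len(recent) <= max_count
        outcomes.append(up)
        if not up:
            break  # the modelled daemon has shut down for good
    return outcomes


N, PERIOD = 5, 600  # assumed values, not read from any Ceph config

# N+1 markdowns inside one period: up N times, then down.
exceed = survives_markdowns(range(N + 1), N, PERIOD)
# Exactly N markdowns inside one period: up every time.
exact = survives_markdowns(range(N), N, PERIOD)
# N+1 markdowns spread over more than one period: still up at the end.
spread = survives_markdowns([i * 150 for i in range(N + 1)], N, PERIOD)

print(exceed[-1], all(exact), all(spread))
# prints False True True
```

In the third case the first markdown has aged out of the window by the time the sixth arrives, so the count never exceeds N.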
@xiaoxichen xiaoxichen assigned liewegas and unassigned xiaoxichen Dec 15, 2015
liewegas added a commit that referenced this pull request Dec 19, 2015
osd: shut down if we flap too many times in a short period

Reviewed-by: Sage Weil <sage@redhat.com>
@liewegas liewegas merged commit a96ee93 into ceph:master Dec 19, 2015
@ghost ghost changed the title osd/OSD.cc: shutdown after flipping certain times osd: shut down if we flap too many times in a short period Feb 10, 2016