mon: mon crashes when "ceph osd tree 85 --format json" #4936
I think the make-check bot failure above is spurious. Can you please re-push so it will trigger a new build attempt?
* as a side effect, this change silences http://tracker.ceph.com/issues/11576

Fixes: #11576
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e7b196a)
* check for dangling bucket names or type names referenced by the buckets/items in the crush map.
* also check the references from Item(0, 0, 0), which does not necessarily exist in the crush map under test. The rationale: "ceph osd tree" also prints stray OSDs whose id is greater than or equal to 0, so it is useful to check that the crush map offers the type name indexed by "0". (The name of an OSD is always "osd.{id}", so we don't need to look up the name of an OSD item in the crush map.)

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit b75384d)
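The check described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the Ceph implementation: the `Bucket` class, the `find_dangling_refs` function, and the plain-dict name/type maps are hypothetical stand-ins. It assumes the usual CRUSH convention that bucket ids are negative and OSD (device) ids are non-negative.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical stand-in for a crush bucket: id, type id, child item ids."""
    id: int
    type_id: int
    items: list = field(default_factory=list)


def find_dangling_refs(buckets, name_map, type_map):
    """Return descriptions of ids that lack a name or a type name."""
    problems = []
    # Type 0 is checked unconditionally: stray OSDs are printed with it
    # even when no OSD item exists in the crush map under test.
    if 0 not in type_map:
        problems.append("type 0 has no name")
    referenced = set()
    for b in buckets:
        referenced.add(b.id)
        # Non-negative items are OSDs, named "osd.{id}" without a lookup,
        # so only negative (bucket) items need a name entry.
        referenced.update(i for i in b.items if i < 0)
        if b.type_id not in type_map:
            problems.append(f"bucket {b.id}: unknown type {b.type_id}")
    for bucket_id in sorted(referenced, reverse=True):
        if bucket_id not in name_map:
            problems.append(f"bucket {bucket_id}: no name")
    return problems
```

An empty result means every bucket and type referenced by the map resolves to a name, which is the property "ceph osd tree" relies on when printing.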
* so one is able to verify that "ceph osd tree" won't choke on the new crush map because of dangling name/type references Signed-off-by: Kefu Chai <kchai@redhat.com> (cherry picked from commit d6b46d4)
* the "osd tree dump" command enumerates all buckets/osds found in either the crush map or the osd map, but a newly set crush map was not validated for dangling references. So when a new crush map is sent to the monitor, check whether any item in it references an unknown type/name, and reject the map if so.

Fixes: #11680
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit a955f36)
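The reject-on-validation-failure behaviour described here might look like the sketch below. Everything in it is hypothetical: the dict-based crush map, the `has_dangling_refs` and `handle_setcrushmap` names, and the choice of `-EINVAL` as the error code are assumptions for illustration, not the monitor's actual code.

```python
import errno


def has_dangling_refs(crush):
    """Hypothetical validator: True if any bucket lacks a name or type name."""
    names, types = crush["names"], crush["types"]
    return any(
        b["id"] not in names or b["type"] not in types
        for b in crush["buckets"]
    )


def handle_setcrushmap(current, candidate):
    """Return (active_map, return_code); keep the current map on rejection."""
    if has_dangling_refs(candidate):
        # The specific error code is an assumption made for this sketch.
        return current, -errno.EINVAL
    return candidate, 0
```

The point of validating before accepting is that a bad map never becomes the active one, so later commands that walk the map do not hit unresolved names.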
Signed-off-by: Kefu Chai <kchai@redhat.com> (cherry picked from commit e640d89)
d5d8b5b to 5141301
@ktdreyer @theanalyst I removed the commit included in #5122 from this PR and re-pushed.
Since this PR was tested per http://tracker.ceph.com/issues/11990 before the commit from #5122 was removed, it's good to merge along with #5122.
mon crashes when "ceph osd tree 85 --format json" Reviewed-by: Kefu Chai <kchai@redhat.com>
It looks like the bot failure is an actual problem. See also http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=552772025cb8d5f51ffb3a069d1bd93bc73f1123. I seem to remember that a pull request fixed racy code in ceph-helpers or something similar. I'll dig into this.
* Back in Hammer, the osd-crush.sh individual tests did not run the monitor; that was taken care of by the run() function. An attempt to run another mon fails with:

  error: IO lock testdir/osd-crush/a/store.db/LOCK: Resource temporarily unavailable

  This problem was introduced by cc1cc03 from ceph#4936
* replace test/mon/mon-test-helpers.sh with test/ceph-helpers.sh as we need run_osd() in this newly added test

http://tracker.ceph.com/issues/11975 Refs: ceph#11975
Signed-off-by: Loic Dachary <ldachary@redhat.com>
* Back in Hammer, the osd-crush.sh individual tests did not run the monitor; that was taken care of by the run() function. An attempt to run another mon fails with:

  error: IO lock testdir/osd-crush/a/store.db/LOCK: Resource temporarily unavailable

  This problem was introduced by cc1cc03 from ceph#4936
* replace test/mon/mon-test-helpers.sh with test/ceph-helpers.sh as we need run_osd() in this newly added test
* update the run-dir of commands: ceph-helpers.sh uses a different convention for the run-dir of daemons.

http://tracker.ceph.com/issues/11975 Refs: ceph#11975
Signed-off-by: Loic Dachary <ldachary@redhat.com>
http://tracker.ceph.com/issues/11975