Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: allow peek_map_epoch to return an error #5892

Merged
merged 2 commits into from Sep 12, 2015
Merged

Conversation

liewegas
Copy link
Member

@@ -2864,13 +2866,19 @@ epoch_t PG::peek_map_epoch(ObjectStore *store,
values.clear();
keys.insert(ek);
store->omap_get_values(META_COLL, legacy_infos_oid, keys, &values);
assert(values.size() == 1);
if (values.size() < 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to ensure this is not an accident data file loss?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to ensure this is not an accident data file loss?

I'm not sure how to tell.

But, let's say the infos object is missing but shouldn't be.. in that
case, all PGs will be ignored and the osd will come up (vs previously we
would crash). There is a corner case where this is bad: the osd in
question had the only copy of the latest write, peering probed it, and got
a does not exist response and silently reverts to an older copy.
Something similar happens from the pg import/export tests currently (see
#12687). I don't think there is a way to tell, though...

@liewegas
Copy link
Member Author

ceph/ceph-qa-suite#563 verifies the hammer part of the fix. waiting on gitbuilder to verify the infernalis fix too.

@liewegas
Copy link
Member Author

@dzafman Do you mind reviewing this one?

bufferlist *bl)
int PG::peek_map_epoch(ObjectStore *store,
spg_t pgid,
epoch_t *pepoch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm use to the p at then end of the name instead of the front.

@dzafman
Copy link
Contributor

dzafman commented Sep 11, 2015

Other than some minor nits we can fix later, 👍

Allow PG::peek_map_epoch to return an error indicating the PG
should be skipped.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f15d958)

[fixed *pepoch default of 0]
[fixed other return paths in peek_map_epoch]
- pg is removed
- osd is stopped before pg is fully removed
- on restart, we ignore/skip the pg because its epoch is too old
- we upgrade to hammer and convert other pgs, skipping this one, and then
  remove the legacy infos object
- hammer starts, tries to parse the legacy pg, and fails because the infos
  object is gone, crashing.

The fix is to continue ignoring the zombie pg.

Fixes: #16030
Signed-off-by: Sage Weil <sage@redhat.com>
liewegas added a commit that referenced this pull request Sep 12, 2015
osd: allow peek_map_epoch to return an error

Reviewed-by: David Zafman <dzafman@redhat.com>
@liewegas liewegas merged commit f35c53d into hammer Sep 12, 2015
@liewegas liewegas deleted the wip-13060-hammer branch September 12, 2015 13:23
@ghost ghost changed the title DNM: osd: allow peek_map_epoch to return an error osd: allow peek_map_epoch to return an error Sep 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants