Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: fix temp object removal after upgrade #6976

Merged
merged 3 commits into from Jan 21, 2016
Merged

Conversation

dzafman
Copy link
Contributor

@dzafman dzafman commented Dec 18, 2015

Various scrub fixes:
Fix 13862 by removing old Hammer temp objects.
Clean up a log message.
Skip temporary objects on both primary and replicas.

@dzafman dzafman added this to the jewel milestone Dec 18, 2015
@dzafman
Copy link
Contributor Author

dzafman commented Dec 18, 2015

This was manually test. I'm not sure how to add a Hammer upgrade test which would leave stray temp objects in order to verify that they are cleaned up.

@liewegas
Copy link
Member

I think this is being caught by the existing tests, see http://tracker.ceph.com/issues/13381

build_scrub_map_chunk(
map, start, end, msg->deep, msg->seed,
map, msg->start, msg->end, msg->deep, msg->seed,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we can remove this. The problem was with a primary OSD running hammer and a replica running new code. In that case, we need to specify bounds that fall within the PG (hammer would fill in some fields but not pool, which would either assert or skip items, i forget which).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my change the start already has the pool number in it being passed from caller. The only affect my code has is that as you noted above I didn't set "scrubber.end." I couldn't see a reason for it, so I left it out. As a matter of fact I always saw end shown as MAX in the log even when I set the pool there too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thinking that this code runs on the replicas, so now I see what you mean about interacting with older releases. I'll put this back.

@dzafman
Copy link
Contributor Author

dzafman commented Dec 23, 2015

I'll just check "q->hobj.get_pool() == -1" as the entire check and not use is_meta() because we don't want to break if is_meta() changes in the future. We are checking for an old release remnant at that point.

I was being paranoid not to confuse the is_pgmeta() object (possibly an older one) from the pool == -1 temporary objects from prior releases. So the name.empty() check is probably not be needed because if pool == -1 it will never also be an is_pgmeta() object.

  bool is_pgmeta() const {
    // make sure we are distinct from hobject_t(), which has pool INT64_MIN
    return hobj.pool >= 0 && hobj.oid.name.empty();
  }

hobject_t::is_meta() looks like this:

 bool is_meta() const {
    return pool == POOL_META;
  }

Fixes: ceph#13862

Signed-off-by: David Zafman <dzafman@redhat.com>
Remove redundant dout()

Signed-off-by: David Zafman <dzafman@redhat.com>
@liewegas
Copy link
Member

liewegas commented Jan 4, 2016

Is that last patch needed, now that the temp objects get cleaned up on startup?

@dzafman
Copy link
Contributor Author

dzafman commented Jan 4, 2016

@liewegas I think the reason for the last commit has changed. If we set the pool, then during scrubbing the FileStore::collection_list() as an example, doesn't waste cycles reading the temp collection only to have the list pruned by PGBackend::objects_list_range(() called by PG::build_scrub_map_chunk().

FileStore::collection_list():

  if (!c.is_temp() && !c.is_meta()) {
    if (cmp_bitwise(start, sep) < 0) { // bitwise vs nibble doesn't matter here
      dout(10) << __func__ << " first checking temp pool" << dendl;
      coll_t temp = c.get_temp();
      int r = collection_list(temp, start, end, sort_bitwise, max, ls, next);

@dzafman dzafman force-pushed the wip-13862 branch 3 times, most recently from 573ad2e to 1727b40 Compare January 4, 2016 23:47
@dzafman
Copy link
Contributor Author

dzafman commented Jan 4, 2016

@liewegas I changed the commit message on the last commit and removed the revert of PG::replica_scrub() change which set pool for scrubbing with hammer OSDs.

Signed-off-by: David Zafman <dzafman@redhat.com>
@liewegas
Copy link
Member

liewegas commented Jan 5, 2016

Ok, I think this'll work! And it fixes your original issue (the second commit does i guess), right?

@dzafman
Copy link
Contributor Author

dzafman commented Jan 5, 2016

@liewegas Yes, the second commit causes the old temp objects to be removed.

liewegas added a commit that referenced this pull request Jan 21, 2016
@liewegas liewegas merged commit 225b5fe into ceph:jewel Jan 21, 2016
@ghost ghost changed the title Wip 13862 osd: fix temp object removal after upgrade Feb 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants