Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

stage0: brute-force GC on GC failure #1828

Merged
merged 3 commits into from Dec 8, 2015
Merged

Conversation

blixtra
Copy link
Collaborator

@blixtra blixtra commented Dec 2, 2015

When a pod directory is in an unexpected state, it may happen that the
standard garbage collection fails. When this happens mounts can be left
around, requiring manual removal.

This change forces a GC of the pod mount points when the pod GC
otherwise fails. To do this, the /proc/self/mountinfo file is parsed,
the pod mount points are extracted, ordered, then unmounted.

fixes #1572

if err != nil {
stderr("Error retrieving app hashes from pod manifest: %v", err)
return
if err := cleanExitedGarbage(p); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this seems right. This goes straight to the force unmounting irrespective of the failure condition, e.g. if https://github.com/coreos/rkt/pull/1828/files#diff-204288dd8b5d5205881366c76eb4f97eR184 fails initialising the store. Might that not cause pods to get in a weird half-GCed state? Should we be more nuanced at distinguishing actual failure conditions here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the example you've given and have moved that out of the checkExitGarbage function. However, I don't think there is a need to get more fancy with the rest of the failures. If the next one we get to fails (treestoreid), or any thereafter, this means the pod directory is in a weird state. IMHO we should just bail and force clean at that point.


for _, mnt := range mnts {
if err := syscall.Unmount(mnt.mountPoint, 0); err != nil {
return fmt.Errorf("couldn't unmounting at %v: %v", mnt.mountPoint, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't unmount

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop the contraction in favor of could not, but won't insist on it.

@iaguis iaguis added this to the v0.14.0 milestone Dec 4, 2015
@@ -8,3 +8,4 @@
/makelib/variables.mk
/build-rkt*
.vagrant
tags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this file is only in the toplevel directory of the repository, then prepend it with /. If not, then ignore the comment.

@@ -179,6 +179,49 @@ func emptyGarbage() error {
return nil
}

// cleanExitedGarbage prepares the pod's directory for removal. If anything
// fails here then we can conclude that the rkt directory is in a bad state and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the pod's directory, not rkt, right?


// execute stage1's GC
if err := stage0.GC(p.path(), p.uuid, stage1RootFS, globalFlags.Debug); err != nil {
return fmt.Errorf("problem performing stage1 GC on %v, attempting brute-force GC: %v", p.uuid, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't use this error string. Maybe print that instead of https://github.com/coreos/rkt/pull/1828/files#diff-204288dd8b5d5205881366c76eb4f97eR243?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Well, if we are catching this I'd rather use something like debug output to say that we caught this and are forcing the GC.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this

                stderr("Recovering from the following GC error: %v", err)
                stderr("Performing forced GC on pod %q", p.uuid)

I use gotags and don't wish to be reminded with each git status.
When a pod directory is in an unexpected state, it may happen that the
standard garbage collection fails. When this happens mounts can be left
around, requiring manual removal.

This change forces a GC of the pod mount points when the pod GC
otherwise fails. To do this, the /proc/self/mountinfo file is parsed,
the pod mount points are extracted, ordered, then unmounted.

fixes rkt#1572
@iaguis
Copy link
Member

iaguis commented Dec 8, 2015

LGTM

iaguis added a commit that referenced this pull request Dec 8, 2015
stage0: brute-force GC on GC failure
@iaguis iaguis merged commit 2955cc5 into rkt:master Dec 8, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rkt: add --force option to gc?
4 participants