Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

initial implementation of rkt fly as stage1 #1833

Merged
merged 6 commits into from Dec 18, 2015
Merged

initial implementation of rkt fly as stage1 #1833

merged 6 commits into from Dec 18, 2015

Conversation

steveej
Copy link
Contributor

@steveej steveej commented Dec 3, 2015

Alternative proposal for #1825

  • refactor package names
  • implementation
  • add CHANGELOG entry
  • add documentation

@jonboulle
Copy link
Contributor

Can you enumerate some pros/cons of the two different approaches?

Stefan Junker notifications@github.com schrieb am Do., 3. Dez. 2015 15:58:

Alternative proposal for #1825 #1825

You can view, comment on, or merge this pull request online at:

#1833
Commit Summary

  • fly: add new stage1
  • initial structure
  • stage1/fly: create status directory for compatibility with stage0
  • stage1_fly: add gc and enter stubs
  • rkt/gc: use MNT_DETACH to unmount stage1 incl. apps
  • stage1: refactor common stage1 code

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#1833.

@steveej
Copy link
Contributor Author

steveej commented Dec 4, 2015

This approach blends in with the existing rkt and stage0 structure, makes use of the overlay, store and gc functionalities and thereby reuses as much code as possible.

The other version operates pre-stage0 and has no functionality for overlay or gc.

@jonboulle
Copy link
Contributor

ok, any disadvantages of this one?

On Thu, Dec 3, 2015 at 6:31 PM, Stefan Junker notifications@github.com
wrote:

This approach blends in with the existing rkt and stage0 structure, makes
use of the overlay, store and gc functionalities and thereby reuses as much
code as possible.

The other version operates pre-stage0 and has no functionality for overlay
or gc.


Reply to this email directly or view it on GitHub
#1833 (comment).

@steveej
Copy link
Contributor Author

steveej commented Dec 4, 2015

In order to use it, one will have to use a different stage1 which has an impact on the build process and the command line. Including the new stage1 in the build by default shouldn't be a problem, and once we have a more convenient local stage1 lookup that will be easier too.

@jonboulle
Copy link
Contributor

I'm concerned about this being extremely different behaviour depending on the stage1 that happens to be used. rkt run is supposed to be for running pods; the point of splitting rkt fly into a separate subcommand is to be explicit about that.

@vcaputo
Copy link
Contributor

vcaputo commented Dec 4, 2015

$.02

rkt fly is a UX detail, stage1-fly is an implementation detail. Just wire up rkt fly to stage1-fly, I think it makes sense for this to be implemented as just another stage1.

@aaronlevy
Copy link
Contributor

I guess I'm still somewhat unclear on the intended functionality of rkt fly. Originally, I thought it was more or less chroot into the aci rootfs & exec binary. But the current approach seems to be creating a new mount namespace, which seems to add a fair bit of complexity to managing mount propagation.

I had previously asked @steveej about this, and one of the reasons was "new mount namespace masks all the existing mounts that are present on the host. You don't want to see all other container's mounts in a container".

But is that actually the case? If I were running something directly on host (e.g. chroot + host namespace), I would expect other processes to be able to see those mounts. But this comes back to: what exactly is the intended functionality of rkt-fly?

@steveej
Copy link
Contributor Author

steveej commented Dec 8, 2015

@jonboulle Implementing this in pre-stage1 would not fit in with the original thought behind the separation. rkt fly is not supposed to duplicate any of the features for handling of ACI, and it shouldn't do them any differently either. So it makes sense to build upon the stage that already handles this which is stage0. What should be different is how the applications that are shipped with the ACI are executed, and that lies in the responsibility of the stage1.

@vcaputo Thanks for the suggestion of making rky fly a synonym for rkt run --stage1=/path/to/stage1-fly.aci @vcaputo. In fact I had the same thought a couple of days ago. On the same page we could discuss which of the stage1's that are maintained by us, if so, should receive their own rkt sub-command. I mostly refer to rkt lkvm here.

@aaronlevy I've changed my mind about the separate mount namespace, and the current implementation does create a new one. Any bind-mounts inside the container are recursively shared, which allows the container to propagate mounts back to the host. These changes also remain after the container has exited and also survive GC.

@alban
Copy link
Member

alban commented Dec 8, 2015

@steveej If the new mount namespace stays recursively shared to the host mount namespace, why creating a new one?

@@ -14,11 +14,10 @@

//+build linux

package main
package stage1initcommon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this kind of names, should be just "common" (or better, a meaningful name): https://golang.org/doc/effective_go.html#names

@iaguis
Copy link
Member

iaguis commented Dec 8, 2015

Why stage1_fly and stage1_common and no stage1/fly stage1/common?

log.Fatalf("readLines: %s", err)
}

regex := regexp.MustCompile(fmt.Sprintf(".*%s.*", podID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks promising on first sight.

@iaguis
Copy link
Member

iaguis commented Dec 8, 2015

There's no new mount ns in this version, oder? You never Unshare.

@steveej
Copy link
Contributor Author

steveej commented Dec 8, 2015

@vcaputo @iaguis

I don't know if anyone else has already pointed this out, but LockOSThread() doesn't prevent the creation of new, unlocked OS threads. So if you mess with the thread's namespaces, the unique state can be leaked into newly created OS threads, if you do anything causing the Go scheduler to decide a new thread is needed (blocking syscalls for example, can't simply be permitted to tie up the runtime, so more threads get created under the hood).

Thanks for this. In fact the calls to LockOSThread() can be removed entirely as there was no

I'm a bit surprised to see all this networking and namespace stuff here in stage1-fly, as I thought it was just a shared-host execution environment for plainly running what stage0 prepared for it.

The code doesn't contain any namespace functionality at this point. If you're referring to the flags,
except for the mount/volume flags none of them is evaluated.

why is this being renamed to stage1initcommon?

I had to split the kvm related functions to untangle an import loop. Then I had to solve naming conflicts. The refactoring can definitely be done in better with a wider scope and more meaningful names, but this is a start.

There's no new mount ns in this version, oder? You never Unshare.

Right.

@steveej
Copy link
Contributor Author

steveej commented Dec 8, 2015

Sorry @iaguis I missed to answer this question:

Why stage1_fly and stage1_common and no stage1/fly stage1/common?

I wanted to separate this completely from the existing structure. As we refactor we can merge them together. My impression was that it is also easier for the build system this way, @krnowak probably has an opinion on that.

@krnowak
Copy link
Collaborator

krnowak commented Dec 9, 2015

@iaguis, @steveej: About stage1_fly being outside stage1 directory - it was just a quickest way to set it up. Also, stage1_fly is a kind of special snowflake as it does have its own binaries for enter, run and gc.

But yeah, we could probably have a tree like:

  • stage1
    • common?
    • enter
    • init
    • gc
    • usr_from_{coreos,kvm,host,src}
    • fly
      • enter
      • run
      • gc

@jonboulle jonboulle mentioned this pull request Dec 9, 2015
3 tasks
@krnowak
Copy link
Collaborator

krnowak commented Dec 9, 2015

@steveej: Could you rebase your work on top of https://github.com/kinvolk/rkt/tree/krnowak/bs-fly-squashed (it has some fixes in build system you didn't incorporate into your PR).

@aaronlevy
Copy link
Contributor

Testing against running kubelet with fly stage1: upstream k8s conformance tests passed for single node, but a couple tests are failing on a multi node cluster. The 1.1 conformance tests are known to be flakey at this point, so I wouldn't hold based on these. I'll try and dig deeper into the failed tests tomorrow, but so far this is looking good from my end.

For reference the failed tests:

[Fail] SchedulerPredicates [It] validates resource limits of pods that are allowed to run [Conformance]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:277

[Fail] Networking [BeforeEach] should provide Internet connection for containers [Conformance]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/networking.go:47

@steveej
Copy link
Contributor Author

steveej commented Dec 11, 2015

@aaronlevy Those two failures are probably legit:

  1. This mode in rkt doesn't apply any resource limits, and it is not a planned feature for now.
  2. You might be missing a resolv.conf in the container. Easiest way is to just mount the one from your host.

flag.BoolVar(&debug, "debug", false, "Run in debug mode")
flag.Var(&netList, "net", "Setup networking")
flag.BoolVar(&interactive, "interactive", false, "The pod is interactive")
flag.StringVar(&privateUsers, "private-users", "", "Run within user namespace. Can be set to [=UIDBASE[:NUIDS]]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rkt fly will not use user namespaces. I don't know if we need to keep this parameter: it depends on #1871

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this parameter, we should at least return an error when it is used...

if _, isHostMount := hostMounts[tuple.V.Source]; isHostMount {
// Mark the host mount as SHARED so the container's changes to the mount are propagated to the host
argFlyMounts = append(argFlyMounts,
flyMount{"", "", tuple.V.Source, "none", syscall.MS_REC | syscall.MS_SHARED},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is still recursive & shared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's not for the actual bind-mount. It's for making the host-side of the bind-mount RSHARED if it's a mount itself.

└─ user-app1
```

### Mount propagation modes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline after this one

@alban
Copy link
Member

alban commented Dec 18, 2015

This looks good to me and Semaphore is green!

@steveej: please merge this PR and prepare a separate PR for fixing the comments reported by @jonboulle

🎉

// See the License for the specific language governing permissions and
// limitations under the License.

package stage1_common
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/stage1_common/common/
everywhere it's imported you alias it anyway

steveej added a commit that referenced this pull request Dec 18, 2015
initial implementation of rkt fly as stage1
@steveej steveej merged commit d11afa0 into rkt:master Dec 18, 2015
"strings"
"syscall"

stage1common "github.com/coreos/rkt/stage1/common"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import ordering is a bit weird here, godeps should be before the others ideally

discardString string
)

func getHostMounts() (map[string]struct{}, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func getHostMounts(mi io.Reader) for testing, fun, profit..

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants