initial implementation of rkt fly as stage1 #1833
Conversation
Can you enumerate some pros/cons of the two different approaches? Stefan Junker notifications@github.com schrieb am Do., 3. Dez. 2015 15:58:
|
This approach blends in with the existing rkt and stage0 structure, makes use of the overlay, store and gc functionalities and thereby reuses as much code as possible. The other version operates pre-stage0 and has no functionality for overlay or gc. |
ok, any disadvantages of this one? On Thu, Dec 3, 2015 at 6:31 PM, Stefan Junker notifications@github.com
|
In order to use it, one will have to use a different stage1 which has an impact on the build process and the command line. Including the new stage1 in the build by default shouldn't be a problem, and once we have a more convenient local stage1 lookup that will be easier too. |
I'm concerned about this being extremely different behaviour depending on the stage1 that happens to be used. |
$.02
|
I guess I'm still somewhat unclear on the intended functionality of rkt fly. Originally, I thought it was more or less chroot into the aci rootfs & exec binary. But the current approach seems to be creating a new mount namespace, which seems to add a fair bit of complexity to managing mount propagation. I had previously asked @steveej about this, and one of the reasons was "new mount namespace masks all the existing mounts that are present on the host. You don't want to see all other container's mounts in a container". But is that actually the case? If I were running something directly on host (e.g. chroot + host namespace), I would expect other processes to be able to see those mounts. But this comes back to: what exactly is the intended functionality of rkt-fly? |
@jonboulle Implementing this in pre-stage1 would not fit in with the original thought behind the separation. @vcaputo Thanks for the suggestion of making @aaronlevy I've changed my mind about the separate mount namespace, and the current implementation does create a new one. Any bind-mounts inside the container are recursively shared, which allows the container to propagate mounts back to the host. These changes also remain after the container has exited and also survive GC. |
@steveej If the new mount namespace stays recursively shared to the host mount namespace, why creating a new one? |
@@ -14,11 +14,10 @@ | |||
|
|||
//+build linux | |||
|
|||
package main | |||
package stage1initcommon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this kind of names, should be just "common" (or better, a meaningful name): https://golang.org/doc/effective_go.html#names
Why |
log.Fatalf("readLines: %s", err) | ||
} | ||
|
||
regex := regexp.MustCompile(fmt.Sprintf(".*%s.*", podID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks promising on first sight.
There's no new mount ns in this version, oder? You never Unshare. |
Thanks for this. In fact the calls to
The code doesn't contain any namespace functionality at this point. If you're referring to the flags,
I had to split the kvm related functions to untangle an import loop. Then I had to solve naming conflicts. The refactoring can definitely be done in better with a wider scope and more meaningful names, but this is a start.
Right. |
Sorry @iaguis I missed to answer this question:
I wanted to separate this completely from the existing structure. As we refactor we can merge them together. My impression was that it is also easier for the build system this way, @krnowak probably has an opinion on that. |
@iaguis, @steveej: About stage1_fly being outside stage1 directory - it was just a quickest way to set it up. Also, stage1_fly is a kind of special snowflake as it does have its own binaries for enter, run and gc. But yeah, we could probably have a tree like:
|
@steveej: Could you rebase your work on top of https://github.com/kinvolk/rkt/tree/krnowak/bs-fly-squashed (it has some fixes in build system you didn't incorporate into your PR). |
Testing against running kubelet with fly stage1: upstream k8s conformance tests passed for single node, but a couple tests are failing on a multi node cluster. The 1.1 conformance tests are known to be flakey at this point, so I wouldn't hold based on these. I'll try and dig deeper into the failed tests tomorrow, but so far this is looking good from my end. For reference the failed tests:
|
@aaronlevy Those two failures are probably legit:
|
flag.BoolVar(&debug, "debug", false, "Run in debug mode") | ||
flag.Var(&netList, "net", "Setup networking") | ||
flag.BoolVar(&interactive, "interactive", false, "The pod is interactive") | ||
flag.StringVar(&privateUsers, "private-users", "", "Run within user namespace. Can be set to [=UIDBASE[:NUIDS]]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rkt fly
will not use user namespaces. I don't know if we need to keep this parameter: it depends on #1871
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this parameter, we should at least return an error when it is used...
if _, isHostMount := hostMounts[tuple.V.Source]; isHostMount { | ||
// Mark the host mount as SHARED so the container's changes to the mount are propagated to the host | ||
argFlyMounts = append(argFlyMounts, | ||
flyMount{"", "", tuple.V.Source, "none", syscall.MS_REC | syscall.MS_SHARED}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is still recursive & shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's not for the actual bind-mount. It's for making the host-side of the bind-mount RSHARED if it's a mount itself.
└─ user-app1 | ||
``` | ||
|
||
### Mount propagation modes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline after this one
This looks good to me and Semaphore is green! @steveej: please merge this PR and prepare a separate PR for fixing the comments reported by @jonboulle 🎉 |
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package stage1_common |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/stage1_common/common/
everywhere it's imported you alias it anyway
initial implementation of rkt fly as stage1
"strings" | ||
"syscall" | ||
|
||
stage1common "github.com/coreos/rkt/stage1/common" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import ordering is a bit weird here, godeps should be before the others ideally
discardString string | ||
) | ||
|
||
func getHostMounts() (map[string]struct{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getHostMounts(mi io.Reader)
for testing, fun, profit..
Alternative proposal for #1825