Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support alternate runtime for host privileged operations #367

Merged
merged 4 commits into from
Mar 4, 2017

Conversation

sameo
Copy link

@sameo sameo commented Feb 21, 2017

Some CRI-O runtimes (like cc-oci-runtime) do not support many host privileged operations like e.g. giving access to the host namespaces or running fully privileged containers (with access to all host devices). Those runtimes usually provide a higher level of container security by e.g. running container workloads within virtual machines and therefore running host privileged containers with them makes little sense.

This pull request tries to overcome that problem by allowing ocid users to define a host privileged capable runtime path in addition to the default runtime path. When ocid gets a request for creating a container or a sandbox with either the privileged flag set or access to at least one of the host namespaces (PID, IPC or networking), it will check if a host privileged runtime is defined and use it if it is. It will obviously use the default runtime if the host privileged one is not defined.

This PR only checks for the pod security context and each container within a given pod will inherit its pod privileged flags. In other words, that means we will run the privileged runtime in either one of those 2 cases:

  1. The sandbox supports running at least one privileged container.
  2. The sandbox requires access to either one of the host PID, IPC or networking namespace.
Note

I realize this might be fixed with CRI multiple runtimes support. If/when this happens, we should be able to remove part of this PR code.

cc @feiskyer @mcastelino

@rh-atomic-bot
Copy link

0/0 passed on RHEL - Passed.
0/0 passed on CentOS - Passed.
0/0 passed on Fedora - Passed.
Log - https://aos-ci.s3.amazonaws.com/kubernetes-incubator/cri-o/crio-integration-tests-prs/168/fullresults.txt

nsPath := fmt.Sprintf("/proc/%d/ns/%s", podInfraState.Pid, nsFile)
if err := specgen.AddOrReplaceLinuxNamespace((string)(nsType), nsPath); err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. Different containers in a Pod do not share mount, pid or cgroup namespaces. And I'm fairly sure they don't share uts or user namespaces either.

What is the reason for this change?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That nsPath looks like it should use filepath.Join instead? Also string(nsType) instead of (string)(nsType)? (I didn't even realize the latter is even valid Go code?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(string) -> string is why it works.

But that's not the main point I'm making. I'm saying that the logic behind the change doesn't make sense to me (different containers in a sandbox do not share all of their namespaces).

Copy link
Contributor

@mikebrow mikebrow Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the direction is to by default have containers in a pod share network, PID and IPC namespaces.

kubernetes/community#207

Caught me by surprise the other day as well. But it's switchable. Gives pods a higher level purpose I suppose.

@cyphar
Copy link
Contributor

cyphar commented Feb 21, 2017

I'm a bit confused why this logic has to live in cri-o. Is there a reason why alternative runtimes can't do one of the following:

  1. Look at the config.json that they've been given, and then figure out whether they have to run in "privileged" or "unprivileged" mode (effectively a wrapper).

  2. Implement this "is this operation privileged" logic in cri-o, but expose it as an annotation rather than including two different runtime paths inside cri-o.

@sameo
Copy link
Author

sameo commented Feb 22, 2017

@cyphar

But that's not the main point I'm making. I'm saying that the logic behind the change doesn't make sense to me (different containers in a sandbox do not share all of their namespaces).

You're correct, the default k8s behavior is to share the PID, IPC and networking namespace, I'll fix that patch.

@sameo sameo force-pushed the topic/host-privileged-runtime branch from 94408d0 to 09415b9 Compare February 22, 2017 00:19
@sameo
Copy link
Author

sameo commented Feb 22, 2017

@cyphar

I'm a bit confused why this logic has to live in cri-o. Is there a reason why alternative runtimes can't do one of the following:

  1. Look at the config.json that they've been given, and then figure out whether they have to run in "privileged" or "unprivileged" mode (effectively a wrapper).

  2. Implement this "is this operation privileged" logic in cri-o, but expose it as an annotation rather than including two different runtime paths inside cri-o.

Good points. Solution 1 is something I thought about. However, running privileged containers with any hypervisor based runtime is not possible (You can't see or change the host network namespace for example) so this would basically mean calling another runtime (e.g. runc) from those runtimes.
This seems a bit wrong to me and to some extend the same logic could be applied to runc for supporting multiple runtimes: Depending on the potential annotations it'd get from kubelet/CRI it would fork different runtimes when it does not support the incoming annotation (In case we get a declarative, intent annotation like e.g. kubernetes/untrusted_workload=true).

@feiskyer
Copy link
Contributor

+1 for the direction.

Due to CRI not clearly defining the pod and container security context ordering (Right now, the container security context can be more permissive than the pod one) this PR checks for both the pod and container security context although ideally it should only check for the pod one. I'm filling a k8s issue for this.

I think we should only switch to privileged runtime in these two conditions:

  • privileged sandbox and its containers (LinuxSandboxSecurityContext.Priviledged = true)
  • HostNetwork, HostIPC and HostPid sandbox and its containers (LinuxSandboxSecurityContext. NamespaceOption is set)

@sameo
Copy link
Author

sameo commented Feb 22, 2017

@feiskyer

I think we should only switch to privileged runtime in these two conditions:

  • privileged sandbox and its containers (LinuxSandboxSecurityContext.Priviledged = true)
  • HostNetwork, HostIPC and HostPid sandbox and its containers (LinuxSandboxSecurityContext. NamespaceOption is set)

Ok, then we assume that the container security context will always be the same as the pod one. As you commented on k8s issue 41848, kuberuntime currently sets all containers security contexts to be the same as their pod.

Right now this PR checks for both the containers and the pod security contexts, I'll fix the code to follow the assumption that they will always be identical.

@sameo
Copy link
Author

sameo commented Feb 22, 2017

@feiskyer

Ok, then we assume that the container security context will always be the same as the pod one. As you commented on k8s issue 41848, kuberuntime currently sets all containers security contexts to be the same as their pod.

After further private discussions, this is incorrect. The security context will not always be the same, only the HostNetwork field will.

And as you described, we should run a privileged runtime for all containers if sandbox.privileged is set to true. In that case, the pod could contain a mix of privileged and unprivileged containers but we can't mix them together.

With that understanding, I'll fix my PR.

@sameo sameo force-pushed the topic/host-privileged-runtime branch 3 times, most recently from f1583a9 to 404de17 Compare February 22, 2017 18:54
@sameo
Copy link
Author

sameo commented Feb 22, 2017

@feiskyer PR updated according to the latest comments.

@sameo sameo force-pushed the topic/host-privileged-runtime branch from 404de17 to 6143f04 Compare February 23, 2017 08:25
@feiskyer
Copy link
Contributor

@sameo LGTM.

cc/ @cyphar @mrunalp PTAL

server/server.go Outdated
@@ -50,6 +52,64 @@ type Server struct {
appArmorProfile string
}

func ociPrivileged(spec *rspec.Spec) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment here? Also, shouldn't we also check for uid = 0?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't --privileged different from just running as uid 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp I added some comments to the routine, hopefully they make sense.

if err := specgen.AddOrReplaceLinuxNamespace("ipc", ipcNsPath); err != nil {
return nil, err
for nsType, nsFile := range map[rspec.NamespaceType]string{
rspec.PIDNamespace: "pid",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take this out for now as k8s is still discussing viability and it may be pushed to next release.

@mrunalp
Copy link
Member

mrunalp commented Feb 28, 2017

How do we handle the case where sandbox privileged is not triggered but a container is privileged? Isn't it too late?

@mcastelino
Copy link

@mrunalp @sameo it would be good if we could specify the high water mark for privileges which can be inferred from the POD definition. Having such a high water mark will prevent the POD from being in a partially created state.

@feiskyer
Copy link
Contributor

feiskyer commented Mar 1, 2017

How do we handle the case where sandbox privileged is not triggered but a container is privileged? Isn't it too late?

@mrunalp Kubelet ensures sandbox privileged is set when at least one of containers belonging to it is privileged.

@sameo sameo force-pushed the topic/host-privileged-runtime branch 2 times, most recently from 42bc143 to 5dce9d6 Compare March 1, 2017 14:33
@@ -34,24 +34,26 @@ const (
)

// New creates a new Runtime with options provided
func New(runtimePath string, conmonPath string, conmonEnv []string, cgroupManager string) (*Runtime, error) {
func New(runtimePath string, runtimeHostPrivilegedPath string, conmonPath string, conmonEnv []string, cgroupManager string) (*Runtime, error) {
Copy link

@jawnsy jawnsy Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design question: would it make sense to pass in a struct with configuration instead? Every time a new parameter is added (as today), the method and all its callers have to be adjusted, and it's annoying for optional parameters (because all the call sites will need to be changed to pass in zero values like "")

Have we considered passing in configuration as a special struct type, like:

type RuntimeConfig struct {
  Name string
  Path string
  ...
}

func NewRuntime(config *RuntimeConfig) (*Runtime, error) {
  r := &Runtime{
    config: config,
  }
  return r, nil
}

(sidenote, what's the point in error as a second return type, if errors can never occur?)

@mrunalp
Copy link
Member

mrunalp commented Mar 2, 2017

@mrunalp Kubelet ensures sandbox privileged is set when at least one of containers belonging to it is privileged.

@feiskyer @sameo Does this mean that we just need to check for sandbox privileged flag and there is no need to look at the container config?

@feiskyer
Copy link
Contributor

feiskyer commented Mar 2, 2017

Does this mean that we just need to check for sandbox privileged flag and there is no need to look at the container config?

I think so. We should only check sandbox config to decide the privileged runtime.

Not all runtimes are able to handle some of the kubelet
security context options, in particular the ones granting
host privileges to containers.

By adding a host privileged runtime path configuration, we
allow ocid to use a different runtime for host privileged
operations like e.g. host namespaces access.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Samuel Ortiz added 3 commits March 3, 2017 17:22
We add a privileged flag to the container and sandbox structures
and can now select the appropriate runtime path for any container
operations depending on that flag.

Here again, the default runtime will be used for non privileged
containers and for privileged ones in case there are no privileged
runtime defined.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The sandbox privileged flag is set to true only if either the
pod configuration privileged flag is set to true or when any
of the pod namespaces are the host ones.

A container inherit its privileged flag from its sandbox, and
will be run by the privileged runtime only if it's set to true.
In other words, the privileged runtime (when defined) will be
when one of the below conditions is true:

- The sandbox will be asked to run at least one privileged container.
- The sandbox requires access to either the host IPC or networking
  namespaces.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By factorizing the bind mounts generation code.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
@sameo sameo force-pushed the topic/host-privileged-runtime branch from 5dce9d6 to f7eee71 Compare March 3, 2017 18:09
@sameo
Copy link
Author

sameo commented Mar 3, 2017

@mrunalp Simplified the code through a pod annotation, as discussed during the ocid meeting.

@sameo sameo changed the title [RFC] Support alternate runtime for host privileged operations Support alternate runtime for host privileged operations Mar 3, 2017
@mrunalp
Copy link
Member

mrunalp commented Mar 4, 2017

LGTM

1 similar comment
@feiskyer
Copy link
Contributor

feiskyer commented Mar 4, 2017

LGTM

@feiskyer feiskyer merged commit 3195f45 into cri-o:master Mar 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants