An important element of Kubernetes is that it standardizes the infrastructure control plane, and allows different pieces to be plugged in (for networking, storage, etc.).
The "virtual kubelet" essentially throws that all that away and keeps Kubernetes "in API only". For example, with virtual kubelets, scheduling is meaningless and networking and storage are restricted to whatever the virtual kubelet target supports (if useable at all).
Personally, I think the value proposition is tenuous -- you can create VMs today; doing so via the Kubernetes API isn't suddenly revolutionary. Just like throwing something in a "hardware virtualized" instance doesn't suddenly make the whole system secure.
Containers and Kubernetes are compelling for a variety of reasons; improving them to handle multi-tenancy is a broad challenge, but I don't think the answer is to reduce the standard to what we have today (a bunch of disparate VMs).
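To make the scheduling point concrete: with a virtual kubelet, "scheduling" mostly reduces to pinning pods at the virtual node. A minimal sketch (the nodeSelector label and taint key are illustrative; each provider defines its own):

```yaml
# Sketch: pinning a pod onto a virtual kubelet "node".
# The label and taint key are assumptions; real providers
# (e.g. the ACI connector) choose their own.
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  nodeSelector:
    type: virtual-kubelet          # label assumed to be set by the provider
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists               # tolerate the taint the provider puts on its node
  containers:
  - name: app
    image: nginx                   # placeholder workload
```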
> Personally, I think the value proposition is tenuous
Multi-tenancy is a pretty compelling value proposition when you reach any kind of scale. If you're in a regulated sector, it's non-negotiable.
Relying on the cluster as the security boundary is very effective ... and very wasteful.
> Containers and Kubernetes are compelling for a variety of reasons, improving them to handle multi-tenancy is a broad challenge but I don't think the answer is to reduce the standard to what we have today (a bunch of disparate VMs).
I think the argument is that rather than the painful (and it will be very painful) and probably incomplete quest to retrofit multi-tenancy into a single-tenancy design, we can introduce multi-tenancy where it actually matters: at the worker node.
At first glance it's confusing to go from "one master, many nodes" to "one node pool, many masters". But it actually works better on every front. Workload efficiency goes up. Security surface area between masters becomes close to nil.
Very cheap VMs are the means to that end.
Disclosure: I work for Pivotal and this argument fits our basic doctrine of how Kubernetes ought to be used.
I don't think multi-tenancy has been "retrofitted" onto Kubernetes. Kubernetes was designed with multi-tenancy in mind from the very early releases -- namespaces, authn/authz (initially ABAC, later RBAC), ResourceQuota, PodSecurityPolicy, etc. New features are added over time, such as NetworkPolicy (which has been in Kubernetes for a year and a half, so perhaps not "new" anymore!), EventRateLimit, and others, but always in a principled way. And the integration of container isolation technologies like gVisor and Kata are using a standard Kubernetes extension point (the Container Runtime Interface) so I do not view this work as retrofitting.
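For concreteness, a per-tenant slice of those primitives looks roughly like this (a minimal sketch; names and numbers are illustrative):

```yaml
# One tenant's namespace, with a resource ceiling and a
# default-deny ingress policy layered on top.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "50"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: team-a
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes: ["Ingress"]   # all ingress denied unless another policy allows it
```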
Moreover, even today there are real public PaaSes that expose the Kubernetes API served by a multi-tenant Kubernetes cluster to mutually untrusting end-users, e.g. OpenShift Online and one of the Huawei cloud products (I forget which one). Obviously Kubernetes multi-tenancy isn't going to be secure enough today for everyone, especially folks who want an additional layer of isolation on top of cgroups/namespaces/seccomp/AppArmor/etc., but there are a lot of advantages to minimizing the number of clusters. (See my other comment in this thread about the pattern we frequently see of separate clusters for dev/test vs. staging vs. prod, possibly per region, but sharing each of those among multiple users and/or applications.)
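The shape of that extra isolation layer, via the CRI extension point, is roughly this (a sketch; RuntimeClass was alpha/beta at the time so the API group/version depends on your cluster, and it assumes the node's runtime is configured with a runsc handler):

```yaml
# Sketch: opting a pod into a sandboxed runtime (here gVisor's runsc).
# Assumes the CRI runtime on the node maps the "runsc" handler
# to the sandbox; Kata would work the same way with its own handler.
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx   # placeholder workload
```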
Disclosure: I work at Google on Kubernetes and GKE.
From a security (as opposed to workload isolation) perspective, I don't think k8s was designed with multi-tenancy in mind at all, in early versions.
I've definitely had conversations with some of the project originators where it was clear the security boundary was intended to be cluster-level in early versions.
Some of the security weaknesses in earlier versions (e.g. no AuthN on the kubelet, cluster-admin grade service tokens etc) make that clear.
Now it's obvious that secure hard multi-tenancy is a goal going forward (and I'll be very interested to see what the 3rd party audit throws up in that regard), but it is a retrofit.
> I don't think multi-tenancy has been "retrofitted" onto Kubernetes. Kubernetes was designed with multi-tenancy in mind from the very early releases -- namespaces, authn/authz (initially ABAC, later RBAC), ResourceQuota, PodSecurityPolicy, etc.
My complaint is that these require assembly and are in many cases opt-in (making RBAC opt-out was a massive leap forward).
Namespaces are the lynchpin, but are globally visible. In fact an enormous amount of stuff tends to wind up visible in some fashion. And I have to go through all the different mechanisms and set them up correctly, align them correctly, to create a firmer multi-tenancy than the baseline.
Put another way, I am having to construct multi-tenancy inside multiple resources at the root level, rather than having tenancy as the root level under which those multiple resources fall.
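To illustrate the assembly: every tenant needs its own stack of objects like the following, created and kept aligned by hand (a minimal sketch; names are illustrative):

```yaml
# Namespace-scoped RBAC for one tenant. Nothing forces this to stay
# consistent with the quota, network policy, etc. for the same tenant.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-dev
  namespace: team-a
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-dev-binding
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: team-a-dev
subjects:
- kind: User
  name: alice@example.com   # illustrative user
  apiGroup: rbac.authorization.k8s.io
```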
> there are a lot of advantages to minimizing the number of clusters.
The biggest is going to be utilisation. Combining workloads pools variance, meaning you can safely run at a higher baseline load. But I think that can be achieved more effectively with virtual kubelet.
> The biggest is going to be utilisation. Combining workloads pools variance, meaning you can safely run at a higher baseline load.
Utilization is arguably the biggest benefit (fewer nodes if you can share nodes among users/workloads, fewer masters if you can share the control plane among users/workloads), but I wouldn't under-estimate the manageability benefit of having fewer clusters to run. Also, for applications (or application instances, e.g. in the case of a SaaS) that are short-lived, the amount of time it takes to spin up a new cluster to serve that application (instance) can cause a poor user experience; spinning up a new namespace and pod(s) in an existing multi-tenant cluster is much faster.
> But I think that can be achieved more effectively with virtual kubelet.
I think it's hard to compare virtual kubelet to something like Kata Containers, gVisor, or Firecracker. You can put almost anything at the other end of a virtual kubelet, and as others have pointed out in this thread virtual kubelet doesn't provide the full Kubelet API (and thus you can't use the full Kubernetes API against it). At a minimum I think it's important to specify what is backing the virtual kubelet, and what Kubernetes features you need, in order to compare it with isolation technologies like the others I mentioned.
Disclosure: I work at Google on Kubernetes and GKE.
One trick I used before was to create resources and leave them unused until they are allocated, at which point I create another one to top off the pool of pre-created resources. A stopped cluster takes up disk space and nothing else, so this is an easy solution to the user-experience issue.
Of course, hardening multi-tenant clusters is also needed. Even if the use case requires resource partitioning, there are use cases that don't and keeping one friend from stepping on another's toes is always a good idea.
I'd like to understand more about your second paragraph, since it shapes some of the work I want to do in 2019. What should I be reading or looking up?
I'm saying that I think the value proposition for the virtual kubelet is tenuous, not multi-tenancy as a whole.
For a single cluster, "very cheap" VMs solve some of the problems, but leave others unsolved (e.g. they prevent some hardware and kernel exploits, but lots of security issues can still hit you -- like the last two big K8s CVEs). They also leave a lot of what makes containers compelling (high efficiency and density) on the floor, so I don't think they should be spun as a panacea.
You seem to be arguing that one shouldn't bother with multi-tenancy on a single cluster, which is a fine approach, but I do think that the technologies and tools to support the single cluster model are evolving. Calling it a "multi-tenancy retrofit" seems a bit FUD-y to me. Just because there are challenges doesn't mean it's not worth doing.
> I'm saying that I think the value proposition for the virtual kubelet is tenuous, not multi-tenancy as a whole.
I was tying them together because I see the former as an effective strategy to achieve the latter.
> Calling it a "multi-tenancy retrofit" seems a bit FUD-y to me. Just because there are challenges doesn't mean it's not worth doing.
What should I call it? It's being added retrospectively to a single-tenant design. The changes have to be correctly threaded through everything, through codebases managed by dozens of working groups, without breaking thousands of existing extensions, tools and applications.
What I expect will happen instead is that it will be better than it is now -- which is a win -- but that no complete, mandatorily-secure, top-to-bottom security boundaries will be created inside single clusters. We will still be left with lots of leaks.
Our industry is replete with folks trying to wedge the business of hypervisors and supervisors into applications and services. It's possible but always leaks and breaks and diverts enormous development bandwidth away from the core thing that is meant to be achieved. Kernels and hypervisors have privileged hardware access and decades of hardening that can't be truly replicated at the application or service level and which when imitated need to be designed in from the beginning.
I don't see that as FUD. I think it just is what it is. But I appreciate that my thinking is in line with the doctrine Pivotal advances to its customers, which differs from the doctrine Red Hat and others advance (One Cluster To Rule Them All).
I’m not sure who at Red Hat is advocating one cluster to rule them all, but it’s just one point on the spectrum. There are lots of places where one cluster makes sense and two would be overkill - if you want to run lots of simple workloads, or have one very large scale app. But it’s equally smart to separate clusters by security domain or regulatory zone, or to create partitions to force your teams to treat clusters as fungible.
If there’s Red Hat documentation advising silly absolutes please let me know and I’ll make sure it gets fixed.
I don't have an example to hand, so it's obvious I was going on second-hand accounts. Do you have something you'd normally point customers to when describing the tradeoffs?
For myself I see the argument for fewer clusters as about utilisation, the argument for more clusters about isolation. It's the oldest tug-of-war in computing. I think that shared node pools for multiple masters is going to be the combination that for most workloads will increase utilisation without greatly weakening isolation. I don't think multi-tenancy in the master will be as easily achieved or as effective.
In Red Hat OpenShift Consulting, we openly advise against “One cluster to rule them all” and the vast majority of our customers heed our advice. Our default delivery models support Sandbox, Nonprod, Prod cluster stand up. Some of us even support the idea that good IaC/EaC practices get our customers to where the cluster can be treated like cattle (much like pods and containers) in well-designed apps. My colleague Raffaele hinted as much when describing the problem as a matter of availability, disaster recovery and federation [0]. At least in OpenShift, multi-tenancy is a solved problem when cluster right-sizing has taken place. RBAC, node labels and selectors, EgressIP, quotas, requests and limits, multi-tenant or networkpolicy plug-ins go a long way.
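For instance, the node-selector and defaults pieces of that stack look roughly like this (a sketch; the annotation is OpenShift-specific and the values are illustrative):

```yaml
# Pin a tenant's project to labeled nodes and give its containers
# sane resource defaults. Values are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  annotations:
    openshift.io/node-selector: "tenant=team-a"  # project node selector
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:       # applied when a container sets no requests
      cpu: 100m
      memory: 256Mi
    default:              # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
```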
> In Red Hat OpenShift Consulting, we openly advise against “One cluster to rule them all” and the vast majority of our customers heed our advice. ... My colleague Raffaele hinted as much when describing the problem as a matter of availability, disaster recovery and federation [0].
To be honest, I should have realised this would be so.
> At least in OpenShift, multi-tenancy is a solved problem when cluster right-sizing has taken place. RBAC, node labels and selectors, EgressIP, quotas, requests and limits, multi-tenant or networkpolicy plug-ins go a long way.
Well, as you can guess, I am not convinced that this is really solved -- it looks like multiple discretionary access control mechanisms that need to be aligned properly, instead of a single mandatory access control mechanism to which other things align.
It's also about tooling. I've seen many clusters being sold, but with no tooling to automatically build, monitor, secure and maintain these clusters, so you've got a DevOps team playing cluster whack-a-mole.
Of course the consultancies love that because it's a bespoke layer for them to build and support, but the reality is setting up a small team to run a couple of clusters eases the job of discoverability and secops, and for many orgs is "good enough".
Still, there is room for improvement, but I doubt it's many masters without another product on top.
> It's also about tooling. I've seen many clusters being sold, but with no tooling to automatically build, monitor, secure and maintain these clusters, so you've got a DevOps team playing cluster whack-a-mole.
Pivotal's doctrine of how to use Kubernetes is explicitly multi-cluster oriented, but that's because we come to the table with tooling that excels at this kind of problem: BOSH.
I agree, but I think I’ve put less thought into it than you have. Just because I tend to focus on research and low-latency infrastructure, I’ve been really happy with the container-first approach. I really like being able to deal with these processes at the Linux level instead of the VM level and being able to tweak that stuff (cpu placement and isolation, accelerators and rdma network devices, etc.) There’s a reason VMs never really took off in HPC, but I think CRI-O is really poised to change the HPC paradigm, and k8s can be really beneficial in some business applications of HPC.
I definitely understand and agree that multitenancy is super important, but it would be a shame to agree that the bare metal performance is an okay sacrifice.
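To give a sense of the Linux-level control I mean: with the kubelet's CPU Manager static policy enabled, a Guaranteed-QoS pod with integer CPU requests gets exclusive cores, and device plugins expose accelerators as schedulable resources. A minimal sketch (image name and resource counts are illustrative; assumes a GPU device plugin is installed):

```yaml
# Guaranteed QoS (requests == limits) plus whole-number CPUs
# yields exclusive core allocation under the static CPU Manager policy.
apiVersion: v1
kind: Pod
metadata:
  name: hpc-solver
spec:
  containers:
  - name: solver
    image: example/solver:latest   # placeholder image
    resources:
      requests:
        cpu: "8"                   # integer CPUs -> exclusive cores
        memory: 16Gi
        nvidia.com/gpu: "1"        # assumes the NVIDIA device plugin
      limits:
        cpu: "8"
        memory: 16Gi
        nvidia.com/gpu: "1"
```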
I agree, I'm not sure the virtual kubelet concept is a great idea overall. It sounds good on the surface, but most of the time these kinds of abstractions do more harm than good, as far as I'm concerned. But, I could be wrong. :)
What we need is an open technology stack such that we can get full standardization and integration with k8s and other systems. We might get that with RISC-V.
The "virtual kubelet" essentially throws that all that away and keeps Kubernetes "in API only". For example, with virtual kubelets, scheduling is meaningless and networking and storage are restricted to whatever the virtual kubelet target supports (if useable at all).
Personally, I think the value proposition is tenuous -- you can create VMs today, doing so via the Kubernetes API isn't suddenly revolutionary. Just like throwing something in a "hardware virtualized" instance doesn't suddenly make the whole system secure.
Containers and Kubernetes are compelling for a variety of reasons, improving them to handle multi-tenancy is a broad challenge but I don't think the answer is to reduce the standard to what we have today (a bunch of disparate VMs).