In the last post on Kubernetes I rambled about the different components that make up a Kubernetes cluster. In this (shorter) post, I'm going to ramble about the different constructs available to the developer attempting to deploy a container. This article is written from the perspective of Kubernetes 1.20.
Pods are the smallest logical object at play. Fundamentally, a pod is a thing that contains one or more containers. The containers in a pod are isolated together. Every container in a pod is equal - if one container is unhealthy, the pod is unhealthy. There are quite a few guides that use the term ‘side-car’ seemingly oblivious to that fact.
Containers in a pod do not, by default, share a file-system or a process list, nor are they aware of each other's state. They are however always scheduled on the same node (allocation to a node is done by pod, not by container), share the same constraints, share an IP address, and also (importantly) share a loopback network interface. The classic example where you would want to utilise this loopback would be a pod with two containers, one of which is MariaDB, accessible only to the other container on 127.0.0.1:3306.
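As a rough sketch of that MariaDB example, the Python below builds a two-container pod manifest and prints it as JSON (which kubectl accepts alongside YAML). The app image name, the environment variable names on the app container, the placeholder password and the bind-address argument are all assumptions made for illustration, not something taken from a real deployment.

```python
import json


def two_container_pod(name: str, namespace: str = "default") -> dict:
    """Build a Pod manifest with an app container and a MariaDB container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "example/app:1.0",  # hypothetical image
                    # The app reaches the database over the shared loopback.
                    "env": [
                        {"name": "DB_HOST", "value": "127.0.0.1"},
                        {"name": "DB_PORT", "value": "3306"},
                    ],
                },
                {
                    "name": "db",
                    "image": "mariadb:10.5",
                    # Assumption: binding to loopback keeps the database
                    # unreachable from outside the pod.
                    "args": ["--bind-address=127.0.0.1"],
                    # Placeholder only; use a Secret in anything real.
                    "env": [{"name": "MYSQL_ROOT_PASSWORD", "value": "change-me"}],
                },
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(two_container_pod("my-cool-pod"), indent=2))
```

Both containers sit in the one pod spec, which is why they land on the same node and see the same 127.0.0.1.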
Pods are not an object you want to create by hand in most circumstances. They bring with them very little logic, and they should be considered ephemeral. When they die, they're gone. So if you're looking for a pod to always exist, something else needs to be responsible for ensuring that.
Pods are referenced by identity and namespace. I won't cover namespaces here, as what is namespaced and what is not namespaced is not a cut and dried topic. I'm using identity rather than name, as it is more accurate - but we're really just talking about how you reference the pod with kubectl, and what hostname is presented inside the containers in the pod. An example of an identity is something like my-cool-pod-hrxle2n. That, in addition to namespace, can constitute part of an A record by which you can find that specific pod's address (this is the case for pods behind a headless service, for instance).
A deployment is an object that contains a pod template and a count of replicas. The pod template is, as it sounds, a template for a pod object. The count of replicas refers to the number of pods to be created from that template.
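To make the "pod template plus replica count" shape concrete, here is a minimal sketch that builds a deployment manifest as a Python dict. The function name, the `app` label key and the image are assumptions for illustration; the selector is there because apps/v1 deployments need one that matches the template's labels.

```python
import json


def deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest: a replica count wrapped around a pod template."""
    labels = {"app": name}  # hypothetical label used to tie pods to the deployment
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # how many pods to create from the template
            "selector": {"matchLabels": labels},
            "template": {  # the pod template
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment("my-cool-pod", "example/app:1.0", 4), indent=2))
```

Everything under `template` is what you would otherwise have written as a standalone pod.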
As deployments exist above pods (in a logical sense), the failure of a pod can be picked up by the deployment controller (see part 1) and handled in line with the instructions in the deployment object. You have four pods, one pod dies, your deployment object specifies there should be four replicas, so it creates a new pod to replace the dead pod.
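The four-pods-one-dies scenario can be modelled with a toy reconcile function. This is only a simulation of the idea, not how the real controller is implemented; the function names and the five-character suffix are made up for the example.

```python
from __future__ import annotations

import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def new_pod_name(prefix: str) -> str:
    """Return a pod name with a random suffix, e.g. my-cool-pod-x7k2q."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(5))
    return f"{prefix}-{suffix}"


def reconcile(desired_replicas: int, running: set[str], prefix: str) -> set[str]:
    """Create or remove pods until the running count matches the desired count."""
    pods = set(running)
    while len(pods) < desired_replicas:
        pods.add(new_pod_name(prefix))  # replacement gets a brand new identity
    while len(pods) > desired_replicas:
        pods.pop()  # scale down by removing an arbitrary pod
    return pods


if __name__ == "__main__":
    pods = reconcile(4, set(), "my-cool-pod")
    pods.pop()  # one pod dies
    print(sorted(reconcile(4, pods, "my-cool-pod")))
```

The surviving pods are left alone; only the gap is filled, and the filler has a name that did not exist before.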
Deployments are not stateful. When a deployment creates a pod, it will create a new pod with a new identity and potentially a new address. This pod may run on a different node, or it may run on the same node. There are no guarantees of consistency in these terms with a deployment.
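When the pod template in a deployment object is changed, the controller knows to replace the running pods with new pods matching that new template. The default behaviour of a deployment is a rolling update: new pods are spun up and old pods deleted a few at a time rather than all at once. By default up to 25% of the replica count can be added on top, and up to 25% can be unavailable, at any point in the rollout.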
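For a feel of what a rolling update allows, the sketch below works out the minimum available and maximum total pod counts during a rollout. It assumes the default maxSurge and maxUnavailable of 25% each, with the surge rounded up and the unavailable count rounded down; the function name and parameters are made up for the example.

```python
from __future__ import annotations


def rollout_bounds(
    replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 25
) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = -(-replicas * max_surge_pct // 100)  # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    print(rollout_bounds(4))   # (3, 5)
    print(rollout_bounds(10))  # (8, 13)
```

With four replicas that works out to one pod at a time in each direction; larger deployments move in bigger steps.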
Bottom line: Use a deployment unless you know the rescheduling of your application onto a different node, with a different identity, would be a problem. Updates are not disruptive by default.
StatefulSets exist to provide the consistency lacking above. In practice, a StatefulSet is a deployment that knows to maintain pod identity. On top of that, it also makes an effort when creating pods to see them scheduled on the same node. I say effort as I'm not entirely certain this is absolute - when I started using Kubernetes, a statefulset pod on a dead node would be rescheduled elsewhere. That no longer appears to be the case: the pod will remain in a failed state until the node resumes operation, the node is removed, or the pod is force deleted. But I haven't tested it myself.
Whereas a deployment would generate pods with identities like “my-cool-pod-jx1gsk1n”, a stateful set generates pods with predictable identities like “my-cool-pod-0”, “my-cool-pod-1”, and so on. The number at the end is the ordinal, and increments in line with the number of replicas specified in the object.
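The naming scheme is simple enough to write down. A minimal sketch, with a made-up function name:

```python
from __future__ import annotations


def statefulset_pod_names(name: str, replicas: int) -> list[str]:
    """Return the predictable pod names for a stateful set, ordinals starting at 0."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


if __name__ == "__main__":
    print(statefulset_pod_names("my-cool-pod", 3))
    # ['my-cool-pod-0', 'my-cool-pod-1', 'my-cool-pod-2']
```

Scaling from three replicas to five adds my-cool-pod-3 and my-cool-pod-4 and leaves the existing names alone, which is what lets other things depend on them.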
This sort of statefulness is especially useful when dealing with services like etcd, which need a persistent data store on top of a persistent identity.
When the pod template in a statefulset is changed, much like a deployment, the old pods are killed. It is impossible to have two pods with the same identity running at the same time, so the old pod is deleted first, after which the new pod is started. This downtime is non-negotiable, but can be managed and mitigated.
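The delete-then-create ordering can be sketched as a list of events. This is a toy model with made-up names, assuming the rolling update works through pods one at a time from the highest ordinal down; the real controller also waits for each new pod to become ready before moving on.

```python
from __future__ import annotations


def statefulset_rollout(name: str, replicas: int, new_version: str) -> list[str]:
    """Return the sequence of delete/create events for a stateful set update."""
    events = []
    for ordinal in reversed(range(replicas)):
        pod = f"{name}-{ordinal}"
        # The identity is reused, so the old pod has to go first.
        events.append(f"delete {pod}")
        events.append(f"create {pod} ({new_version})")
    return events


if __name__ == "__main__":
    for event in statefulset_rollout("my-cool-pod", 3, "v2"):
        print(event)
```

Between each delete and its matching create, that identity is simply not running, which is the downtime in question.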
Bottom line: Use a stateful set when you need an instance of your application to have an identity that will be maintained between restarts. Updates are disruptive by default.
DaemonSets are the last object of concern. Speaking loosely, if you were to pin the number of replicas in a deployment to the number of nodes in the cluster (or a specific segment of nodes in the cluster), you'd have a daemonset.
In practice, the common usage of a daemonset is to do things on the node in question. If you read part 1, you'll already know the example of kube-proxy, which manages iptables rules in line with services. You are by no means restricted to this - if you want to run a pod on every node labelled “database”, and add additional pods as new nodes with that label are added, a daemonset will handle that.
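The "one pod per matching node" behaviour boils down to a label match. Here is a small simulation of that idea; the `role=database` label, the node names and the function name are all invented for the example.

```python
from __future__ import annotations


def daemonset_targets(
    nodes: dict[str, dict[str, str]], selector: dict[str, str]
) -> list[str]:
    """Return the nodes that should each run one pod, given a node selector."""
    return sorted(
        node
        for node, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


if __name__ == "__main__":
    nodes = {
        "node-a": {"role": "database"},
        "node-b": {"role": "web"},
    }
    selector = {"role": "database"}
    print(daemonset_targets(nodes, selector))  # ['node-a']

    # A new node with the label appears, so it gets a pod too.
    nodes["node-c"] = {"role": "database"}
    print(daemonset_targets(nodes, selector))  # ['node-a', 'node-c']
```

An empty selector matches every node, which is the "pin replicas to the node count" case.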
DaemonSets by default roll out updates gradually like deployments, but with one difference: the old pod on a node is killed before its replacement is brought up, one node at a time. It's also worth knowing that the scheduling of DaemonSet pods isn't normal: each pod is created for a specific node, so pods in a daemonset don't follow quite the same flow of states that applies to regular pods.
Bottom line: Unless you're working at scale, I'd just use a deployment configured to run a number of replicas equivalent to your node count, and ensure there's one per node through either a pod anti-affinity rule or a pod topology spread constraint (see you in part 3).