Kubernetes Cluster Architecture

Top to Bottom

My K8s Learning Journey 03 [v1.22]


Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node is managed by the control plane and contains the services necessary to run Pods.

There are two main ways to add a Node to the API server:

  1. The kubelet on a node self-registers with the control plane.
  2. You manually add a Node object.

After you create a Node object, or the kubelet on a node self-registers, the control plane checks whether the new Node object is valid.
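
The validation rules are not spelled out above. As a rough sketch, assuming the main check is that the node name is a valid DNS subdomain name (lowercase alphanumerics, '-' and '.', at most 253 characters), the check could look like this. The helper names is_valid_node_name and make_node_manifest are hypothetical and not part of any Kubernetes client.

```python
import re

# Assumption: node names follow the DNS subdomain rule (RFC 1123):
# lowercase alphanumerics, '-' or '.', starting and ending with an
# alphanumeric character, at most 253 characters in total.
_DNS_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_node_name(name: str) -> bool:
    """Return True if name looks like a valid DNS subdomain name."""
    return len(name) <= 253 and _DNS_SUBDOMAIN.fullmatch(name) is not None


def make_node_manifest(name: str, labels: dict) -> dict:
    """Build a Node manifest as a dict, rejecting invalid names early."""
    if not is_valid_node_name(name):
        raise ValueError(f"invalid node name: {name!r}")
    return {
        "kind": "Node",
        "apiVersion": "v1",
        "metadata": {"name": name, "labels": dict(labels)},
    }
```

An empty name, as in a manifest whose name has not been filled in yet, is rejected by this sketch, as is any name with uppercase letters or underscores.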

A minimal Node manifest, for example:

{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "", // must be unique; identifies the node
    "labels": {
      "name": "my-first-k8s-node"
    }
  }
}

Node Status

kubectl describe node <your-node-name>


The usage of the address fields varies depending on your cloud provider or bare metal configuration.

  • HostName: The hostname as reported by the node’s kernel. Can be overridden via the kubelet --hostname-override parameter.
  • ExternalIP: Typically the IP address of the node that is externally routable (available from outside the cluster).
  • InternalIP: Typically the IP address of the node that is routable only within the cluster.


The conditions field describes the status of all running nodes.

  • Ready – if the node is healthy and ready to accept pods.
  • DiskPressure – if the disk capacity is low
  • MemoryPressure – if node memory is low
  • PIDPressure – if there are too many processes on the node
  • NetworkUnavailable – if the network for the node is not correctly configured

"conditions": [
  {
    "type": "Ready",
    "status": "True",
    "reason": "KubeletReady",
    "message": "kubelet is posting ready status",
    "lastHeartbeatTime": "2019-06-05T18:38:35Z",
    "lastTransitionTime": "2019-06-05T11:41:27Z"
  }
]

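To make the conditions list concrete, here is a small sketch that reads such a list (already parsed from JSON into Python dicts) and decides whether the node is ready. The function names node_is_ready and reported_problems are hypothetical. The only assumption is that each condition's status is one of the strings "True", "False" or "Unknown".

```python
def node_is_ready(conditions):
    """Return True only if a Ready condition exists and its status is "True"."""
    for cond in conditions:
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    # No Ready condition reported at all: treat the node as not ready.
    return False


def reported_problems(conditions):
    """List the non-Ready condition types whose status is "True".

    For DiskPressure, MemoryPressure, PIDPressure and NetworkUnavailable,
    a status of "True" means the problem is present.
    """
    return [
        cond["type"]
        for cond in conditions
        if cond.get("type") != "Ready" and cond.get("status") == "True"
    ]
```

Note the asymmetry: "True" is good news for Ready but bad news for every other condition type listed above.
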
Capacity & allocatable:

Describes the resources available on the node: CPU, memory, and the maximum number of pods that can be scheduled onto the node.

The fields in the capacity block indicate the total amount of resources that a Node has. The allocatable block indicates the amount of resources on a Node that is available to be consumed by normal Pods.
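
The text above does not say how allocatable is derived from capacity. As a worked example, assuming the commonly documented relation that allocatable equals capacity minus the resources reserved for Kubernetes daemons, for system daemons, and for the eviction threshold, the arithmetic looks like this. The function name and the numbers are made up for illustration.

```python
def allocatable(capacity, kube_reserved=0, system_reserved=0, eviction_threshold=0):
    """Resources left for normal Pods, in the same unit as capacity.

    Assumption: allocatable = capacity - kube_reserved - system_reserved
    - eviction_threshold, floored at zero.
    """
    remaining = capacity - kube_reserved - system_reserved - eviction_threshold
    return max(remaining, 0)


# Hypothetical node with 16384 MiB of memory.
memory_mib = allocatable(
    capacity=16384,
    kube_reserved=1024,
    system_reserved=512,
    eviction_threshold=100,
)
print(memory_mib)
```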


Node info describes general information about the node, such as kernel version, Kubernetes version (kubelet and kube-proxy version), Docker version (if used), and OS name.


Heartbeats, sent by Kubernetes nodes, help your cluster determine the availability of each node and take action when failures are detected. There are two forms of heartbeats:

  • updates to the .status of a Node
  • Lease objects within the kube-node-lease namespace. Each Node has an associated Lease object.

Node Controller:

The node controller is a Kubernetes control plane component that manages various aspects of nodes. The node controller has multiple roles in a node’s life.

The first is assigning a CIDR block to the node when it is registered (if CIDR assignment is turned on).

The second is keeping the node controller’s internal list of nodes up to date with the cloud provider’s list of available machines. When running in a cloud environment and whenever a node is unhealthy, the node controller asks the cloud provider if the VM for that node is still available. If not, the node controller deletes the node from its list of nodes.

The third is monitoring the nodes' health. The node controller is responsible for:

  • In the case that a node becomes unreachable, updating the NodeReady condition within the Node’s .status. In this case the node controller sets the NodeReady condition to ConditionUnknown.
  • If a node remains unreachable: triggering API-initiated eviction for all of the Pods on the unreachable node. By default, the node controller waits 5 minutes between marking the node as ConditionUnknown and submitting the first eviction request.
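
The two steps above can be read as a small timeline. The sketch below models it with plain datetimes. The 5 minute wait comes from the text. The 40 second grace period before a silent node is marked unknown is an assumption, and the function name and returned labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: how long a node may stay silent before it is marked unknown.
GRACE_PERIOD = timedelta(seconds=40)
# From the text: wait between marking ConditionUnknown and the first eviction request.
EVICTION_WAIT = timedelta(minutes=5)


def node_controller_action(last_heartbeat, now,
                           grace=GRACE_PERIOD, eviction_wait=EVICTION_WAIT):
    """Return what this simplified node controller would do for one node."""
    silence = now - last_heartbeat
    if silence <= grace:
        return "healthy"
    if silence <= grace + eviction_wait:
        return "mark-unknown"
    return "evict-pods"


now = datetime(2019, 6, 5, 18, 45, 0, tzinfo=timezone.utc)
print(node_controller_action(now - timedelta(seconds=90), now))
```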

Rate limits on eviction:

In most cases, the node controller limits the eviction rate to --node-eviction-rate (default 0.1) per second, meaning it won’t evict pods from more than 1 node per 10 seconds.

The node eviction behavior changes when a node in a given availability zone becomes unhealthy. The node controller checks what percentage of nodes in the zone are unhealthy (NodeReady condition is ConditionUnknown or ConditionFalse) at the same time:

  • If the fraction of unhealthy nodes is at least --unhealthy-zone-threshold (default 0.55), then the eviction rate is reduced:
    • If the cluster is small (i.e. has at most --large-cluster-size-threshold nodes, default 50), then evictions are stopped.
    • Otherwise, the eviction rate is reduced to --secondary-node-eviction-rate (default 0.01) per second.
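
The rules above fit in one small function. This is only a sketch of the decision as described here, not the node controller's real code. The parameter names mirror the flags in the text, and the function name and the split into cluster_nodes and zone counts are my own assumptions.

```python
def effective_eviction_rate(cluster_nodes, zone_nodes, zone_unhealthy,
                            node_eviction_rate=0.1,
                            secondary_node_eviction_rate=0.01,
                            unhealthy_zone_threshold=0.55,
                            large_cluster_size_threshold=50):
    """Return the eviction rate (nodes per second) for one availability zone."""
    if zone_nodes <= 0:
        raise ValueError("zone_nodes must be positive")
    unhealthy_fraction = zone_unhealthy / zone_nodes
    if unhealthy_fraction < unhealthy_zone_threshold:
        # Zone is mostly healthy: use the normal rate.
        return node_eviction_rate
    if cluster_nodes <= large_cluster_size_threshold:
        # Small cluster with an unhealthy zone: stop evictions.
        return 0.0
    # Large cluster with an unhealthy zone: slow evictions down.
    return secondary_node_eviction_rate


rate = effective_eviction_rate(cluster_nodes=100, zone_nodes=100, zone_unhealthy=10)
print(f"one node every {1 / rate:.0f} seconds")
```

With the default rate of 0.1 per second, the last line works out to one node every 10 seconds, matching the sentence above.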

Resource capacity tracking:

Node objects track information about the Node’s resource capacity: for example, the amount of memory available and the number of CPUs. Nodes that self-register report their capacity during registration. If you manually add a Node, then you need to set the node’s capacity information when you add it.

Graceful node shutdown:

The kubelet attempts to detect node system shutdown and terminates pods running on the node.

Kubelet ensures that pods follow the normal pod termination process during the node shutdown.

The Graceful node shutdown feature depends on systemd since it takes advantage of systemd inhibitor locks to delay the node shutdown with a given duration.
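
How the delay is divided is not covered above. To my understanding, the kubelet configuration has two settings, shutdownGracePeriod (the total delay) and shutdownGracePeriodCriticalPods (the part reserved for critical pods), and regular pods get the difference. The sketch below only does that arithmetic. The function name and return keys are hypothetical.

```python
def split_shutdown_window(shutdown_grace_period_s, critical_pods_grace_period_s):
    """Split the total shutdown delay between regular and critical pods.

    Assumption: regular pods are terminated first and receive the total
    delay minus the time reserved for critical pods.
    """
    if critical_pods_grace_period_s > shutdown_grace_period_s:
        raise ValueError("critical pods period cannot exceed the total period")
    return {
        "regular_pods_s": shutdown_grace_period_s - critical_pods_grace_period_s,
        "critical_pods_s": critical_pods_grace_period_s,
    }


# A 30 second delay with 10 seconds reserved for critical pods.
print(split_shutdown_window(30, 10))
```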

Control Plane-Node Communication

Node to Control Plane

All API usage from nodes (or the pods they run) terminates at the apiserver. None of the other control plane components are designed to expose remote services. The apiserver is configured to listen for remote connections on a secure HTTPS port (typically 443) with one or more forms of client authentication enabled. One or more forms of authorization should be enabled, especially if anonymous requests or service account tokens are allowed.

Nodes should be provisioned with the public root certificate for the cluster such that they can connect securely to the apiserver along with valid client credentials.

Pods that wish to connect to the apiserver can do so securely by leveraging a service account so that Kubernetes will automatically inject the public root certificate and a valid bearer token into the pod when it is instantiated. The kubernetes service (in default namespace) is configured with a virtual IP address that is redirected (via kube-proxy) to the HTTPS endpoint on the apiserver.
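
From inside a pod, using the injected credentials might look like the sketch below. It assumes the conventional mount path /var/run/secrets/kubernetes.io/serviceaccount (with token and ca.crt files) and the in-cluster DNS name kubernetes.default.svc. Neither is stated above, and the function names are hypothetical. Only the standard library is used.

```python
import ssl
import urllib.request
from pathlib import Path

# Assumption: conventional location of the injected service account files.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
# Assumption: in-cluster DNS name of the kubernetes service.
APISERVER = "https://kubernetes.default.svc"


def build_apiserver_request(path, sa_dir=SA_DIR, host=APISERVER):
    """Build a request that carries the pod's bearer token."""
    token = (Path(sa_dir) / "token").read_text().strip()
    request = urllib.request.Request(host + path)
    request.add_header("Authorization", f"Bearer {token}")
    return request


def call_apiserver(path, sa_dir=SA_DIR):
    """Send the request, verifying the apiserver against the injected root cert."""
    context = ssl.create_default_context(cafile=str(Path(sa_dir) / "ca.crt"))
    request = build_apiserver_request(path, sa_dir)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read()
```

call_apiserver("/api") only works inside a running pod. build_apiserver_request can be tried anywhere by pointing sa_dir at a directory containing a token file.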

Control Plane to Node

apiserver to kubelet:

The connections from the apiserver to the kubelet are used for:

  • Fetching logs for pods
  • Attaching (through kubectl) to running pods
  • Providing the kubelet’s port-forwarding functionality

apiserver to nodes, pods and services:

The connections from the apiserver to a node, pod, or service default to plain HTTP connections and are therefore neither authenticated nor encrypted.

Cloud Controller Manager

The cloud-controller-manager is a Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider’s API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.

Node Controller

The node controller is responsible for creating Node objects when new servers are created in your cloud infrastructure. The node controller obtains information about the hosts running inside your tenancy with the cloud provider.

The node controller performs the following functions:

  1. Initialize a Node object for each server that the controller discovers through the cloud provider API.
  2. Annotate and label the Node object with cloud-specific information, such as the region the node is deployed into and the resources (CPU, memory, etc) that it has available.
  3. Obtain the node’s hostname and network addresses.
  4. Verify the node’s health. In case a node becomes unresponsive, this controller checks with your cloud provider’s API to see if the server has been deactivated / deleted / terminated. If the node has been deleted from the cloud, the controller deletes the Node object from your Kubernetes cluster.
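
The health check in the last step comes down to a set comparison: a Node object is removed only if the node is unresponsive and the cloud provider no longer lists its server. A minimal sketch, with a hypothetical function name and plain Python data standing in for the cloud provider API:

```python
def nodes_to_delete(nodes, cloud_instances):
    """Return names of Node objects that should be deleted.

    nodes: dict mapping node name -> True if the node is responsive.
    cloud_instances: set of server names the cloud provider still reports.
    """
    return sorted(
        name
        for name, responsive in nodes.items()
        if not responsive and name not in cloud_instances
    )


cluster = {"node-a": True, "node-b": False, "node-c": False}
still_in_cloud = {"node-a", "node-b"}
print(nodes_to_delete(cluster, still_in_cloud))
```

Here node-b is unresponsive but its server still exists, so it is kept. Only node-c is deleted.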

Route Controller

The route controller is responsible for configuring routes in the cloud appropriately so that containers on different nodes in your Kubernetes cluster can communicate with each other.

Service Controller

Services integrate with cloud infrastructure components such as managed load balancers, IP addresses, network packet filtering, and target health checking. The service controller interacts with your cloud provider’s APIs to set up load balancers and other infrastructure components when you declare a Service resource that requires them.

Garbage Collection

Garbage collection is a collective term for the various mechanisms Kubernetes uses to clean up cluster resources. This allows the cleanup of resources such as:

  • Failed Pods
  • Completed Jobs
  • Objects without owner references
  • Unused containers and container images
  • Dynamically provisioned PersistentVolumes with a StorageClass reclaim policy of Delete
  • Stale or expired CertificateSigningRequests
  • Nodes deleted in the following scenarios:
    • On a cloud when the cluster uses a cloud controller manager
    • On-premises when the cluster uses an addon similar to a cloud controller manager
  • Node Lease objects

The kubelet performs garbage collection on unused images every five minutes and on unused containers every minute. You should avoid using external garbage collection tools, as these can break the kubelet behavior and remove containers that should exist.

Owners and dependents

Many objects in Kubernetes link to each other through owner references. Owner references tell the control plane which objects are dependent on others. Kubernetes uses owner references to give the control plane, and other API clients, the opportunity to clean up related resources before deleting an object. In most cases, Kubernetes manages owner references automatically.

Cascading deletion

Kubernetes checks for and deletes objects that no longer have owner references, like the pods left behind when you delete a ReplicaSet. When you delete an object, you can control whether Kubernetes deletes the object’s dependents automatically, in a process called cascading deletion.
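
To illustrate how owner references drive cascading deletion, the sketch below walks a list of objects and collects everything that depends, directly or transitively, on a given owner. The objects are plain dicts with metadata.uid and metadata.ownerReferences. The function name and the sample UIDs are made up for illustration.

```python
def dependents_of(owner_uid, objects):
    """Collect UIDs of all objects that transitively reference owner_uid."""
    found = []
    frontier = [owner_uid]
    while frontier:
        current = frontier.pop()
        for obj in objects:
            uid = obj["metadata"]["uid"]
            owners = obj["metadata"].get("ownerReferences", [])
            if uid not in found and any(ref["uid"] == current for ref in owners):
                found.append(uid)
                frontier.append(uid)  # its own dependents are checked next
    return found


objects = [
    {"kind": "Deployment", "metadata": {"uid": "d1"}},
    {"kind": "ReplicaSet", "metadata": {"uid": "rs1", "ownerReferences": [{"uid": "d1"}]}},
    {"kind": "Pod", "metadata": {"uid": "p1", "ownerReferences": [{"uid": "rs1"}]}},
    {"kind": "Pod", "metadata": {"uid": "p2", "ownerReferences": [{"uid": "rs1"}]}},
    {"kind": "Pod", "metadata": {"uid": "p3"}},
]
print(sorted(dependents_of("d1", objects)))
```

Deleting the Deployment with cascading deletion would take the ReplicaSet and its two Pods with it, while the Pod without an owner reference is untouched.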


