Cluster API External CA Proof of Concept

Description

This project tests a patch for Cluster API v1.8.8 that enables kubeadm external CA mode for Cluster API Docker infrastructure clusters.

The repository compares two flows: the upstream self-signed baseline and the patched external-CA flow.

The goal is not just to create a cluster. The project validates that the cluster reaches HA shape, that the CA private key is absent where it should be absent, that certificate issuer lineage matches the external CA, and that control-plane and worker certificates are regenerated through the expected signing path after rerolls.

Source code is available on GitHub.

Idea

Cluster API normally works well with kubeadm’s self-signed certificate flow: CAPI creates or manages the cluster CA material, kubeadm consumes it, and new machines join using that bootstrap state. That is convenient, but it also means the cluster CA private key exists inside the CAPI-managed secret flow and on control-plane nodes.

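This proof of concept tests a different model: the CA private key stays with an external signer (step-ca), and nodes receive only leaf certificates issued through that signer.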

In short, the project builds a reproducible local test harness for answering this question: can Cluster API create and roll a kubeadm workload cluster while keeping the Kubernetes CA private key outside the normal CAPI/kubeadm self-signed path?
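For context, kubeadm treats a node as being in external CA mode when ca.crt is present in its PKI directory but ca.key is not. A minimal sketch of that check, assuming the default /etc/kubernetes/pki layout; the function name is_external_ca_layout is hypothetical and not taken from the project:

```shell
# Hypothetical helper: succeeds when a PKI directory looks like kubeadm
# external CA mode (CA certificate present, CA private key absent).
# The default path is the standard kubeadm PKI directory.
is_external_ca_layout() {
  pki_dir="${1:-/etc/kubernetes/pki}"
  [ -f "$pki_dir/ca.crt" ] && [ ! -f "$pki_dir/ca.key" ]
}

# Example use on a control-plane node:
#   is_external_ca_layout && echo "external CA layout detected"
```

In the self-signed baseline the same check would be expected to fail on control-plane nodes, because ca.key is present there.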

Execution flow

For both modes, the pipeline starts with the same steps:

  1. Recreate the kind management cluster.
  2. Build the CAPI images.
  3. Install the CAPI providers.
  4. Deploy the bootstrap step-ca.
  5. Render the workload manifests.
  6. Provision the workload cluster.
  7. Write the workload kubeconfig.
  8. Install Cilium.

The external-CA flow adds the certificate-specific steps:

  1. Generate bootstrap PKI from the management step-ca bundle.
  2. Create the external-CA bootstrap secrets used by CAPI.
  3. Start the workload cluster with one control-plane node.
  4. Scale workers to 3 while still using bootstrap-signed PKI.
  5. Deploy step-ca into the workload cluster.
  6. Create the workload signer secret containing step-ca provisioner material and signer scripts.
  7. Patch the KubeadmControlPlane for signer mode and scale the control plane to 3.
  8. Replace the oldest control-plane machine so new control-plane leaf certificates are issued through the workload signer.
  9. Patch the worker bootstrap template for signer-based kubelet certificate issuance.
  10. Reroll workers so kubelet client certificates are signed by the external CA.

The final expected external-CA shape is:

control-plane replicas: 3
worker replicas:        3
total workload nodes:   6
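The expected shape can be checked mechanically once node counts are known. A minimal sketch, assuming the counts are obtained separately (for example from kubectl); the function name check_shape is hypothetical:

```shell
# Hypothetical helper: compares observed node counts with the expected
# external-CA shape (3 control-plane nodes and 3 workers, 6 in total).
check_shape() {
  cp_count="$1"
  worker_count="$2"
  total=$((cp_count + worker_count))
  if [ "$cp_count" -eq 3 ] && [ "$worker_count" -eq 3 ]; then
    echo "shape ok: $cp_count control-plane + $worker_count workers = $total nodes"
  else
    echo "unexpected shape: $cp_count control-plane + $worker_count workers = $total nodes" >&2
    return 1
  fi
}

# Example use against the workload cluster (not executed here):
#   KC=out/workload/kubeconfig
#   cp="$(kubectl --kubeconfig "$KC" get nodes --no-headers \
#         -l node-role.kubernetes.io/control-plane | wc -l | tr -d ' ')"
#   workers="$(kubectl --kubeconfig "$KC" get nodes --no-headers \
#         -l '!node-role.kubernetes.io/control-plane' | wc -l | tr -d ' ')"
#   check_shape "$cp" "$workers"
```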

What was implemented

The project is built around a scripted Cluster API test harness.

The CAPI patch changes certificate handling so external-CA mode looks up required certificate material instead of generating CA private keys. It also waits for a pre-created kubeconfig secret instead of creating one with an internally generated CA.

Prerequisites

The local prerequisite check expects these tools:

kubectl
kind
docker
openssl
go
helm
git
make
jq

It also checks that the Docker daemon is reachable:

make prereqs
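A prerequisite check of this kind could look roughly like the sketch below. It is an illustration, not the project's actual script; the function names missing_tools and docker_reachable are hypothetical:

```shell
# Hypothetical helper: prints every tool from the argument list that is not
# found on PATH, one per line. Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical helper: succeeds when the Docker daemon answers "docker info".
docker_reachable() {
  docker info >/dev/null 2>&1
}

# Example use with the tool list from this section (not executed here):
#   missing="$(missing_tools kubectl kind docker openssl go helm git make jq)"
#   [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
#   docker_reachable || { echo "Docker daemon is not reachable" >&2; exit 1; }
```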

The default Cluster API version is pinned through the Makefile:

CAPI_VERSION ?= v1.8.8

The workload cluster templates use Kubernetes v1.29.2.

Run locally

Run the upstream self-signed baseline:

make test-self-signed-ca

Run the patched external-CA flow:

make test-external-ca

The test targets run:

clean -> setup -> validate

Setup and validation can also be run separately:

make setup-self-signed-ca
make validate-self-signed-ca

make setup-external-ca
make validate-external-ca

The patch can be checked independently against upstream CAPI:

make patch-check

Shell syntax checks are available for the project scripts:

make lint-scripts
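One common way to implement such a syntax check is bash -n, which parses a script without executing it. The sketch below assumes the project scripts are bash scripts; whether the lint target uses this approach is not confirmed by the source, and the function name lint_scripts is hypothetical:

```shell
# Hypothetical helper: parses each script with "bash -n" (no execution) and
# reports the files that fail to parse. Returns non-zero if any file fails.
lint_scripts() {
  status=0
  for script in "$@"; do
    if ! bash -n "$script" 2>/dev/null; then
      echo "syntax error: $script" >&2
      status=1
    fi
  done
  return "$status"
}

# Example use (the scripts/ path is an assumption):
#   lint_scripts scripts/*.sh
```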

Validation

Self-signed baseline

The self-signed mode validates upstream behavior.

This gives the project a baseline for how upstream CAPI and kubeadm behave without the external-CA patch.

External CA flow

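The external-CA mode validates the patched behavior: the HA cluster shape, the absence of the CA private key, issuer lineage back to the external CA, and regenerated leaf certificates after rerolls.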

The control-plane key uniqueness check is important because it verifies that rerolled nodes received fresh per-node leaf keys instead of reusing static bootstrap key material.
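A uniqueness check like this can be sketched by fingerprinting the public half of each private key and looking for repeats. The sketch below is an assumption about how such a check could work, not the project's implementation; key_fingerprint and all_unique are hypothetical names:

```shell
# Hypothetical helper: prints the SHA-256 fingerprint of the public key that
# belongs to a PEM private key, so keys can be compared without copying them.
key_fingerprint() {
  openssl pkey -in "$1" -pubout -outform DER | openssl dgst -sha256 | awk '{print $NF}'
}

# Hypothetical helper: reads one fingerprint per line on stdin and succeeds
# only if no fingerprint appears more than once.
all_unique() {
  duplicates="$(sort | uniq -d)"
  [ -z "$duplicates" ]
}

# Example use with fingerprints collected from each control-plane node into
# a file (the file name is an assumption):
#   all_unique < cp-key-hashes.txt || echo "key material was reused" >&2
```

Comparing public-key fingerprints instead of the private keys themselves keeps sensitive material on the node while still detecting reuse.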

Debugging

Management cluster checks:

kubectl --kubeconfig out/mgmt/mgmt.kubeconfig get pods -A | grep -E 'capi-|capd-|cert-manager|step-ca' || true
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get cluster
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get kcp
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get md
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get machine

Workload cluster checks:

kubectl --kubeconfig out/workload/kubeconfig get nodes -o wide
kubectl --kubeconfig out/workload/kubeconfig get pods -A
kubectl --kubeconfig out/workload/kubeconfig get csr

Node-level certificate checks:

CP_NODE="$(kubectl --kubeconfig out/workload/kubeconfig get nodes -l node-role.kubernetes.io/control-plane -o jsonpath='{.items[0].metadata.name}')"

kubectl --kubeconfig out/workload/kubeconfig debug "node/$CP_NODE" --image=busybox:1.36 --quiet -- \
  chroot /host ls -l /etc/kubernetes/pki/ca.key

kubectl --kubeconfig out/workload/kubeconfig debug "node/$CP_NODE" --image=busybox:1.36 --quiet -- \
  chroot /host openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -issuer -subject
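The issuer printed by the second command can be compared with the subject of the CA certificate. A minimal sketch, assuming both lines come from openssl x509 with the -issuer and -subject options; the function name issuer_matches is hypothetical:

```shell
# Hypothetical helper: compares the "issuer=" line of a leaf certificate with
# the "subject=" line of a CA certificate, as printed by openssl x509.
issuer_matches() {
  leaf_issuer="${1#issuer=}"   # strip the "issuer=" prefix
  ca_subject="${2#subject=}"   # strip the "subject=" prefix
  [ "$leaf_issuer" = "$ca_subject" ]
}

# Example use with local copies of the certificates (not executed here):
#   issuer_matches \
#     "$(openssl x509 -in apiserver.crt -noout -issuer)" \
#     "$(openssl x509 -in ca.crt -noout -subject)"
```

A name match only shows that the issuer field is as expected. For cryptographic confirmation, openssl verify -CAfile ca.crt apiserver.crt checks the signature chain as well.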

Validation artifacts are written under out/workload/, including captured CA secrets, node lists, Cilium pod state, worker kubelet certificate info, control-plane key hashes, and API server certificate chain output.

Cleanup

Remove the kind management cluster and generated artifacts:

make clean

The cleanup target deletes the capi-mgmt kind cluster and removes the out/ directory.