Description
This project tests a patch for Cluster API v1.8.8 that enables kubeadm external CA mode for Cluster API Docker infrastructure clusters.
The repository compares two flows:
- self-signed: upstream Cluster API behavior, where CAPI/kubeadm owns the cluster CA private key.
- external-ca: patched CAPI behavior, where bootstrap PKI is pre-generated and the workload cluster is rerolled to use an external signer flow.
The goal is not just to create a cluster. The project validates that the cluster reaches HA shape, that the CA private key is absent where it should be absent, that certificate issuer lineage matches the external CA, and that control-plane and worker certificates are regenerated through the expected signing path after rerolls.
Source code is available on GitHub.
Idea
Cluster API normally works well with kubeadm’s self-signed certificate flow: CAPI creates or manages the cluster CA material, kubeadm consumes it, and new machines join using that bootstrap state. That is convenient, but it also means the cluster CA private key exists inside the CAPI-managed secret flow and on control-plane nodes.
This proof of concept tests a different model:
- CAPI should be able to bootstrap a workload cluster when the CA certificate exists but the CA private key is not available to CAPI.
- Initial control-plane material can be pre-generated from an external CA.
- After the workload cluster is alive, a signer inside that workload cluster can issue fresh per-node certificates.
- Control-plane and worker rerolls should replace bootstrap material with certificates issued through the external signer flow.
In short, the project builds a reproducible local test harness for answering this question: can Cluster API create and roll a kubeadm workload cluster while keeping the Kubernetes CA private key outside the normal CAPI/kubeadm self-signed path?
Execution flow
For both modes, the pipeline starts by recreating the kind management cluster, building CAPI images, installing CAPI providers, deploying bootstrap step-ca, rendering workload manifests, provisioning the workload cluster, writing the workload kubeconfig, and installing Cilium.
The external-CA flow adds the certificate-specific steps:
- Generate bootstrap PKI from the management step-ca bundle.
- Create the external-CA bootstrap secrets used by CAPI.
- Start the workload cluster with one control-plane node.
- Scale workers to 3 while still using bootstrap-signed PKI.
- Deploy step-ca into the workload cluster.
- Create the workload signer secret containing step-ca provisioner material and signer scripts.
- Patch the KubeadmControlPlane for signer mode and scale the control plane to 3.
- Replace the oldest control-plane machine so new control-plane leaf certificates are issued through the workload signer.
- Patch the worker bootstrap template for signer-based kubelet certificate issuance.
- Reroll workers so kubelet client certificates are signed by the external CA.
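The per-node part of the signer steps above can be illustrated with plain openssl. This is a minimal sketch, not the project's signer script: `make_kubelet_csr` and the output file names are hypothetical, and it only generates a fresh private key plus a CSR with the subject a kubelet client certificate needs. Submitting the CSR to step-ca is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the project): create a fresh EC key and a CSR
# carrying the subject a kubelet client certificate needs.
# $1 = node name, $2 = output directory (file names below are assumptions)
make_kubelet_csr() {
  local node="$1" out_dir="$2"
  mkdir -p "$out_dir"
  # -nodes leaves the key unencrypted so a kubelet could read it unattended.
  openssl req -new -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 -nodes \
    -keyout "$out_dir/kubelet-client.key" \
    -out "$out_dir/kubelet-client.csr" \
    -subj "/O=system:nodes/CN=system:node:${node}" 2>/dev/null
}
```

Calling `make_kubelet_csr worker-0 ./csr` would leave a key and a CSR in `./csr`. Because the key is generated per call, every node ends up with its own key material, which is the property the rerolls are meant to establish.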
The final expected external-CA shape is:
control-plane replicas: 3
worker replicas: 3
total workload nodes: 6
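One way to check this shape from the management cluster is to read `.status.readyReplicas` from the KubeadmControlPlane and MachineDeployment objects. The sketch below is an assumption about how such a check could look, not the project's validation script: `ready_replicas_match` is a hypothetical helper, and the kubeconfig path and namespace defaults are taken from the debugging commands later in this document.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them through the environment if your layout differs.
MGMT_KUBECONFIG="${MGMT_KUBECONFIG:-out/mgmt/mgmt.kubeconfig}"
NAMESPACE="${NAMESPACE:-default}"

# Hypothetical helper: compare .status.readyReplicas of a CAPI object with an
# expected count.
# $1 = resource kind (kubeadmcontrolplane or machinedeployment)
# $2 = object name, $3 = expected ready replicas
ready_replicas_match() {
  local kind="$1" name="$2" expected="$3" ready
  ready="$(kubectl --kubeconfig "$MGMT_KUBECONFIG" -n "$NAMESPACE" \
    get "$kind" "$name" -o jsonpath='{.status.readyReplicas}')"
  # An unset field prints nothing, so treat empty output as zero.
  [ "${ready:-0}" -eq "$expected" ]
}
```

The object names depend on the rendered manifests, so they are passed in as arguments rather than guessed here. Both a control-plane check and a worker check with an expected count of 3 would have to succeed for the six-node shape.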
What was implemented
The project is built around a scripted Cluster API test harness:
- a kind management cluster named capi-mgmt with one control-plane and one worker node;
- Cluster API v1.8.8 image build and install automation;
- a patch that adds externalCA: true to kubeadm bootstrap and control-plane specs;
- generated Cluster API manifests for self-signed and external-CA modes;
- bootstrap PKI generation for Kubernetes, front-proxy, etcd, service-account, admin, scheduler, controller-manager, and kubelet material;
- a script that pre-creates the cryptographic bootstrap material and uses it to populate the CAPI secrets needed by the external-CA flow;
- step-ca deployment in the management cluster for bootstrap signing;
- step-ca deployment in the workload cluster for post-bootstrap signing;
- control-plane reroll logic for replacing bootstrap-issued node certificates;
- worker reroll logic for issuing kubelet client certificates through the external CA;
- validation scripts that inspect Kubernetes secrets, workload nodes, node-local certificate files, certificate issuers, and key uniqueness.
The CAPI patch changes certificate handling so external-CA mode looks up required certificate material instead of generating CA private keys. It also waits for a pre-created kubeconfig secret instead of creating one with an internally generated CA.
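To make "looks up required certificate material" concrete, the sketch below renders a CA secret that carries the certificate but no private key. The secret name, type, and label follow the upstream Cluster API convention for user-provided certificates. Whether the project's script builds the secret this way is an assumption, and `render_ca_secret` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a <cluster>-ca Secret manifest that contains only
# the CA certificate.
# $1 = cluster name, $2 = path to the PEM-encoded CA certificate
render_ca_secret() {
  local cluster="$1" cert_file="$2" cert_b64
  # Strip newlines so the value is a single line with GNU and BSD base64 alike.
  cert_b64="$(base64 < "$cert_file" | tr -d '\n')"
  # The private key entry is deliberately left out of the data section.
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${cluster}-ca
  labels:
    cluster.x-k8s.io/cluster-name: ${cluster}
type: cluster.x-k8s.io/secret
data:
  tls.crt: ${cert_b64}
EOF
}
```

The output could be piped into `kubectl apply -f -` against the management cluster. Upstream CAPI would expect the private key alongside the certificate in this secret, so a key-less secret like this is only useful with the patched controllers.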
Prerequisites
The local prerequisite check expects these tools:
kubectl
kind
docker
openssl
go
helm
git
make
jq
It also checks that the Docker daemon is reachable:
make prereqs
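For readers who want to see what such a check involves, here is a minimal sketch. It is not the body of the `prereqs` target, which is not shown in this document: `missing_tools` and `docker_reachable` are hypothetical helpers, and only the tool list comes from the text above.

```shell
#!/usr/bin/env bash
# Tool list taken from the prerequisites above.
REQUIRED_TOOLS="kubectl kind docker openssl go helm git make jq"

# Hypothetical helper: print the name of every given tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: succeed only if the Docker daemon answers.
docker_reachable() {
  docker info >/dev/null 2>&1
}
```

Running `missing_tools $REQUIRED_TOOLS` prints nothing when every tool is installed, and `docker_reachable` fails when the daemon is stopped even though the `docker` binary exists, which is why the two checks are separate.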
The default Cluster API version is pinned through the Makefile:
CAPI_VERSION ?= v1.8.8
The workload cluster templates use Kubernetes v1.29.2.
Run locally
Run the upstream self-signed baseline:
make test-self-signed-ca
Run the patched external-CA flow:
make test-external-ca
The test targets run:
clean -> setup -> validate
Setup and validation can also be run separately:
make setup-self-signed-ca
make validate-self-signed-ca
make setup-external-ca
make validate-external-ca
The patch can be checked independently against upstream CAPI:
make patch-check
Shell syntax checks are available for the project scripts:
make lint-scripts
Validation
Self-signed baseline
The self-signed mode validates upstream behavior:
- <cluster>-ca exists and contains tls.key;
- the workload API server is reachable;
- the workload cluster reaches 3 control-plane and 3 worker replicas;
- all six workload nodes become Ready;
- Cilium pods become healthy;
- /etc/kubernetes/pki/ca.key exists on a control-plane node;
- the API server certificate chain is captured for inspection.
This gives the project a baseline for how upstream CAPI and kubeadm behave without the external-CA patch.
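The first baseline check, whether the CA secret carries a private key, can be sketched as a small helper. This is an illustration rather than the project's validation script: `secret_has_tls_key` is a hypothetical name, and the kubeconfig path and namespace defaults are assumptions borrowed from the debugging commands in this document.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them through the environment if your layout differs.
MGMT_KUBECONFIG="${MGMT_KUBECONFIG:-out/mgmt/mgmt.kubeconfig}"
NAMESPACE="${NAMESPACE:-default}"

# Hypothetical helper: succeed if the named Secret has a non-empty tls.key entry.
# $1 = secret name, for example <cluster>-ca
secret_has_tls_key() {
  local secret="$1" value
  # The dot inside the key name must be escaped in kubectl's JSONPath syntax.
  value="$(kubectl --kubeconfig "$MGMT_KUBECONFIG" -n "$NAMESPACE" \
    get secret "$secret" -o jsonpath='{.data.tls\.key}')"
  [ -n "$value" ]
}
```

In self-signed mode the helper is expected to succeed for the cluster CA secret. The same helper can express the opposite expectation by treating success as a failure.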
External CA flow
The external-CA mode validates the patched behavior:
- <cluster>-ca exists but does not contain tls.key;
- the workload API server is reachable;
- the workload cluster reaches 3 control-plane and 3 worker replicas;
- all six workload nodes become Ready;
- Cilium pods become healthy;
- /etc/kubernetes/pki/ca.key is absent on control-plane nodes;
- the Cluster API CA certificate fingerprint matches the generated source CA certificate;
- the API server certificate issuer matches the external cluster CA subject;
- worker kubelet client certificates are issued by the external CA;
- worker kubelet client certificate subjects match CN=system:node:<node> and O=system:nodes;
- control-plane apiserver.key hashes are unique after reroll;
- control-plane etcd/peer.key hashes are unique after reroll;
- the API server certificate chain is captured for inspection.
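The fingerprint, issuer, and subject checks in this list come down to a few openssl queries on PEM files. The helpers below are a sketch with hypothetical names, not the project's validation code, and they assume the certificates have already been copied to local files.

```shell
#!/usr/bin/env bash
# Print the SHA-256 fingerprint line of a PEM certificate.
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256
}

# Succeed if both files hold the same certificate.
same_certificate() {
  [ "$(cert_fingerprint "$1")" = "$(cert_fingerprint "$2")" ]
}

# Print the subject or issuer of a certificate in RFC 2253 form.
# $1 = certificate file, $2 = "subject" or "issuer"
cert_field() {
  openssl x509 -in "$1" -noout "-$2" -nameopt RFC2253 | sed "s/^$2= *//"
}

# Succeed if the leaf's issuer equals the CA's subject.
# $1 = CA certificate, $2 = leaf certificate
issuer_matches_ca() {
  [ "$(cert_field "$2" issuer)" = "$(cert_field "$1" subject)" ]
}
```

For a worker kubelet certificate, `cert_field kubelet-client.crt subject` should print `CN=system:node:<node>,O=system:nodes`. Comparing issuer and subject strings confirms the name lineage only, so a full chain verification remains a separate, stricter check.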
The control-plane key uniqueness check is important because it verifies that rerolled nodes received fresh per-node leaf keys instead of reusing static bootstrap key material.
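A uniqueness check of this kind can be reduced to finding repeated values in a list of hashes. The sketch below assumes the hashes have already been collected into lines of the form `<node> <hash>`. Both that input format and the helper name `assert_unique_hashes` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<node> <hash>" lines on stdin and report any hash
# that appears more than once. Succeeds only when every hash is unique.
assert_unique_hashes() {
  local dupes
  # uniq -d prints one copy of each value that occurs at least twice.
  dupes="$(awk '{ print $2 }' | sort | uniq -d)"
  if [ -n "$dupes" ]; then
    # Unquoted on purpose: printf repeats the format once per duplicate hash.
    printf 'duplicate key hash: %s\n' $dupes >&2
    return 1
  fi
}
```

Feeding it one line per control-plane node for apiserver.key, and again for etcd/peer.key, would fail as soon as two nodes share key material.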
Debugging
Management cluster checks:
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig get pods -A | grep -E 'capi-|capd-|cert-manager|step-ca' || true
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get cluster
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get kcp
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get md
kubectl --kubeconfig out/mgmt/mgmt.kubeconfig -n default get machine
Workload cluster checks:
kubectl --kubeconfig out/workload/kubeconfig get nodes -o wide
kubectl --kubeconfig out/workload/kubeconfig get pods -A
kubectl --kubeconfig out/workload/kubeconfig get csr
Node-level certificate checks:
CP_NODE="$(kubectl --kubeconfig out/workload/kubeconfig get nodes -l node-role.kubernetes.io/control-plane -o jsonpath='{.items[0].metadata.name}')"
kubectl --kubeconfig out/workload/kubeconfig debug "node/$CP_NODE" -i --image=busybox:1.36 --quiet -- \
  chroot /host ls -l /etc/kubernetes/pki/ca.key
kubectl --kubeconfig out/workload/kubeconfig debug "node/$CP_NODE" -i --image=busybox:1.36 --quiet -- \
  chroot /host openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -issuer -subject
Validation artifacts are written under out/workload/, including captured CA secrets, node lists, Cilium pod state, worker kubelet certificate info, control-plane key hashes, and API server certificate chain output.
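The node-level checks above look at a single control-plane node. When a reroll needs to be inspected on every control-plane node, a small listing helper can drive a loop. This is a sketch: `control_plane_nodes` is a hypothetical name, and the kubeconfig default is taken from the commands above.

```shell
#!/usr/bin/env bash
# Assumed default; matches the workload kubeconfig path used above.
WORKLOAD_KUBECONFIG="${WORKLOAD_KUBECONFIG:-out/workload/kubeconfig}"

# Hypothetical helper: print one control-plane node name per line.
control_plane_nodes() {
  kubectl --kubeconfig "$WORKLOAD_KUBECONFIG" get nodes \
    -l node-role.kubernetes.io/control-plane \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}
```

The output can be consumed with `control_plane_nodes | while read -r node; do ...; done`, running the same per-node command for each name.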
Cleanup
Remove the kind management cluster and generated artifacts:
make clean
The cleanup target deletes the capi-mgmt kind cluster and removes the out/ directory.