Self learning Kubernetes: Automating SSL cert retrieval and updates (Pt. 5)

Automated SSLs for my services

TLDR: Cert-manager automates SSL certificate issuance, renewal, and management for Kubernetes, making it easy to secure Ingress resources with valid certificates from providers like Let’s Encrypt.

Introduction

In the last post I went over setting up a real SSL certificate that could be used by services running on Kubernetes. In this post, I'll be going over how I set up my SSL certs to be auto-generated and renewed using cert-manager with Traefik.

What is cert-manager?

cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It will obtain certificates from a variety of Issuers, both popular public Issuers as well as private Issuers, and ensure the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry. - cert-manager.io

Cert-manager automates the retrieval, renewal, and setup of certificates used within Kubernetes. My use case only involves supplying the SSL certs for the Ingress resources within my cluster, though cert-manager can handle other use cases as well, such as pod-to-pod mTLS and service mesh mTLS communication.

How does cert-manager work with Let's Encrypt?

Cert-manager comes with its own custom resource definitions (CRDs) for Kubernetes. Among them are the Issuer and ClusterIssuer resource types. If you recall from my last post, I had to explicitly create a secret in each namespace with the SSL certs I wanted to use. At the time, this was an entirely manual process: I had to generate the SSL certificate, base64 encode the fullchain.pem and privkey.pem files, then create the Kubernetes secret and reference it in each service's ingress definition. That is a lot of manual work that can be offloaded to cert-manager, which I'll go over in the "Let's get it set up!" section.

For now, just know that cert-manager has some built-in ACME solvers, one of them being Cloudflare, so that's what I'll be using. For more information on creating your own solver, please refer to cert-manager's documentation, as it is out of scope for this post.

One more important prerequisite is to understand the two main types of ACME challenges, HTTP-01 and DNS-01. HTTP-01 proves control of a domain by serving a file from a publicly reachable web server, while DNS-01 proves it by publishing a TXT record in the domain's DNS zone. For my use case, I'm using DNS-01 since I want to keep my services private while still using a publicly signed SSL certificate. Refer to https://letsencrypt.org/docs/challenge-types/#dns-01-challenge for more information.

Let's get it set up!

Real quick, we need to know the difference between an Issuer and a ClusterIssuer. An Issuer can only issue certificates within its own Kubernetes namespace, whereas a ClusterIssuer can issue certificates for any namespace in the cluster. For my use case, I went with a ClusterIssuer since all of my services, whichever namespace they run in, will be under the *.sonicd007.com domain.

My K3s/Rancher setup came with cert-manager pre-installed, but if you don't have it installed on your cluster, you can install it with kubectl, helm, or OperatorHub. More info on installation can be found here: https://cert-manager.io/docs/installation/

In order to retrieve certificates from Let's Encrypt, I needed to create a ClusterIssuer, and in order to create this ClusterIssuer, I needed an API token from my DNS provider, Cloudflare. So the first step is to get that API token and store it in a secret within the cert-manager namespace.

From the Cloudflare dashboard, click "Get your API token".

Select "Create Token" and create a custom token with the following permissions: Account Settings:Read, Zone:Read, DNS:Edit

API Token permissions

Once you have your token, you can come back to Kubernetes land and create your cloudflare-api-key secret, with the token base64 encoded under the api-key field:

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key
  namespace: cert-manager
type: Opaque
data:
  # Base64-encoded Cloudflare API token
  api-key: base64_encoded_api_key

secret.yaml
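If you'd rather not base64 encode the token by hand, Kubernetes also accepts plain-text values under stringData and encodes them on write. A minimal sketch of the same secret, assuming the same name, namespace, and key as above (the token value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key
  namespace: cert-manager
type: Opaque
stringData:
  # Placeholder: the raw (not base64-encoded) Cloudflare API token goes here
  api-key: "REPLACE_WITH_RAW_API_TOKEN"
```

Either form ends up as the same stored secret, so the ClusterIssuer can reference it the same way.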

Next, it's time to create the ClusterIssuer. Let's Encrypt has a staging server you can use for testing so that you don't trigger the rate limits on their production server while you're still setting up your environment. In this example, I'll be pointing at the staging server (https://acme-staging-v02.api.letsencrypt.org/directory), but for the actual production-ready use case you'd want to point to the production URL (https://acme-v02.api.letsencrypt.org/directory) instead.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  # ClusterIssuers are cluster-scoped, so no namespace is set here
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: redacted@redacted.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the DNS-01 challenge provider
    solvers:
    - dns01:
        cloudflare:  # Replace with your DNS provider
          email: redacted@redacted.com
          apiTokenSecretRef:
            name: cloudflare-api-key
            key: api-key

letsencrypt-staging-clusterissuer.yaml
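The ingress annotation later in this post references an issuer named letsencrypt-prod, so here is a rough sketch of what the production counterpart could look like. It assumes the same email and the same cloudflare-api-key secret as the staging issuer; only the name, the account key secret name, and the server URL differ.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Production ACME server (rate limited, issues publicly trusted certs)
    server: https://acme-v02.api.letsencrypt.org/directory
    email: redacted@redacted.com
    # Separate account key from the staging issuer
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-key
            key: api-key
```

Keeping both issuers around makes it easy to test a new ingress against staging first and then switch the annotation over to production.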

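Once the ClusterIssuer is in place, the token has been created with the correct permissions, and the secret holding the token exists, it's time to use the issuer with our ingress definitions to automatically retrieve a valid SSL certificate. Are you ready for the secret annotation that triggers it all?
cert-manager.io/cluster-issuer: "letsencrypt-prod"

The value must match the name of an existing ClusterIssuer. Here it is letsencrypt-prod, the production counterpart of the staging issuer above (created the same way, but with the production server URL). While testing against staging, use letsencrypt-staging instead.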

This single annotation on your ingress will tell cert-manager to generate the SSL certificate, the secret containing the SSL certificate, and handle the renewal of your SSL certificate. A full sample ingress can be seen down below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: registry-ingress
  namespace: registry
  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: https
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
  - hosts:
      - registry.sonicd007.com
    secretName: ssl-cert
  rules:
    - host: registry.sonicd007.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: registry-service
                port:
                  number: 80

registry-ingress.yaml

The important parts are the cert-manager.io/cluster-issuer annotation, the tls hosts entry (registry.sonicd007.com in this example), and the secretName. Cert-manager requests the certificate for the host names listed under tls in this ingress, and secretName is the name of the secret where cert-manager stores the certificate once it obtains it.

Caveats

There are some gotchas with this setup that you should be aware of. The cluster needs a way to resolve the public DNS records for sonicd007.com in order to complete the DNS-01 challenge and retrieve the SSL certificate. Since I configured my pfSense router to route my LAN traffic for *.sonicd007.com domains to my cluster subnet, I had to explicitly point CoreDNS in my Kubernetes setup at Google's public DNS (8.8.8.8) for those host names. It isn't ideal, but it works for what I'm doing. For reference, these are the server blocks I added to the CoreDNS ConfigMap:

openproject.sonicd007.com:53 {
  forward . 8.8.8.8
}
gitea.sonicd007.com:53 {
  forward . 8.8.8.8
}
registry.sonicd007.com:53 {
  forward . 8.8.8.8
}
lexops-dev.sonicd007.com:53 {
  forward . 8.8.8.8
}
lexops-dev-api.sonicd007.com:53 {
  forward . 8.8.8.8
}
jellyfin.sonicd007.com:53 {
  forward . 8.8.8.8
}
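As an aside, cert-manager has its own flags for choosing which resolvers it uses when checking DNS-01 records, which may be an alternative to per-host CoreDNS entries. A minimal sketch of Helm values, assuming cert-manager was installed from its Helm chart (an assumption, and untested on this cluster):

```yaml
# values.yaml for the cert-manager Helm chart (hypothetical file name)
extraArgs:
  # Only use the resolvers listed below for the DNS-01 self-check
  - --dns01-recursive-nameservers-only
  # Public resolvers to query instead of the cluster's DNS
  - --dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53
```

This only changes how cert-manager itself looks up challenge records; other pods in the cluster keep resolving through CoreDNS as before.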

Additionally, the automated part of the SSL ingress setup relies on the special annotation, which triggers a cert-manager component called ingress-shim. Please visit https://cert-manager.io/docs/usage/ingress/ for a more in-depth dive into how it all works.
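To make the ingress-shim behavior a bit more concrete: for an annotated ingress, it creates a Certificate resource on your behalf. A sketch of roughly what that would look like for the registry ingress above, assuming the letsencrypt-prod ClusterIssuer exists (the exact fields cert-manager generates may differ):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # Assumed name; shown here matching the ingress's tls secretName
  name: ssl-cert
  namespace: registry
spec:
  # Secret where the issued certificate and private key are stored
  secretName: ssl-cert
  dnsNames:
    - registry.sonicd007.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io
```

You could also apply a Certificate like this by hand when you need a certificate that isn't tied to an ingress.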

Conclusion

With cert-manager, securing services within Kubernetes using Let’s Encrypt becomes an efficient, automated process. By setting up a ClusterIssuer and adding a single annotation to your Ingress resources, cert-manager takes over SSL management, from obtaining and storing certificates to handling renewals. This greatly simplifies managing SSL certificates across multiple services. The addition of DNS challenge configurations and integration with Traefik provides a flexible and powerful approach to maintain secure, private connections in a Kubernetes environment. While there are some DNS and network configuration requirements to consider, once set up, this solution offers a robust, hands-free approach to managing SSL certificates, leaving you more time to focus on other aspects of your applications.