Running an IRC network in Kubernetes

July 7, 2019

For a long time I've run an IRC network the old-fashioned way. I would compile a daemon, scp over configs, and run it directly on a server. I used bare metal servers (mostly Raspberry Pis!).

I've been interested in Kubernetes for some time, but until now I hadn't used it for a project. Being me, I decided to see what running an IRC network on it would look like.

I ended up liking what I built so much that I've switched my entire network to it. It's now running inside a cluster on Google Kubernetes Engine (GKE).

Kubernetes has a reputation for having a high complexity overhead. To some degree that's warranted, but in some ways it's actually very simple, at least if you don't take on running Kubernetes itself!

The simplicity comes from no longer thinking about (and maintaining) individual servers and instead focussing on your applications. You write configuration in terms of higher level concepts such as pods and services, and send the configs to Kubernetes. Kubernetes takes care of the rest. Once you wrap your head around these ideas it's clear why Kubernetes is a big deal.

There is certainly a trade-off: you give up some control and risk lock-in, not to mention the hidden complexity of running Kubernetes. Some of this is mitigated by Kubernetes being open source, so you can always move to a different provider or run it yourself.

What I found most exciting about using Kubernetes this way is the freedom from worrying about infrastructure. In fact, I turned off several of the servers I was using. They're replaced by a few configs. Wow!

In the rest of this post I'll walk you through how I set up my IRC network in a Kubernetes cluster. The IRC daemon I use is one I wrote called catbox. You can see most of the configs I describe here.

Prerequisites

There are several things to configure beforehand. These are the two main docs I followed:

These get you gcloud set up to work with GKE and docker set up to push to GCP's container registry.

Create a cluster

Creating a cluster is a single command:

$ gcloud container clusters create my-cluster \
    --machine-type=g1-small \
    --num-nodes=2

This creates a 2 node cluster. Both nodes are g1-small VMs.

catbox requires minimal resources, so I use the smallest machine type I can. I found that f1-micro wasn't able to run core cluster services such as DNS due to memory exhaustion, so g1-small is one level up from that.

Then allow kubectl to work with the cluster:

$ gcloud container clusters get-credentials my-cluster

Now we have a Kubernetes cluster that we can interact with using kubectl. Nothing is running in it yet. Let's change that!

Create a container image

Kubernetes runs pods. Pods have one or more containers. In order to run our daemon, we need a container image for it.

This is the Dockerfile I use:

FROM golang:1.12 AS build
RUN GO111MODULE=on CGO_ENABLED=0 go get github.com/horgh/catbox@master

FROM alpine:3.10
COPY --from=build /go/bin/catbox /

Build and push it to the registry:

$ docker build --no-cache \
    -t gcr.io/elite-vault-142205/catbox:1.12.0-2019-07-06-001 .
$ docker push gcr.io/elite-vault-142205/catbox:1.12.0-2019-07-06-001

We still aren't running anything in our cluster, but now we have something we can run!

Run the IRC network

A StatefulSet is a Kubernetes resource describing an application to run that gives a unique and stable identity to its pods.

While it's not how I'd ideally run the daemons, I decided a StatefulSet was the place to start if I wanted to avoid altering the daemon itself. Primarily this is because the daemon has a config file that tells it what other daemons to link with.

This is the core of the StatefulSet I use:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: catbox
spec:
  selector:
    matchLabels:
      app: catbox
  serviceName: catbox
  replicas: 2
  template:
    metadata:
      labels:
        app: catbox
    spec:
      containers:
        - name: catbox
          image: gcr.io/elite-vault-142205/catbox:1.12.0-2019-07-06-001
          command:
            - ./catbox
            - -conf
            - /etc/catbox/$(POD_NAME).conf
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 6667
            - containerPort: 7000
          volumeMounts:
            - name: config-volume
              mountPath: /etc/catbox
      volumes:
        - name: config-volume
          secret:
            secretName: irc-config

This tells Kubernetes to run 2 copies of the containers described by the spec.

The daemon configs come from a Kubernetes Secret, which is mounted as a volume at /etc/catbox:

$ kubectl create secret generic irc-config \
    --from-file=configs/catbox-0.conf \
    --from-file=configs/catbox-1.conf \
    --from-file=configs/certificate.pem \
    --from-file=configs/key.pem \
    --from-file=configs/opers.conf \
    --from-file=configs/servers.conf \
    --from-file=configs/users.conf

There's not much interesting in these configs from a Kubernetes perspective. The key parts are:

2 servers in servers.conf using hostnames catbox-0 and catbox-1
Listen on port 6667 for plaintext connections and 7000 for TLS connections
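
The link between pod names and config files can be sketched in a few lines of Go. This is only an illustration, not code from catbox: the function names podName and configPath are made up for the example. It relies on two things from the spec above: a StatefulSet names its pods with the StatefulSet name followed by an ordinal, and the container command reads /etc/catbox/$(POD_NAME).conf.

```go
package main

import "fmt"

// podName returns the name a StatefulSet gives the pod at the given ordinal:
// the StatefulSet name, a dash, then the ordinal.
func podName(set string, ordinal int) string {
	return fmt.Sprintf("%s-%d", set, ordinal)
}

// configPath mirrors /etc/catbox/$(POD_NAME).conf from the container command.
// (Hypothetical helper, for illustration only.)
func configPath(pod string) string {
	return "/etc/catbox/" + pod + ".conf"
}

func main() {
	// replicas: 2 gives ordinals 0 and 1.
	for i := 0; i < 2; i++ {
		pod := podName("catbox", i)
		fmt.Println(pod, "reads", configPath(pod))
	}
}
```

This is why the Secret needs one file per pod (catbox-0.conf and catbox-1.conf), and why adding a replica means adding another config file to the Secret first.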

Applying the StatefulSet config starts up 2 catbox pods:

$ kubectl apply -f resources.yml
$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
catbox-0   1/1     Running   0          3h30m
catbox-1   1/1     Running   0          3h31m

Now we have 2 IRC daemons running. However, they can't talk to each other to link up, nor can they accept clients.

Adding a Service resource for each allows them to communicate:

---
apiVersion: v1
kind: Service
metadata:
  name: catbox-0
spec:
  ports:
    - port: 6667
  selector:
    statefulset.kubernetes.io/pod-name: catbox-0
---
apiVersion: v1
kind: Service
metadata:
  name: catbox-1
spec:
  ports:
    - port: 6667
  selector:
    statefulset.kubernetes.io/pod-name: catbox-1

If we apply this config, they link up.

To allow clients to connect from outside the cluster, create a Service with type LoadBalancer:

---
apiVersion: v1
kind: Service
metadata:
  name: catbox
spec:
  ports:
    - port: 7000
  selector:
    app: catbox
  type: LoadBalancer
  loadBalancerIP: 34.83.184.224
  externalTrafficPolicy: Local

The IP is a static one created like so:

$ gcloud compute addresses create irc-ip --region us-west1

And that's it! You can connect on port 7000 and reach one of these IRC servers. I choose not to expose 6667 externally.

Niceties

Some other improvements I made along the way were:

Make sure the daemons are scheduled to different nodes. I have two nodes so that if one dies, the whole IRC network won't vanish. To do this I use affinity as part of the spec:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - catbox
        topologyKey: kubernetes.io/hostname

During updates I wanted each new pod's IRC server to link with existing ones before Kubernetes replaces the next pod, so as to maintain network state. To do this I delay the container's readiness with a readinessProbe:

readinessProbe:
  exec:
    command:
      - /bin/true
  initialDelaySeconds: 10

Performing updates

Periodically I'll need to update the servers or their configs. What does that look like in this setup?

If it's a config update I can update the Secret and then signal the daemons to reload:

$ kubectl delete secret irc-config
$ kubectl create secret [..]
$ kubectl exec catbox-0 -- pkill -HUP catbox
$ kubectl exec catbox-1 -- pkill -HUP catbox

Not the prettiest solution, but it works. Instead of a signal I can run a command as an IRC operator. I could also build automatic reloading into the daemon, such as via inotify.

Previously I would scp configs to a host and run the signal via ssh.

If it's a software update, I build and push a new image, change the image in the StatefulSet spec, and re-apply:

$ kubectl apply -f resources.yml

Kubernetes automatically stops the old pods and starts new ones to roll out the update.

Previously I would build the daemon, rsync it to a host, and trigger a restart via a different signal.

Future improvements

I'm sure there are ways I can improve this setup. I went through several iterations to get to this point already.

Some ideas:

Use gVisor for the container runtime.
Look at what other security settings might make sense.
See whether I could switch from a StatefulSet to a Deployment. I wouldn't have to reference individual pods and it would make scaling up and down trivial. I think I'll need to add an auto discovery mechanism to catbox.
I rely on manually running gcloud. I suspect this won't be manageable over the long term, especially if I add to the cluster or want to rebuild it. I think the usual solution for this is something like Terraform.

The One and the Many