Writing an ircd
I recently began writing an ircd (an IRC server). It's now usable and I have started introducing it into a small IRC network I run. Building it was a fun challenge and I learned a lot, both about Go and about IRC. I also ended up with a piece of software that is useful for me.
I thought it would be interesting to write about this project. I will describe my approach to the project and how I progressed, and then talk a bit about its internals, what I learned, its features, and what I'm planning next for it.
Why I wanted to build a new ircd
I've thought several times about writing an ircd. I have even begun writing one in various languages, but never reached a point where it was usable. Usually I start writing one because I like to work with IRC and servers, but end up not being able to dedicate enough time to it.
My most recent decision to write an ircd started out as a challenge to myself, and with the idea that I could use it to practice Go. The most challenging part was whether I could write a new ircd that would be compatible with the server protocol used by ircd-ratbox. ircd-ratbox is the ircd I use on my network, and I had no knowledge of its server protocol. Writing an ircd from scratch was challenging by itself, but being compatible with another ircd seemed far more daunting, partly because I assumed (wrongly) that the server protocol would have little documentation.
I had also recently written a library for parsing and encoding IRC protocol messages (as part of writing a client library), so working with the protocol's message format was (mostly) solved, and I was able to jump right in.
How I progressed
I began by ignoring the problem of how to link up with other IRC servers (the server protocol). I approached it as if the daemon would be the only server in a network. This let me get the client protocol working nicely. I quickly reached a minimally useful implementation where clients could chat and use basic IRC features. I didn't have things like flood control or channel operators, but my small network doesn't need those, so I was happy.
I started looking at how I could introduce linking to other servers. It is important for an IRC server to support this for redundancy reasons. If multiple servers link together to form a network, they share the same view of users and channels, and if one server goes down, the users connected to it can reconnect to a different server in the network and keep chatting. My small network currently has four servers, so for my ircd to be useful, being able to form a network was crucial.
I initially thought I should base my server protocol on that described in the IRC RFCs and get linking to other instances of my ircd working first, and then solve linking with other ircds afterwards. After thinking about this, I decided I would end up implementing either two versions of a server protocol, or at least heavily rewriting one, so I decided to go directly to linking to other ircds.
I wanted to be able to link with ircd-ratbox, so I went looking for information about how it links up with servers. I found the server protocol it uses is called TS6 and its source code includes documentation about it. Here's a slightly updated version of its TS6 documentation. This was very useful and I was able to get linking working almost without any further reference.
I started out not following the protocol in some places for simplicity. For example, at first I didn't implement much of the collision logic (such as what happens when two users end up with the same nickname on two different servers). Instead I simply split the network whenever any consistency problem arose. After I had the servers linking to each other and supporting basic features like channels and messages again, I went back and implemented dealing with inconsistencies.
Now my ircd works well for what my network needs, but it's by no means usable for larger networks. I've only started supporting channel modes and channel operators. It is a base to build on and I find it easy to add new features and commands.
Internals and architecture
I'll talk about the internals of the ircd and why I made some of the decisions I did.
Main loop
The main loop of the daemon looks like this:
- Receive a message on a channel for events
- Figure out what the event is
- Process the event
For example, if a client sends a message:
- Receive the event message
- Discover it is a client message
- Figure out what client it's from
- Figure out the message's command, e.g. `PRIVMSG`
- Call a client's command function, e.g. `privmsgCommand()`
There are several other types of events:
- A client connecting
- A client exiting
- The server being told to reload its configuration
- The server being told to restart
- Wake up and do bookkeeping
- Shut down
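The loop above can be sketched as a single goroutine receiving from one event channel. The event kinds and type names below are illustrative, not the daemon's actual API:

```go
package main

import "fmt"

// Event is a hypothetical event type; the real daemon's event types and
// names differ.
type Event struct {
	Kind    string // e.g. "message", "connect", "exit", "rehash", "shutdown"
	Payload string
}

// runLoop is a sketch of the central loop: receive an event, figure out
// what it is, process it. It returns the kinds it handled, stopping when
// a "shutdown" event arrives.
func runLoop(events <-chan Event) []string {
	var handled []string
	for ev := range events {
		handled = append(handled, ev.Kind)
		switch ev.Kind {
		case "message":
			fmt.Println("client message:", ev.Payload)
		case "connect":
			fmt.Println("new client:", ev.Payload)
		case "shutdown":
			fmt.Println("shutting down")
			return handled
		default:
			fmt.Println("unhandled event:", ev.Kind)
		}
	}
	return handled
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Kind: "message", Payload: "PRIVMSG #test :hi"}
	events <- Event{Kind: "shutdown"}
	close(events)
	runLoop(events)
}
```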
This gives us a single central function we can look at to understand almost everything the daemon needs to do.
Client structure
I needed to manage unregistered clients (client connections that have not successfully joined as users with nicknames), users, and servers.
In my first version of the daemon where there were no servers, every connection was a user client, and the client was either registered as a user with a nickname or not.
When I introduced linking with servers, instead of having only a `Client` type, I designed my types differently:
- Every connection starts out as a `LocalClient` type. This is a networked connection, but we don't yet know what it is. All users and servers share some properties: they have a network connection, a connection start time, a unique identifier, and two goroutines for interacting with their socket (one for reading and one for writing).
- When a user finishes registration, I upgrade them to a `LocalUser` type. This type embeds the original `LocalClient` type, and adds things like the last time the user said something, the last time we pinged it, and a whole new set of functions.
- Likewise, when a server finishes registration, I upgrade it to a `LocalServer` type, which again embeds the `LocalClient`.
This lets me easily separate how commands should be handled from users versus how they should be handled from servers. For example, both users and servers can send `PRIVMSG` commands, but we need to react to them differently. Both the `LocalUser` and `LocalServer` types have `privmsgCommand()` functions, and we differentiate them by their type. I don't need to check "is this a user, or is this a server?" Instead, I tie each function to a server or a user type and know without any checks.
Another key decision was how to keep track of users and servers. A user/server may be directly connected to us (local) or connected to another server (remote). For users, I represent every user in the network with an instance of the `User` type. If they are local to us, then they also have an instance of the `LocalUser` type. Fields that are common to a user whether they are remote or local, such as nickname, are part of the `User` struct, whereas if a field is only relevant for a local user, then the field is part of the `LocalUser` struct. The `User` type has a `LocalUser` field which is non-nil only if the user is local. For servers the situation is identical, except the types are `Server` and `LocalServer`.
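A minimal sketch of this type layout, using the type names from the post but with illustrative fields and hypothetical `privmsgCommand()` bodies:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// LocalClient is any directly connected socket before we know what it is.
type LocalClient struct {
	Conn        net.Conn
	ConnectedAt time.Time
	ID          uint64
}

// LocalUser embeds LocalClient once the connection registers as a user.
type LocalUser struct {
	LocalClient
	LastActivity time.Time
	LastPing     time.Time
}

// LocalServer embeds LocalClient once the connection registers as a server.
type LocalServer struct {
	LocalClient
}

// User represents every user on the network, local or remote.
// LocalUser is non-nil only for users connected to this server.
type User struct {
	Nick      string
	LocalUser *LocalUser // nil for remote users
}

// Each local type gets its own privmsgCommand: dispatch happens by
// static type, with no "is this a user or a server?" checks.
func (u *LocalUser) privmsgCommand(target, text string) string {
	return fmt.Sprintf("user %d PRIVMSG %s :%s", u.ID, target, text)
}

func (s *LocalServer) privmsgCommand(target, text string) string {
	return fmt.Sprintf("server %d relays PRIVMSG %s :%s", s.ID, target, text)
}

func main() {
	u := &LocalUser{LocalClient: LocalClient{ID: 1}}
	s := &LocalServer{LocalClient: LocalClient{ID: 2}}
	fmt.Println(u.privmsgCommand("#test", "hello"))
	fmt.Println(s.privmsgCommand("#test", "hello"))
	user := User{Nick: "horgh", LocalUser: u}
	fmt.Println("local?", user.LocalUser != nil)
}
```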
This lets me keep the different sources of users and servers organized. I believe ircd-ratbox does something similar and that is probably where I originally got the idea, but I'm not certain.
Commands and servers
I thought it would be nice to describe the general pattern of what happens when a message comes into the ircd. The same process happens whether the source is a user or another server.
- We receive a message with an IRC command, such as a `PRIVMSG` command.
- We take action locally, such as a change of state. For `PRIVMSG` this might be as simple as recording that the user just said something (so we know their idle time reset).
- We inform our local users. For a `PRIVMSG` command to an IRC channel, this means sending all local users in the channel a `PRIVMSG` command to let them know about the new message.
- We inform our local servers. For the `PRIVMSG` command, we want to tell each server we're connected to about the message (if they have local users in the channel, or are connected to servers that have local users in the channel). Some messages, like `NICK` messages, every server must know about no matter what.
- We reply to the client that issued the command. In most cases we don't reply to servers, but for users we typically do.
The exact behaviour varies depending on the command of course, but this is the general flow of what happens when a command comes in.
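The flow can be sketched as a function that computes who needs to hear about a channel `PRIVMSG`. The `Deliverable` type and function names here are hypothetical, not the daemon's actual API:

```go
package main

import "fmt"

// Deliverable pairs an outgoing IRC line with its destination.
type Deliverable struct {
	To   string
	Line string
}

// handleChannelPrivmsg sketches the general flow: take local action,
// inform local users in the channel, and inform linked servers so the
// message propagates network-wide.
func handleChannelPrivmsg(from, channel, text string,
	localMembers, servers []string) []Deliverable {
	var out []Deliverable
	// 1. Local state change (e.g. resetting the sender's idle time) omitted.
	// 2. Inform local users in the channel (everyone except the sender).
	for _, m := range localMembers {
		if m == from {
			continue
		}
		out = append(out, Deliverable{
			To:   m,
			Line: fmt.Sprintf(":%s PRIVMSG %s :%s", from, channel, text),
		})
	}
	// 3. Inform local servers so their users hear about the message too.
	for _, s := range servers {
		out = append(out, Deliverable{
			To:   s,
			Line: fmt.Sprintf(":%s PRIVMSG %s :%s", from, channel, text),
		})
	}
	return out
}

func main() {
	ds := handleChannelPrivmsg("alice", "#test", "hi",
		[]string{"alice", "bob"}, []string{"server2"})
	for _, d := range ds {
		fmt.Printf("-> %s: %s\n", d.To, d.Line)
	}
}
```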
How to prevent client flooding abuse
While on my small network abuse is not a problem, I knew that being able to prevent flooding is a key feature of an ircd. However, I only tackled solving it recently. I was hesitant as I expected it would be tricky to do well.
Both IRC RFCs describe a similar algorithm for flood control, based on a message timer for each client. It seemed possible to use, but I found the description unclear, so I decided to look at how others solved it.
I read ircd-ratbox's flood control algorithm and thought it was quite elegant, so I decided to base my ircd's flood control on it. This is how the flood control in my daemon works:
- Each client has a message limit counter. It starts at 10.
- Every message processed from a client decrements this counter to a minimum of 0.
- If the counter is zero, the message gets queued rather than processed.
- Each second (approximately), all clients have their message limit increased by 1 to a maximum of 10.
- If there are too many queued messages, the client gets disconnected.
This means that a client can send 10 messages as fast as they like. But then any further message will have to wait 1 second. For example, if a client sends 100 messages in a single burst, then the first 10 appear instantly, but the remaining 90 will go out one at a time, once per second.
I don't apply this message limit to messages from servers, and I also added a way to exclude users (based on user and host mask) from the flood control.
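A sketch of the counter-based algorithm described above; the type and method names are illustrative, and the real ircd-ratbox and daemon code differ:

```go
package main

import "fmt"

const messageLimit = 10

// floodClient models the per-client state: a counter that starts at 10,
// is decremented per processed message, and a queue for messages that
// arrive while the counter is at zero.
type floodClient struct {
	counter int
	queue   []string
}

func newFloodClient() *floodClient {
	return &floodClient{counter: messageLimit}
}

// receive processes a message immediately if the counter allows,
// otherwise queues it. It returns true if the message was processed now.
func (c *floodClient) receive(msg string) bool {
	if c.counter > 0 {
		c.counter--
		return true
	}
	c.queue = append(c.queue, msg)
	return false
}

// tick runs roughly once per second: restore one message credit (to a
// maximum of 10) and drain one queued message if there is one.
func (c *floodClient) tick() (string, bool) {
	if c.counter < messageLimit {
		c.counter++
	}
	if len(c.queue) > 0 && c.counter > 0 {
		c.counter--
		msg := c.queue[0]
		c.queue = c.queue[1:]
		return msg, true
	}
	return "", false
}

func main() {
	c := newFloodClient()
	processedNow := 0
	for i := 1; i <= 12; i++ {
		if c.receive(fmt.Sprintf("msg%d", i)) {
			processedNow++
		}
	}
	fmt.Println("processed immediately:", processedNow, "queued:", len(c.queue))
	if m, ok := c.tick(); ok {
		fmt.Println("after one tick, processed:", m)
	}
}
```

A burst of 12 messages sees the first 10 processed immediately and the last 2 queued, draining at one per tick, matching the behaviour described above.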
Goroutines
There are several goroutines in the daemon. While each goroutine is short and simple on its own, we have to think about how to orchestrate ending them when the server needs to shut down. I'll describe each and how each ends.
Main daemon goroutine
This goroutine is the centre of all actions in the ircd. All goroutines report to it, and it is responsible for the primary state changes that happen. Generally we don't need to worry about explicitly synchronizing access to our state because we should only be making changes in this goroutine. For example, if a message comes from a client, this goroutine will hear about it through a channel, and then run a function for the command, such as sending a message to other clients.
This goroutine ends if we're initiating shutdown (e.g., through an IRC operator using the `DIE` command). In that case, it cleans up and then waits until all other goroutines have ended, through use of a `sync.WaitGroup`.
Alarm goroutine
This goroutine's job is to wake up the main daemon goroutine (which may otherwise be blocked waiting on its event channel). The daemon needs to be woken periodically to do things like check if any clients are dead, and update flood control message limits.
I thought that the `time` package's `After()` function would suffice to replace this, but I realized that each time through the daemon's main loop (where I call `select` each time) the `After()` timer would restart. This is not what I want. The daemon needs to hear a wake up event every second, rather than only when there has not been a message in a second or more, which is what `time.After()` would give me.
This goroutine ends if it sees the shutdown channel is closed. I'll describe this channel more in the section about things I learned.
Signal goroutine
This goroutine's job is to listen for signals. When a `SIGHUP` occurs, it tells the main daemon goroutine to reload its configuration. When a `SIGUSR1` occurs, it tells the main daemon goroutine to restart.
This goroutine ends the same way as the alarm goroutine.
Listener goroutines
These goroutines (currently a minimum of one and a maximum of two) accept connections on a `net.Listener` socket and set each one up as a new client.
These goroutines end either in the same way the alarm goroutine does, or when their `net.Listener` is closed.
Client reader goroutines
Each client connection has its own goroutine which reads from its socket, parses the IRC protocol message, and then sends it over a channel to the main daemon goroutine.
These goroutines end either in the same way as the alarm goroutine, or when their socket is closed.
Client writer goroutines
Each client connection has its own goroutine which reads from a channel unique to the client (which the main daemon goroutine writes to), encodes the message as an IRC protocol message, and then sends it to the client's socket.
These goroutines end either in the same way as the alarm goroutine, or when their channel is closed.
Ephemeral new client goroutines
These goroutines get created by the listener goroutines to deal with the setup of a new client, and then exit. This is so the listeners do not block.
Ephemeral server connection goroutines
If the local server connects outwards to another server, we create one of these goroutines to connect and set it up as a client. This is so the main daemon goroutine does not block waiting on the connection.
Channels
I use a few Go channels to communicate between the ircd's goroutines. I found it important to track these carefully to be sure that I do not accidentally end up panicking by sending on a closed channel or have a goroutine leak from sending to/receiving from a channel that will never be read from/sent to.
- Shutdown channel: This channel never receives messages. The daemon opens it when it starts, and closes it during shutdown. Any goroutine can test if the daemon is shutting down by trying to receive on this channel as receive will only succeed when the channel is closed. I talk more about this pattern in the section about things I learned.
- Event channel: The main daemon goroutine receives messages on this channel. If any other goroutine needs to inform the main daemon about anything, it sends a message to this channel.
- Signal channel: This channel receives OS signals such as `SIGUSR1`.
- Client write channel: Each client has its own write channel. This is the only buffered channel I use. It is buffered because I don't want the main daemon to block when sending to a client. When it needs to write to the client's socket, it writes a message to this channel, and the client's writer goroutine will receive the message and then write to the socket.
Things I learned
I learned a lot building this ircd. I'll discuss some things I learned that might be non-obvious.
A pattern for reading from/writing to sockets in Go
While I have worked with sockets in Go before, I had not written a daemon where I had to manage many client sockets. I started out looking for a way to do something like `select()` in C, but found Go does not support this.
I discovered a pattern to emulate this in Go:
- Each socket has its own reader goroutine that reads from the socket and sends the result to a channel.
- Each socket has its own writer goroutine that reads from a channel and writes the result to the socket.
- A main loop reads from the channel the reader goroutines send to, and takes action. If it needs to write to a socket, it sends a message to the socket's writer channel.
This pattern is obvious in hindsight, and feels natural in Go.
Difficulties of orchestrating a shutdown in Go
It is easy to start up many goroutines doing interesting things in Go. When it comes to having an orderly shutdown though, if you have long lived goroutines, then you have to think about how to ensure all goroutines clean up and end before ending your process.
A key tool for this is `sync.WaitGroup`. Using this, the main goroutine can wait until all other goroutines finish up, then end itself. However, a `sync.WaitGroup` does not provide a way to inform goroutines that they need to end.
One of the most useful ways I found to inform goroutines was to have a channel that exists only for the purposes of checking whether to shut down. When we want to tell other goroutines to shut down, then we close the channel. This is a pattern I learned from the book The Go Programming Language.
The pattern works because if we never send any message on the channel, then receiving on the channel will never succeed unless the channel is closed. This means that any goroutine can use a non-blocking `select` to know whether to shut down, and it is always safe to do this. Here is an example of the pattern:
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	shutdownChan := make(chan struct{})
	var wg sync.WaitGroup

	// Long lived helper goroutine: uses a non-blocking select to check
	// for shutdown between units of work.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-shutdownChan:
				fmt.Println("Shutting down!")
				return
			default:
				fmt.Println("Continuing my work!")
				time.Sleep(100 * time.Millisecond)
			}
		}
	}()

	// Main goroutine: do useful things, then initiate shutdown by
	// closing the channel, and wait for all helpers to end.
	time.Sleep(300 * time.Millisecond)
	close(shutdownChan)
	wg.Wait()
}
```
How IRC servers link and communicate
I've done plenty of work with IRC clients, but this was my first time deeply getting into the perspective of how a server needs to deal with both clients and servers. It was a bit mind bending at first.
An example is how servers deal with users changing their nicknames. Suppose you have a user on server A that sends a message to a user on server B. At the same moment he sends his message, the recipient changes her nickname. What happens? If we're unlucky, server B will not be able to deliver the message because the nickname no longer exists.
The TS6 protocol solves this by assigning a unique identifier to each user; at the server protocol level, servers do not refer to users by nickname. Instead they send and receive messages using these immutable unique identifiers.
You also need to think about more than the single client that sent you a message and what the message changed. You have to think about what other servers need to know about the message, if any.
A specific detail I learned was how servers handshake, synchronise their state, and link together. I knew some of the terminology from running servers, but not the specifics. It is actually simpler than I expected. Both servers basically blast out everything they know to the other side, and expect the same in return.
IRC standards, or lack thereof
I started out this project ignorant about the state of the IRC standards. I initially tried to follow the latest IRC RFC, 2812, but large parts of it conflict with the IRC I know. It describes some weird things, such as channel creators in its companion RFC 2811. It seems that many client and server authors ignore large parts of this RFC and its companions too, or ignore it altogether as some kind of fever dream.
The original IRC RFC 1459 is a lot more like the IRC I am familiar with, which is EFnet's version of IRC. Though this RFC does not include a lot of how current IRC servers behave, such as showing when a user signed on or when a channel topic was set, it has the basics of the protocol I recognize.
There are a few efforts underway to improve the IRC standard landscape:
- There is IRCv3 which is about enhancing IRC.
- There is Modern IRC which is about describing IRC as it is today (rather than drafting additions/extensions like IRCv3).
- There is ircv3-harmony which is about writing a new RFC to replace 1459/2812 and resolve the inconsistencies and gaps (again without including additions described separately in IRCv3 extensions).
I like the idea of the last two the most. It seems important to document and standardise IRC as it is. Adding new modern capabilities might be nice too, but I am leery of ruining the magic of IRC. I suppose there is little argument that IRC is being left behind in many ways, so maybe these efforts can reverse that.
Splitting a Git repository
Originally I started writing the ircd in a subdirectory of a repository with my IRC client library, clients, and protocol message library. As the project expanded I thought it would make more sense to have it in its own repository.
Ideally I wanted to retain the history of my progress. I found that it was possible to extract a single directory and its history with Git. This is a post I used for reference.
One aspect I didn't figure out was how to erase the history of the directory from the original repository. I'm not entirely sure I want to do that though as it may be useful to have around.
Notable features
A few things make my ircd interesting already:
- In WHOIS I report the TLS connection information. For example, if I WHOIS myself right now I see:
  horgh is using a secure connection (TLS 1.2) (TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
- WHOIS command responses always come from the server the target of the WHOIS is on. This means we always see things only that server knows about, such as idle time. On other ircds you have to explicitly ask for your WHOIS command to go to the target's server to get this information.
- It is easy to add support for new commands. I'm sure part of this is because I wrote the entire thing, but in order to support a new client command, you only need to add a new check for it in `handleMessage()` in `local_user.go` and write a single corresponding function. I think the simplicity of Go helps here as well, as reading and understanding how existing commands work should be quite easy.
- Operators can see when clients connect (even to remote servers). To see remote client connections in ircd-ratbox I ended up writing a custom module. In my ircd, I send a notice to operators inside the UID command function.
- When a user becomes operator on any server, existing operators will receive a notice.
These are all fairly minor, but they are things I've thought would be neat to have yet never implemented.
Future plans
I'm not finished with this project. I'm planning to improve it and add more features.
- One problem I've not solved is how to automatically test much of it. I have a few unit tests, but only for functions without side effects. Many of the functions have as their main job sending messages to different channels, so unit testing them is not trivial. I am thinking of refactoring these functions to return the messages along with their intended recipients rather than actually sending the messages. I think this would let me easily unit test these functions, as there would no longer be side effects.
- I want to support more of the basic features of IRC. Specifically, I want to support all of the RFC channel modes, including channel bans.
- I want to add more fun custom commands. One I'm planning is `NSA`, which will report the TLS connection information of all users in a channel or on the network.