If you don't trust a program, how can you run it so it can't compromise the system it runs on?
This is something I've been thinking about recently. What got me thinking along these lines is a paper djb wrote about qmail and writing secure software.
He suggests that if you can't trust code, then you can put it in a prison:
"We can architect computer systems to place most of the code into untrusted prisons. "Untrusted" means that code in these prisons--no matter what the code does, no matter how badly it behaves, no matter how many bugs it has--cannot violate the user's security requirements."
The idea is that we can build systems such that there is a small amount of trusted code that we would endeavour to make bug free. We'd imprison the rest such that even if it has bugs leading to control by an attacker, it doesn't matter:
"By reducing the amount of trusted code far enough, and reducing the bug rate far enough, one can reasonably hope to produce a system where there are no bugs in the trusted code, and therefore no security holes."
This is profound. He says this concept (among others) makes progress towards "invulnerable software systems".
How do we place code in a prison?
What is a prison?
Sandboxing is an equivalent though loaded term. Running a program in a sandbox isolates it from the system it runs on. However there are varying degrees of sandboxing. Are there sufficiently prison-like sandboxing techniques?
What sandboxes exist?
There are two common ways to sandbox today: virtualisation and minimising privileges.
With virtualisation, an attacker who compromises a program must escape the VM to compromise the host. If one assumes the VM technology is secure, then virtualisation is a sufficient prison. Unfortunately we can't trust it to the degree we want: hypervisors have a history of VM escape vulnerabilities.
Minimising privileges is often done through something like seccomp-bpf or pledge(). In his paper, djb suggests minimising privileges is a distraction rather than security. Unless a program has zero privileges, you're trusting it. For example, if you minimise a server's privileges such that it can only access the network, it could exfiltrate data or compromise the kernel via syscalls. This is not a prison.
Why are syscalls a concern?
Because if you have control over a program, and you have privileges to run a syscall, then you can talk to the kernel. It is very hard to trust the kernel:
"the kernel is very, very big and very, very complicated. It's an incredibly high quality piece of software, but it's just a simple fact that the bigger and more complicated something is, the more bugs it's likely to have." source
One part of being a good prison is protecting access to the kernel.
What can we do?
If running a program in a VM is theatre and we can't trust a program with syscalls then we're in trouble.
I see two solutions if we want to approach the type of prison we're after:
1. Run the untrusted program on an untrusted host. If we can't imprison a program, then we can't trust its host. It seems possible to run much of a software system on untrusted hosts. You'd be able to limit the amount of trusted code in a large system this way. Of course this moves the problem up a level, as now we have to imprison a host.
2. Wrap an untrusted program in a trusted program such that no syscalls go to the host. An interesting example of this is gVisor. gVisor implements the kernel API and handles a program's syscalls instead of the kernel. If you trust gVisor to be secure, it is a prison. As with VMs, you are trusting a complex piece of software, but the wrapper can be much simpler than what it replaces, which brings us closer to code we can trust. It is written in a safer language and is much smaller than the Linux kernel. This post discusses the benefits further.
The first is not satisfactory, though it is a useful technique when architecting systems.
The second seems practical. gVisor is in production use at Google. However trusting gVisor to be bug free is irksome. Can we do better?
For specific use cases it seems feasible to write something like gVisor that is smaller and more trustable. This would be akin to tailoring privilege rules for a particular program.
Take an HTTP server as an example. It seems reasonable to say that all it needs to do is read() a request and write() a response. We could write a wrapper using ptrace() to intercept and block all other syscalls and handle those in our wrapper rather than in the kernel. We must deal with managing connections somewhere of course, so anything more than a static website becomes more complex, but I think there's potential as it allows us to build a tight prison around a piece of code. It does require that we significantly alter code to fit this model, however.
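To make the idea concrete, here is a rough sketch of the decision such a wrapper would make for each syscall. It is a toy model only: it does not use ptrace() or touch a real process, it just replays a made-up list of syscalls. The names Wrapper, PASSTHROUGH, on_syscall and the example trace are all hypothetical, and treating exit as "terminate" is my own assumption about how a wrapper might behave.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical policy: the only syscalls the wrapper forwards to the real kernel.
PASSTHROUGH = frozenset({"read", "write"})


@dataclass
class Wrapper:
    """Toy model of a wrapper sitting between a jailed program and the kernel."""

    forwarded: list = field(default_factory=list)  # calls the kernel would see
    handled: list = field(default_factory=list)    # calls answered by the wrapper

    def on_syscall(self, name, *args):
        """Return what the wrapper does with one intercepted syscall."""
        if name in PASSTHROUGH:
            # read()/write() are the server's whole job, so let them through.
            self.forwarded.append((name, args))
            return "forward"
        # Everything below is answered by the wrapper and never reaches the kernel.
        self.handled.append((name, args))
        if name in ("exit", "exit_group"):
            # Assumption: the wrapper tears the jailed process down itself.
            return "terminate"
        # Any other syscall is refused with a permission error.
        return -errno.EPERM


# A made-up trace of what a compromised static HTTP server might attempt.
trace = [
    ("read", 0, 4096),
    ("open", "/etc/passwd"),
    ("write", 1, b"HTTP/1.1 200 OK\r\n\r\n"),
    ("exit_group", 0),
]

wrapper = Wrapper()
decisions = [wrapper.on_syscall(name, *args) for name, *args in trace]
kernel_saw = [name for name, _ in wrapper.forwarded]
```

In this model the kernel only ever sees read and write, and the attempted open() is answered by the wrapper. A real version would need ptrace() (or something similar) to do the interception, and would have to be trusted in the same way we would trust gVisor.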