Serving authenticated downloads from Google Cloud Storage using Cloudflare Workers
At $work we wanted to improve how we serve files for download. Previously we served files from disk via a daemon on our servers. We recently switched to serving them from Google Cloud Storage fronted by Cloudflare Workers. The new system has benefits over the old and I'm pleased with the result. I'll explain why we did this and how it works.
Why change anything?
While the old system worked, it had drawbacks.
One issue was that to make the files available to the servers hosting them, we had to copy them to each server. In practice this meant a single point of failure, and it required our servers to have access to each other. Spinning up a new server also required syncing these files to it.
Another problem was that our servers occasionally received a lot of download traffic all at once, starving other services. While we mitigated this with things like traffic shaping, the fact remained that latency-sensitive services could be impacted by downloads. This is especially a concern when our systems are already experiencing higher than usual traffic for other reasons.
Using GCS solves the first issue. While GCS may itself be considered a single point of failure, it's designed for serving files this way, so we're confident it's up to the task. It also means our servers don't need access to each other.
GCS avoids the bandwidth problem since our servers are no longer using their bandwidth for the files!
How does it work?
We have a Cloudflare Worker on the URLs serving the files. When a download request comes in, the Worker makes a request to our servers asking whether to allow it. (To download you need to authenticate.) If our servers allow the download, they respond with a Signed URL to the file. The Worker fetches that URL and streams the response to the client.
We use the Worker to avoid HTTP redirects. If we instead sent clients to our servers and redirected them to Signed URLs, we'd break integrations that don't expect redirects. (We don't control the clients.)
With the Worker, from the client's perspective nothing changes even though behind the scenes multiple requests occur. The client never sees a GCS URL despite that being where the data comes from.
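To make the flow above concrete, here is a minimal sketch of what such a Worker might look like in TypeScript. It is not our production code. The authorization endpoint URL, the `path` query parameter, the `signedUrl` field in the JSON response, and the use of an `Authorization` header are all assumptions made for illustration. The `fetcher` parameter exists only so the logic can be exercised without a network.

```typescript
// Hypothetical endpoint on our servers that decides whether to allow a download.
const AUTH_ENDPOINT = "https://api.example.com/authorize-download";

// A fetch-like function, injectable so the handler can be tested with a stub.
type Fetcher = (input: string, init?: RequestInit) => Promise<Response>;

async function handleDownload(
  request: Request,
  fetcher: Fetcher = fetch,
): Promise<Response> {
  const url = new URL(request.url);

  // Ask our servers whether this client may download this file,
  // forwarding the client's credentials (assumed to be in Authorization).
  const authResponse = await fetcher(
    `${AUTH_ENDPOINT}?path=${encodeURIComponent(url.pathname)}`,
    { headers: { Authorization: request.headers.get("Authorization") ?? "" } },
  );
  if (!authResponse.ok) {
    // Pass the refusal status (e.g. 401 or 403) through to the client.
    return new Response("Download not permitted", {
      status: authResponse.status,
    });
  }

  // Assumed response shape: {"signedUrl": "https://..."}.
  const { signedUrl } = (await authResponse.json()) as { signedUrl: string };

  // Fetch the file from GCS using the Signed URL.
  const upstream = await fetcher(signedUrl);

  // Copy only selected headers so storage-specific headers don't reach the client.
  const headers = new Headers();
  for (const name of ["Content-Type", "Content-Disposition"]) {
    const value = upstream.headers.get(name);
    if (value !== null) headers.set(name, value);
  }

  // Passing the body stream straight through avoids buffering the file
  // in the Worker. No redirect is ever sent to the client.
  return new Response(upstream.body, { status: upstream.status, headers });
}

// Module-style Worker entry point.
export default {
  fetch: (request: Request): Promise<Response> => handleDownload(request),
};
```

Because the Worker builds its own response rather than returning a redirect, the client only ever talks to the original download URL.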
Summing up
We transparently converted an API that served downloads from local files into one that serves them out of GCS. The new system lets us reduce access between our servers, makes our services less vulnerable to traffic spikes, and handles higher levels of traffic. Additionally, it offloads much of this service to Google and Cloudflare, reducing what we're responsible for operating. It was a fun project which worked well. I was particularly impressed with Cloudflare Workers. They're a powerful tool.