Upgrading a Stripe webhook version
At $work, we use Stripe for payment processing. Part of this is listening for webhook events from Stripe. To handle these events, we have a web service daemon. Stripe sends HTTP requests to it as things happen. This works well today, but things change over time and the webhook integration needs maintenance both by us and by Stripe.
On our side, we have a rolling deployment such that we can change the code with effectively zero downtime. We take down and replace instances with new code while others continue running.
On Stripe's side, there is the concept of an API version, which looks like 2024-09-30.acacia. Behaviour can differ between API versions, but in theory it is stable within a particular version. When you create a webhook, you choose a version, and it stays on that version.
Having a stable integration is convenient, but we don't want to stay on one version indefinitely. One risk is that Stripe may eventually stop supporting the version we're on. Even if it doesn't, we may eventually want a feature in a newer version, and the upgrade path could be painful if we had fallen far behind. As such, we'd like to routinely upgrade to newer versions as they come out.
The upgrade process
What is the process to upgrade to a new version? Stripe has a guide describing an example process. Essentially, to opt in to a new version, we have to define a new webhook in Stripe and switch over to it.
The process I chose to implement is similar to what they describe, but with slight modifications. I also automated the process so that we don't need to make any changes manually. This is how it works:
Prerequisites
- We configure our Stripe webhooks to include a version query parameter.
- The web service daemon rejects webhook requests if they aren't for its version (by sending an HTTP 400 response). Its version is a constant in its code.
- Stripe keeps retrying delivery of events that we reject. This means we won't lose an event.
- Stripe provides a unique ID for each event, even if they deliver it to multiple webhooks.
- We won't process an event more than once because we use a Postgres upsert and only process it if we inserted the event.
Phase one
- When we begin deploying, we query Stripe's API for the current webhook's version.
- If the current webhook's version doesn't match our code's version, we create a new webhook using the API.
- This webhook is identical except the version is changed (both the API version and the query parameter).
- Events start coming in with the new version, but our current daemons reject them because they are on the old version.
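The decision in phase one could be sketched roughly as follows. This is an illustration rather than our deploy code: the Endpoint type and the planNewEndpoint and versionedURL functions are hypothetical, the URLs and the older version string are made-up examples, and the calls to Stripe's API for listing and creating webhooks are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Endpoint is a hypothetical, trimmed-down view of a Stripe webhook.
type Endpoint struct {
	ID            string
	URL           string
	APIVersion    string
	EnabledEvents []string
	Enabled       bool
}

// versionedURL returns rawURL with its version query parameter set to version.
func versionedURL(rawURL, version string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("version", version)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// planNewEndpoint returns the webhook to create for codeVersion, or nil if an
// enabled webhook already uses that version. The new webhook copies an
// existing enabled one, changing only the API version and the query parameter.
func planNewEndpoint(existing []Endpoint, codeVersion string) (*Endpoint, error) {
	var template *Endpoint
	for i := range existing {
		e := &existing[i]
		if !e.Enabled {
			continue
		}
		if e.APIVersion == codeVersion {
			return nil, nil // nothing to do
		}
		template = e
	}
	if template == nil {
		return nil, errors.New("no enabled webhook to copy")
	}
	newURL, err := versionedURL(template.URL, codeVersion)
	if err != nil {
		return nil, err
	}
	return &Endpoint{
		URL:           newURL,
		APIVersion:    codeVersion,
		EnabledEvents: append([]string(nil), template.EnabledEvents...),
		Enabled:       true,
	}, nil
}

func main() {
	existing := []Endpoint{{
		ID:            "we_old",
		URL:           "https://example.com/stripe/webhook?version=2024-06-20",
		APIVersion:    "2024-06-20",
		EnabledEvents: []string{"invoice.paid"},
		Enabled:       true,
	}}
	next, err := planNewEndpoint(existing, "2024-09-30.acacia")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if next == nil {
		fmt.Println("webhook already on the code's version")
		return
	}
	fmt.Println("create:", next.URL, next.APIVersion)
}
```

Keeping the decision separate from the API calls means a deploy that runs it a second time is a no-op once the new webhook exists.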
During deployment
- The deploy proceeds and new instances of the web service daemon begin listening.
- At this point, there may be multiple instances, some handling the old version and some the new.
- Eventually we replace all instances with the new version.
- At this point, all instances are rejecting the old version's events, but Stripe continues retrying them.
Phase two
- We disable the old webhook using the API.
- This stops Stripe from delivering events to the old webhook.
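As a rough sketch of phase two, the function below picks which webhooks to disable once the deploy has finished. The Endpoint type and endpointsToDisable are hypothetical names, and the extra guard (only disable old webhooks if an enabled one for the code's version exists) is an assumption added here for safety rather than something described above. The actual API call that disables each webhook is omitted.

```go
package main

import "fmt"

// Endpoint is a hypothetical, trimmed-down view of a Stripe webhook.
type Endpoint struct {
	ID         string
	APIVersion string
	Enabled    bool
}

// endpointsToDisable returns the IDs of enabled webhooks that are not on
// codeVersion. It returns nothing unless an enabled webhook for codeVersion
// exists, so we never disable our only source of events.
func endpointsToDisable(existing []Endpoint, codeVersion string) []string {
	hasCurrent := false
	var stale []string
	for _, e := range existing {
		if !e.Enabled {
			continue
		}
		if e.APIVersion == codeVersion {
			hasCurrent = true
		} else {
			stale = append(stale, e.ID)
		}
	}
	if !hasCurrent {
		return nil
	}
	return stale
}

func main() {
	existing := []Endpoint{
		{ID: "we_old", APIVersion: "2024-06-20", Enabled: true},
		{ID: "we_new", APIVersion: "2024-09-30.acacia", Enabled: true},
	}
	for _, id := range endpointsToDisable(existing, "2024-09-30.acacia") {
		fmt.Println("disable:", id)
	}
}
```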
This gets us on to the new version of the webhook. Since the code version comes from the Stripe Go package, it gets bumped when we upgrade the package, and the next deploy runs the webhook upgrade.
Conclusion
None of this accounts for changes you may have to make to handle actual behaviour changes in the new version, but this process provides a way to switch over.