Every Infrastructure Decision I Endorse or Regret

Inspired by “(Almost) Every infrastructure decision I endorse or regret,” I thought it would be interesting to do the same for my startup.

AWS

Picking AWS

🟩 Endorse

There wasn’t really any discussion about this, since it’s been my default for the last decade. I’m sure there are things GCP does better, but for an early-stage startup, I’d rather avoid anything exciting in this part of the stack. My only advice is to embrace as much of AWS as you can; otherwise, you will spend a lot of time retrofitting and worrying about problems that don’t really exist (like keeping the option to switch providers later).

ECS + Fargate

🟧 Weak Endorse

Early on, we decided we didn’t want to manage servers. We didn’t want the complexity of Kubernetes (k8s), and we wanted deploys to be immutable (as much as possible). The deploy pipeline is easy to understand: build an image, upload it, and replace the Fargate tasks. While it took some finagling to get in place, we’ve basically not touched it since.
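
To make the last step (“replace Fargate tasks”) concrete, here is a minimal Python sketch using a boto3 ECS client. It is not our actual pipeline: it assumes the image has already been pushed under an immutable tag (say, a git SHA), and the cluster, service, family, and container names are hypothetical. Each deploy registers a new task definition revision instead of changing anything in place, which is what keeps deploys immutable.

```python
from typing import Any

# Keys that describe_task_definition returns but register_task_definition
# rejects, so they are stripped before registering a new revision.
READ_ONLY_FIELDS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
    "deregisteredAt",
}


def next_task_definition(current: dict[str, Any], container: str, image: str) -> dict[str, Any]:
    """Copy a task definition, pointing one container at a new image tag."""
    taskdef = {k: v for k, v in current.items() if k not in READ_ONLY_FIELDS}
    names = [c["name"] for c in taskdef["containerDefinitions"]]
    if container not in names:
        raise ValueError(f"container {container!r} not found in {names}")
    taskdef["containerDefinitions"] = [
        {**c, "image": image} if c["name"] == container else c
        for c in taskdef["containerDefinitions"]
    ]
    return taskdef


def deploy(ecs: Any, cluster: str, service: str, family: str, container: str, image: str) -> str:
    """Register a new task definition revision and roll the service onto it.

    `ecs` is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    current = ecs.describe_task_definition(taskDefinition=family)["taskDefinition"]
    registered = ecs.register_task_definition(**next_task_definition(current, container, image))
    arn = registered["taskDefinition"]["taskDefinitionArn"]
    # ECS starts tasks on the new revision and replaces the old ones according
    # to the service's deployment configuration; nothing is patched in place.
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)
    return arn
```

Rolling back under this scheme is just another update_service call that points the service at an earlier revision ARN.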

RDS

🟩 Endorse

We’re not at a scale where the extra cost of RDS over running our own Postgres server matters. At a previous startup, I had to build all of the tooling for backups and replication myself, and that was a poor use of time.

KMS

🟩 Endorse

I wish I had given KMS a try sooner! It’s tightly integrated with IAM and makes secrets much easier to manage. It seems daunting at first, but once you have a runbook, it’s easy-ish to use.
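
As an illustration of how KMS and IAM fit together, here is a minimal sketch (not our runbook) of encrypting a small secret once and decrypting it at startup with a boto3 KMS client. The key alias, the encryption context, and storing the base64 ciphertext in config are all assumptions. The point is that only principals whose IAM permissions and key policy allow kms:Decrypt can read the secret.

```python
import base64
from typing import Any

# KMS Encrypt accepts at most 4096 bytes of plaintext; larger payloads
# need envelope encryption via GenerateDataKey instead.
MAX_PLAINTEXT_BYTES = 4096


def encrypt_secret(kms: Any, key_id: str, secret: str, context: dict[str, str]) -> str:
    """Encrypt a small secret under a KMS key and return base64 ciphertext."""
    plaintext = secret.encode("utf-8")
    if len(plaintext) > MAX_PLAINTEXT_BYTES:
        raise ValueError("secret too large for a direct KMS Encrypt call")
    resp = kms.encrypt(KeyId=key_id, Plaintext=plaintext, EncryptionContext=context)
    return base64.b64encode(resp["CiphertextBlob"]).decode("ascii")


def decrypt_secret(kms: Any, ciphertext_b64: str, context: dict[str, str]) -> str:
    """Decrypt at runtime; the caller's IAM permissions and the key policy decide access."""
    resp = kms.decrypt(
        CiphertextBlob=base64.b64decode(ciphertext_b64),
        # The same encryption context used at encrypt time must be supplied,
        # otherwise KMS rejects the request.
        EncryptionContext=context,
    )
    return resp["Plaintext"].decode("utf-8")
```

A runbook step would call encrypt_secret(boto3.client("kms"), "alias/app-secrets", value, {"service": "api"}) once, with a hypothetical alias and context, and the service would call decrypt_secret with the same context on boot.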

Terraform

🟩 Endorse

Using Terraform from day 1 was a great idea. Configuration changes are checked into version control, which makes them visible to everyone. Static analysis has made security audits easier, and automated checks help us stay on top of security issues. Other engineers are also more likely to make infra changes themselves because the tooling is centralized and copy/paste-able.
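
A custom check is easy to sketch against the JSON that terraform show -json emits for a saved plan. The Python below is a hypothetical example, not one of our actual checks: it reads the resource_changes entries and flags security groups or rules that would open a port (SSH by default) to 0.0.0.0/0.

```python
import json
from typing import Any, Iterator

OPEN_CIDR = "0.0.0.0/0"


def _opens_port(rule: dict[str, Any], port: int) -> bool:
    """True if an ingress rule exposes `port` to the whole internet."""
    if OPEN_CIDR not in (rule.get("cidr_blocks") or []):
        return False
    if rule.get("protocol") == "-1":  # "-1" means all protocols and ports
        return True
    return (rule.get("from_port") or 0) <= port <= (rule.get("to_port") or 0)


def find_open_ingress(plan: dict[str, Any], port: int = 22) -> Iterator[str]:
    """Yield addresses of planned security groups/rules that open `port` to 0.0.0.0/0."""
    for rc in plan.get("resource_changes", []):
        # "after" is null for deletions and omits values not known until apply.
        after = (rc.get("change") or {}).get("after") or {}
        if rc.get("type") == "aws_security_group":
            rules = after.get("ingress") or []
        elif rc.get("type") == "aws_security_group_rule" and after.get("type") == "ingress":
            rules = [after]
        else:
            continue
        if any(_opens_port(rule, port) for rule in rules):
            yield rc["address"]


def main(plan_path: str) -> int:
    """Return a CI-friendly exit code: 1 if any offending resource is found."""
    with open(plan_path) as f:
        plan = json.load(f)
    offenders = list(find_open_ingress(plan))
    for address in offenders:
        print(f"{address}: ingress open to {OPEN_CIDR} on port 22")
    return 1 if offenders else 0
```

In CI, this would run after terraform plan -out=plan.out and terraform show -json plan.out > plan.json, failing the job when main("plan.json") returns 1.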

Software and SaaS

Docker

🟧 Weak Regret

I’m not naive enough to believe there is a real alternative to Docker, but it causes a lot of grief in local development on macOS. For deployment artifacts, I think it’s fine; this is a clear case of “Worse is Better,” and I accept that.

Postgres

🟩 Strong Endorse

Postgres is our primary DB, and our early decision to standardize on it has paid off. At previous startups, I stitched together multiple databases for different purposes, thinking I was using the best tool for the job. This time around, we haven’t fallen into that well-known trap.

GitHub Actions

🟩 Endorse

It works out of the box, and it’s easy for engineers to fiddle with and extend. I just wish configuring beefier test runners were more straightforward.

Sentry

🟩 Endorse

I’ve used Sentry at several companies now, and it’s pretty good as an early-stage error tracker, performance regression detector, and workflow for tracking runtime bugs. It takes some effort to dial in (associating errors with users, long stack traces), and there are some quirks (sampling, and sometimes only partial data from structured logs??).
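
Two of the items above, sampling and associating errors with users, are mostly SDK configuration. As a hedged sketch for the Python SDK: sentry_sdk.init accepts a traces_sampler callback that returns a sample rate per transaction, and sentry_sdk.set_user tags later events with a user. The transaction names below are hypothetical, and the 10% rate is an arbitrary placeholder.

```python
from typing import Any

# Hypothetical high-volume endpoints whose traces are mostly noise.
NOISY_TRANSACTIONS = {"GET /healthz", "GET /readyz"}
DEFAULT_RATE = 0.1  # placeholder; tune against your own quota


def traces_sampler(sampling_context: dict[str, Any]) -> float:
    """Return the probability (0.0 to 1.0) that this transaction is traced."""
    # Inherit the upstream decision so distributed traces stay complete.
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return 1.0 if parent else 0.0
    name = (sampling_context.get("transaction_context") or {}).get("name", "")
    if name in NOISY_TRANSACTIONS:
        return 0.0
    return DEFAULT_RATE


def init_sentry(dsn: str) -> None:
    import sentry_sdk  # imported lazily so the sampler is testable without the SDK

    sentry_sdk.init(dsn=dsn, traces_sampler=traces_sampler)


def identify_user(user_id: str) -> None:
    """Call once per request (e.g. in auth middleware) so errors carry the user."""
    import sentry_sdk

    sentry_sdk.set_user({"id": user_id})
```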

Slack

🟩 Endorse

Starting with Slack alerts for most things is the easiest way to make sure you find issues before your customers tell you about them (if they tell you at all). We have several Slack bots to help with process and alerting. You can get pretty far with this setup without having a dedicated alerting/monitoring tool.
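
The simplest version of a Slack alert is an incoming webhook: POST a JSON body with a text field to the webhook URL. The sketch below uses only the Python standard library; the URL is a placeholder, and this is not how our bots are built.

```python
import json
import urllib.request


def build_alert(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url: str, text: str, timeout: float = 5.0) -> str:
    """Send the alert; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(build_alert(webhook_url, text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # Slack replies with "ok" on success
```

Splitting request construction from sending keeps the message format testable without hitting Slack.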

Notion

🟩 Endorse

I’m a huge proponent of writing things down, and for engineering in particular, documentation is the first step to automation. Most of our work is product engineering at this point, so having a culture of writing briefs and communicating asynchronously makes tools like Notion essential. It used to be slow, but thankfully, it’s getting better for business users.

Netlify

🟥 Regret

Almost all of my personal websites are static sites these days, and I mostly used Netlify to avoid setting up CloudFront (good lord, it’s complicated). Unfortunately, it has become a big thorn in our side: for security reasons, we now use it as little more than a glorified FTP server. It’s the only part of our infra that is not in AWS, and only a few engineers have access to it. In hindsight, I probably should have used Terraform and CloudFront from the beginning.
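
For comparison, once a bucket and distribution exist (ideally created in Terraform), the upload step Netlify handles for us is small. This Python sketch assumes boto3 S3 and CloudFront clients plus a hypothetical bucket name and distribution ID. It sets each object’s Content-Type explicitly, since S3 otherwise stores a generic binary/octet-stream type, and then invalidates the CDN cache.

```python
import mimetypes
import time
from pathlib import Path
from typing import Any


def plan_uploads(build_dir: str) -> list[tuple[str, str, dict[str, str]]]:
    """List (local file, S3 key, ExtraArgs) for every file in the build output."""
    root = Path(build_dir)
    uploads = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        content_type, _ = mimetypes.guess_type(path.name)
        uploads.append((str(path), key, {"ContentType": content_type or "application/octet-stream"}))
    return uploads


def publish(s3: Any, cloudfront: Any, build_dir: str, bucket: str, distribution_id: str) -> None:
    """Upload the build, then invalidate CloudFront so edges fetch the new files."""
    for filename, key, extra in plan_uploads(build_dir):
        s3.upload_file(filename, bucket, key, ExtraArgs=extra)
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            # Must be unique for each invalidation request.
            "CallerReference": str(time.time_ns()),
        },
    )
```

Invalidating /* is the blunt option; cache headers and per-path invalidations are left out to keep the sketch short.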

1Password

🟩 Strong Endorse

It’s by far the best password manager I’ve used. Making sure all credentials are stored in 1Password from the beginning saved us a world of grief. There are decent admin tools for teams too.

aws-vault

🟩 Endorse

We use IAM for authentication any time we access infrastructure, and the aws-vault CLI makes this very easy to set up. I just wish it integrated with Touch ID or something similar to make it less clunky.
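
One small habit that pairs well with aws-vault is having infra scripts refuse to run outside it. aws-vault exec exports the active profile name in the AWS_VAULT environment variable, so a guard can check for it. The function and the allowed-profile set are hypothetical.

```python
import os
from typing import Mapping, Optional


def require_aws_vault(
    env: Optional[Mapping[str, str]] = None,
    allowed_profiles: Optional[set[str]] = None,
) -> str:
    """Return the aws-vault profile in use, or refuse to run without one."""
    env = os.environ if env is None else env
    # aws-vault exec sets AWS_VAULT to the name of the active profile.
    profile = env.get("AWS_VAULT")
    if not profile:
        raise RuntimeError("run this under `aws-vault exec <profile> -- ...`")
    if allowed_profiles is not None and profile not in allowed_profiles:
        raise RuntimeError(f"profile {profile!r} is not allowed for this script")
    return profile
```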

Vanta

🟧 Endorse-ish

I haven’t used other SOC 2 providers to compare Vanta against, but it can be difficult to figure out what you actually need to do. The automated infra tests integrated with AWS are good, as are the offboarding workflows.