Terraform Kept Killing My Home Internet


Why Terraform Runs Froze Everything

/images/2025/12-21/header.jpeg

Yoro Park, April 2025

A quick personal update. Our baby has been born and is doing well, but isn't home yet. I'm starting paternity leave in a few days. It's going to be a long break: several months, maybe longer. When I told my manager, he accepted it immediately. No hesitation. No drama.

Still, I can't shake the guilt. Maybe it's just me, but I'm anxious about stepping away from my career for a while.

When I told my coworkers, I sometimes caught a look of surprise or even a touch of sadness in their expressions, which only made the guilt worse. That's why, in 2026, I want to go all-in on parenting.

Until recently, I felt like I was running out of runway to wrap up my work before the leave. Somehow, I pulled it off.

For the past few months, I've been deep in Terraform-based infrastructure work. Before I fully step away for a while, I wanted to write down one oddly memorable problem from that stretch.

Terraform didn't just slow down my work. It kept killing my home internet.

Slow or Something Worse

While provisioning AWS resources with Terraform, I started seeing weird behavior. As the configuration got larger, so did the apply duration. Five minutes, then ten, then twenty.

Worse, when a run failed because of missing IAM permissions after all that waiting, the time was simply wasted. Terraform would also sometimes leave the state lock in place, along with half-applied resources that needed manual cleanup.
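(For the stale-lock case, at least, Terraform ships an escape hatch. LOCK_ID below is a placeholder for the ID printed in the lock error message:)

terraform force-unlock LOCK_ID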

If you've ever stared at a long-running terraform apply, you know the feeling.

During long Terraform runs, my entire network would become unusable. Slack and Notion froze, even Google Calendar timed out. Terraform was painfully slow, and it took my internet down with it.

I couldn't find much about it online. A few people suggested lowering the -parallelism option (the default is 10) to around 4. I tried that, and it helped a bit, but not much.
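For reference, the flag is passed per run:

terraform apply -parallelism=4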

It wasn't really Terraform's fault. My home router was the culprit.

What Was Actually Happening

At the time, I suspected my router was running out of NAT state. But looking back, I think the more likely scenario was simpler. I was asking too much of the router. Too many concurrent connections, too much churn, and the router's CPU couldn't keep up.

I reached that conclusion after noticing that running Terraform through a proxy server suddenly fixed everything. (More on what that proxy does in a bit.)

The proxy didn't make AWS faster. It made my router's life easier by smoothing outbound bursts and keeping it from getting overwhelmed by lots of short-lived connections.

What used to time out after 20 minutes now finishes in ~30 seconds or less. (Many Terraform providers, via the Terraform Plugin SDK, default resource CRUD operations to a 20-minute timeout.)

Another clue came when I ran this while Terraform was running:

netstat -anp tcp | grep -E 'TIME_WAIT' | wc -l
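(That's the BSD/macOS netstat syntax, where -p selects a protocol. On Linux, a rough equivalent would be:)

ss -tan state time-wait | wc -l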

The count came back over 200, sometimes 400, connections in TIME_WAIT. That was a sign I was burning through lots of connections. I didn't have CPU or NAT metrics from the router, so I can't definitively pin the outages on it, but the TIME_WAIT count was the first concrete hint that the failure mode was "too many new connections, too quickly." That kind of churn can stress stateful systems, either on the client or along the path (ephemeral ports, NAT tables, whatever).

My first impulse was to just dial down concurrency, but instead of fighting every tool's parallelism settings, I wanted a single place to smooth out the bursts of outbound connections. A forward proxy was the simplest thing to try.

So I wrote a forward proxy as a quick experiment.
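To make the idea concrete, here's a minimal sketch of what such a proxy does. This is not fyntr's actual source, just the general shape: an HTTP CONNECT tunnel where a semaphore caps how many upstream connections are in flight at once. The limit of 8 is illustrative, and it assumes tokio as the async runtime.

use std::sync::Arc;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::{TcpListener, TcpStream};
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9999").await?;
    // Cap concurrent tunnels; 8 is an illustrative limit, not fyntr's default.
    let permits = Arc::new(Semaphore::new(8));
    loop {
        let (client, _) = listener.accept().await?;
        let permits = Arc::clone(&permits);
        tokio::spawn(async move {
            // New connections queue here; this is what smooths the bursts.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            let _ = tunnel(client).await;
        });
    }
}

async fn tunnel(client: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(client);

    // Parse the request line: "CONNECT host:port HTTP/1.1".
    let mut request_line = String::new();
    reader.read_line(&mut request_line).await?;
    let target = request_line
        .split_whitespace()
        .nth(1)
        .unwrap_or_default()
        .to_string();

    // Discard the remaining request headers up to the blank line.
    loop {
        let mut header = String::new();
        if reader.read_line(&mut header).await? == 0 || header.trim().is_empty() {
            break;
        }
    }

    // Dial upstream first, then tell the client the tunnel is ready.
    let mut upstream = TcpStream::connect(target.as_str()).await?;
    let mut client = reader.into_inner();
    client
        .write_all(b"HTTP/1.1 200 Connection Established\r\n\r\n")
        .await?;

    // Relay bytes in both directions; the TLS session passes through untouched.
    tokio::io::copy_bidirectional(&mut client, &mut upstream).await?;
    Ok(())
}

The key design choice is that the limit sits in one process on the client side: however many tools are hammering the network, the router only ever sees a bounded number of simultaneous tunnels.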

Why My Router Might Have Choked

Terraform is written in Go, and Go's HTTP client tends to hold HTTPS connections open for a while. On top of that, each provider runs as its own plugin process with its own HTTP client, so the connections multiply. When you run Terraform, it opens dozens of connections at once.

That alone is fine. But when you end up with hundreds of concurrent connections, your OS and your router both have to keep state and handle bursts across all of them.

On many consumer routers, that state lives in the NAT table, and maintaining it isn't free. If the table fills up or the CPU pegs, new connections start failing, and suddenly the internet "dies". Requests stall, timeouts spike, everything crawls.
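I couldn't verify this on my own router, but if yours happens to run Linux and expose a shell, the conntrack pressure is visible directly (the paths below assume the nf_conntrack module is loaded):

cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max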

Usage

The proxy is published to this repository.

Start it like this:

cargo install fyntr
fyntr

It listens on port 9999 by default.

export HTTPS_PROXY=http://127.0.0.1:9999
terraform apply
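If you want to confirm traffic is actually flowing through the proxy first, any HTTPS client that honors -x works as a smoke test (the URL here is just an example):

curl -x http://127.0.0.1:9999 https://example.com -o /dev/null -v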

Now all Terraform traffic goes through the proxy. Hope this helps if you're seeing similar connection churn. See you in the new year.


Thanks to my coworker, Jerry Feng, for reviewing this article.

