CI/CD pipeline
How We Made Our CI/CD Pipeline 10x Faster (From 62 Minutes to 6)
The four boring changes — parallel tests, Docker layer caching, Terraform, live metrics — that cut our CI/CD pipeline from over an hour to six minutes.
Ralph Duin · September 15, 2025 · 3 min read
<p>Continuous Integration and Continuous Deployment pipelines are the backbone of modern software delivery. When your pipeline lags, everything else lags with it — feedback loops, bug fixes, release cadence, team morale. We hit that wall hard. Here's how we cut ours from over an hour to six minutes.</p>
<h2>Step 1: Find the real bottleneck</h2>
<p>The first thing we did was stop guessing. We instrumented every stage and measured it. The results surprised us:</p>
<ul>
<li>Tests: 72% of total pipeline time</li>
<li>Build + image push: 19%</li>
<li>Infra provisioning: 6%</li>
<li>Everything else: 3%</li>
</ul>
<p>If you don't measure, you end up optimizing the wrong stage. We almost spent a week on Docker layer caching before realizing it would have saved us ninety seconds on an hour-long pipeline.</p>
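<p>The breakdown itself is simple arithmetic: each stage's share is its duration divided by the total. Here is a minimal sketch. It assumes per-stage durations have already been pulled out of CI logs. The <code>stage_breakdown</code> helper is hypothetical, and the sample input just applies the percentages above to a 62-minute run.</p>
<pre><code class="language-python">from typing import Dict, List, Tuple


def stage_breakdown(durations: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (stage, percent of total) pairs, largest share first.

    Durations can be in any unit, as long as every stage uses the same one.
    """
    total = sum(durations.values())
    if total == 0:
        raise ValueError("total pipeline duration is zero")
    shares = [(stage, 100.0 * d / total) for stage, d in durations.items()]
    # The first entry is the stage most worth optimizing
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Minutes per stage: the percentages above applied to a 62-minute run
    run = {
        "tests": 44.64,
        "build + image push": 11.78,
        "infra provisioning": 3.72,
        "everything else": 1.86,
    }
    for stage, pct in stage_breakdown(run):
        print(stage.ljust(20), f"{pct:.1f}%")</code></pre>
<p>A stage's share is also the ceiling on what optimizing that stage alone can save. That is why the test stage had to come first.</p>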
<h2>Step 2: Parallelize the test suite</h2>
<p>Our tests ran serially on a single runner. Moving to parallel test sharding across four runners was the single biggest win: it cut testing time by 68%.</p>
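<pre><code class="language-yaml">jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Let every shard finish so one failure doesn't hide others
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun install --frozen-lockfile
      # Each matrix job runs one quarter of the suite
      # (requires a test runner that supports --shard)
      - run: bun test --shard=${{ matrix.shard }}/4</code></pre>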
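<p>Conceptually, <code>--shard=N/4</code> hands each runner a disjoint quarter of the test files, and the job finishes when its slowest shard does. The sketch below shows one simple way to split a suite. It assumes round-robin assignment over sorted file paths. It illustrates the idea and is not Bun's actual partitioning logic. The <code>shard_files</code> and <code>wall_time</code> helpers and the file names are hypothetical.</p>
<pre><code class="language-python">from typing import List


def shard_files(files: List[str], index: int, total: int) -> List[str]:
    """Return the files that shard `index` (1-based) of `total` should run."""
    if index not in range(1, total + 1):
        raise ValueError(f"shard index must be 1..{total}, got {index}")
    # Sort first so every runner computes the same split independently
    ordered = sorted(files)
    # Deal files round-robin: the i-th file goes to shard (i % total) + 1
    return [f for i, f in enumerate(ordered) if i % total == index - 1]


def wall_time(shard_durations: List[float]) -> float:
    """Shards run in parallel, so the job takes as long as the slowest one."""
    return max(shard_durations)


if __name__ == "__main__":
    suite = [f"test/module_{n:02d}.test.ts" for n in range(10)]  # hypothetical
    for shard in range(1, 5):
        print(shard, shard_files(suite, shard, 4))</code></pre>
<p>With four perfectly balanced shards, the best case is a 75% cut in test time. Uneven shards and per-runner setup (checkout, <code>bun install</code>) are the usual reasons real numbers land below that.</p>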
<p>We also deleted a dozen tests that were flaky or that exercised framework behavior rather than our own code. Fewer tests, faster signal.</p>
<h2>Step 3: Cache Docker build layers</h2>
<p>Our Docker builds were starting from scratch every time. Adding BuildKit layer caching, backed by a remote registry, cut image builds from four minutes to forty seconds:</p>
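<pre><code class="language-yaml"># cache-to needs a Buildx builder using the docker-container driver,
# which setup-buildx-action creates by default (registry login omitted)
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE }}:${{ github.sha }}
    # Reuse layers from, and write all layers back to, a dedicated cache tag
    cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
    cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max</code></pre>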
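<p>The speedup comes from how layer caching chains. A layer's cache key depends on the key of the layer before it plus that step's instruction and inputs. Changing one step therefore invalidates that step and every step after it, while earlier layers are reused. Below is a deliberately simplified model of that chaining, not BuildKit's real cache implementation. The Dockerfile steps and input digests are hypothetical.</p>
<pre><code class="language-python">import hashlib
from typing import List, Set, Tuple

# (instruction, digest of the files the step reads; empty if none)
Step = Tuple[str, str]


def layer_keys(steps: List[Step]) -> List[str]:
    """Hash each step together with its parent layer's key."""
    keys: List[str] = []
    parent = ""
    for instruction, inputs in steps:
        data = f"{parent}|{instruction}|{inputs}".encode()
        parent = hashlib.sha256(data).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], cache: Set[str]) -> List[str]:
    """Return the instructions whose layers miss the cache and must rerun."""
    return [
        instruction
        for (instruction, _), key in zip(steps, layer_keys(steps))
        if key not in cache
    ]


if __name__ == "__main__":
    build = [
        ("FROM oven/bun:1", ""),
        ("COPY package.json bun.lock ./", "lockfile-v1"),
        ("RUN bun install --frozen-lockfile", ""),
        ("COPY . .", "source-v1"),
        ("RUN bun run build", ""),
    ]
    cache = set(layer_keys(build))  # what a previous run pushed
    # Only the source changed, so the dependency layers are reused
    edited = build[:3] + [("COPY . .", "source-v2"), build[4]]
    print(rebuilt_steps(edited, cache))  # ['COPY . .', 'RUN bun run build']</code></pre>
<p>This is why Dockerfile ordering matters alongside the registry cache. If you copy the lockfile and install dependencies before copying the rest of the source, everyday code changes can reuse the dependency layer.</p>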
<h2>Step 4: Provision infrastructure as code</h2>
<p>Environment provisioning used to require a human in the loop. We moved to Terraform so the pipeline could spin up a preview environment for every pull request:</p>
<pre><code class="language-hcl">resource "fly_app" "staging" {
  name = "app-staging-${var.pr_number}"
  org  = "personal"
}

resource "fly_machine" "api" {
  app    = fly_app.staging.name
  region = "ams"
  image  = var.image
}</code></pre>
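<p>The preview lifecycle boils down to mapping pull request events to Terraform commands. Here is a minimal sketch. It assumes the workflow listens for the <code>opened</code>, <code>synchronize</code>, <code>reopened</code>, and <code>closed</code> actions of GitHub's <code>pull_request</code> event. It also assumes each PR gets its own Terraform workspace, so one PR's state never overwrites another's. The <code>terraform_commands</code> helper and the workspace naming (for example <code>pr-42</code>) are hypothetical. <code>workspace select -or-create</code> needs Terraform 1.4 or later.</p>
<pre><code class="language-python">from typing import List


def terraform_commands(action: str, pr_number: int, image: str) -> List[List[str]]:
    """Map a pull_request event action to the Terraform commands to run."""
    workspace = f"pr-{pr_number}"  # hypothetical naming: one state per PR
    select = ["terraform", "workspace", "select", "-or-create", workspace]
    var_args = ["-var", f"pr_number={pr_number}", "-var", f"image={image}"]
    if action in ("opened", "synchronize", "reopened"):
        # Create or update the preview app for the PR's latest image
        return [select, ["terraform", "apply", "-auto-approve", *var_args]]
    if action == "closed":
        # Tear the preview down, then drop the now-empty workspace
        # (Terraform refuses to delete the currently selected workspace)
        return [
            select,
            ["terraform", "destroy", "-auto-approve", *var_args],
            ["terraform", "workspace", "select", "default"],
            ["terraform", "workspace", "delete", workspace],
        ]
    return []  # other actions (labeled, edited, ...) need no infra change</code></pre>
<p>A CI step would run each inner list in order, for example with <code>subprocess.run(cmd, check=True)</code>, stopping at the first failure.</p>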
<p>Every PR gets its own preview environment, destroyed automatically when the PR closes. Reviewers stop asking "can you deploy this somewhere I can click?"</p>
<h2>Step 5: Monitor so it stays fast</h2>
<p>Pipelines rot. We wired Prometheus to scrape pipeline metrics and built a Grafana dashboard that alerts us when any stage creeps past its budget. If tests drift back toward fifteen minutes, we know before it becomes a daily annoyance.</p>
<h2>The result</h2>
<p>From 62 minutes to 6 minutes. A 10x improvement, achieved through four boring changes: measure, parallelize, cache, automate. No magic. No new framework. Just ruthless attention to where the time was actually going.</p>
<p>The lesson: faster pipelines aren't a one-time project. They're a habit. Measure weekly, cut ruthlessly, and don't let your CI slow to a crawl while you're not looking.</p>