<h1>The Brevity Rule: Cut AI Costs ~10% With One Config Change</h1>
<p>Six lines of instructions, zero information loss, and nearly 10% off my AI bill. The exact rule, the numbers behind it, and why verbose responses bleed budget twice.</p>
<p>Ralph Duin · March 20, 2026 · 3 min read</p>
<p>My AI bill was climbing every month. Not because I was using more AI — because every response was longer than it needed to be. One six-line config change cut my spend by nearly 10% without losing a single piece of information. Here's exactly what I did.</p>
<h2>The rule</h2>
<p>I dropped this into my project's system instructions:</p>
<pre><code class="language-markdown">1. Default to minimum viable response. Bullets over paragraphs.
Tables over lists when there are 3+ items.
2. Never drop information — compress, don't omit.
3. If the user asks "why" or "explain", expand only that part.
Stay terse everywhere else.
4. No filler phrases: "Let me", "Here's what happened", "Great question".
5. Never restate the user's question.
6. For code references: path + line range only, no re-quoting.</code></pre>
<p>Six lines. No other changes. I let it run for three weeks and measured the delta.</p>
<h2>What the numbers looked like</h2>
<p>Baseline sample from my Cursor usage export:</p>
<ul>
<li><strong>Requests:</strong> 5,479</li>
<li><strong>Total tokens:</strong> 5.05 billion</li>
<li><strong>Avg output tokens/request:</strong> 3,949</li>
<li><strong>Avg cost/request:</strong> $0.48</li>
</ul>
<p>After applying the brevity rule:</p>
<ul>
<li><strong>Output tokens saved:</strong> 10.8 million</li>
<li><strong>Compounded input savings:</strong> ~54 million (shorter history re-read on every subsequent turn)</li>
<li><strong>Total tokens saved:</strong> ~65 million</li>
<li><strong>Cost saved (21 days):</strong> ~$203 → ~$290/month projected</li>
<li><strong>Percentage of total spend:</strong> 7.8%</li>
</ul>
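<p>The bullet figures hang together. Here's a quick back-of-the-envelope check in Python, using only the numbers above (nothing here is measured fresh):</p>
<pre><code class="language-python"># Sanity-check the savings math against the usage export figures.
output_saved = 10_800_000       # output tokens saved over 21 days
input_saved = 54_000_000        # compounded input savings (history re-reads)
total_saved = output_saved + input_saved

cost_saved_21d = 203.0          # dollars, over the 21-day window
monthly = cost_saved_21d / 21 * 30

baseline_spend = 5_479 * 0.48   # requests x avg cost/request

print(f"total tokens saved: {total_saved / 1e6:.1f}M")  # ~64.8M
print(f"projected monthly: ${monthly:.0f}")             # ~$290
print(f"share of spend: {cost_saved_21d / baseline_spend:.1%}")</code></pre>
<p>The share works out to roughly 7.7–7.8% depending on rounding, matching the headline "~10% with rounding up" framing.</p>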
<h2>Why the savings compound</h2>
<p>This is the part people miss. Output tokens are expensive on their own — roughly $0.015 per 1K vs. $0.00075 per 1K for cached input reads. But the real bleed is that every verbose response becomes part of the conversation history, which gets re-read on every subsequent turn.</p>
<p>A 500-token response that could have been 250 tokens doesn't just cost you 250 extra output tokens once. It costs you 250 <em>input</em> tokens on every turn that follows, multiplied by however long the conversation runs. A ten-turn conversation pays for that verbosity ten times.</p>
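<p>The double billing is easy to put in numbers. A minimal sketch, using the approximate per-1K rates quoted above (your provider's exact rates will differ):</p>
<pre><code class="language-python"># How 250 unnecessary output tokens bill you across a conversation.
OUTPUT_PER_1K = 0.015           # $ per 1K output tokens (approx.)
CACHED_INPUT_PER_1K = 0.00075   # $ per 1K cached input reads (approx.)

extra_tokens = 250  # a 500-token answer that could have been 250
turns_after = 9     # remaining turns in a 10-turn conversation

once = extra_tokens / 1000 * OUTPUT_PER_1K                       # paid at generation
rereads = extra_tokens / 1000 * CACHED_INPUT_PER_1K * turns_after  # paid on every re-read

print(f"output cost (once):      ${once:.5f}")
print(f"re-read cost (x{turns_after} turns): ${rereads:.5f}")</code></pre>
<p>Per conversation the amounts look tiny. But every verbose response in every conversation pays this tax, which is how the re-read side compounds into the ~54 million input tokens in the numbers above.</p>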
<h2>Same information, 71% fewer tokens</h2>
<p>I ran the same question through the model before and after, to prove the rule wasn't losing information:</p>
<ul>
<li><strong>Before (450 tokens):</strong> Full paragraphs explaining what tokens are, how the context window works, what changed after cleanup, and separate sections on speed, cost, quality, and headroom impacts. Multiple headings.</li>
<li><strong>After (130 tokens):</strong> "Tokens = text chunks. Before: 25K loaded. After: 1,270. Speed / cost / quality / headroom — one line each." Same information.</li>
</ul>
<p>Reduction: 71%. Zero information loss.</p>
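<p>The reduction figure is just the ratio of the two token counts:</p>
<pre><code class="language-python">before, after = 450, 130  # token counts from the before/after comparison
reduction = (before - after) / before
print(f"{reduction:.0%}")  # 71%</code></pre>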
<h2>The bigger cleanup</h2>
<p>The brevity rule was part of a broader config hygiene pass. I also:</p>
<ul>
<li>Consolidated 36 AI rules into 10 (92% token reduction in the rules themselves)</li>
<li>Removed 13 redundant project docs from the context window</li>
<li>Purged 12 duplicate skill/plugin loads (~9,500 tokens per session)</li>
</ul>
<p>All-in, about 36,700 fewer tokens loaded per session <em>before</em> the model even looks at my code. Combined with the brevity rule, total savings landed around $500–600/month.</p>
<h2>The takeaway</h2>
<p>Cutting AI costs doesn't require a new provider, a smaller model, or clever prompt engineering. Most of the bloat is verbosity you never asked for. Six lines of instructions, a couple of afternoons of config cleanup, and you get most of the way there.</p>
<p>Every token your assistant writes that it didn't need to — you pay for twice. Once on output, and once every time it re-reads itself.</p>