<h1>The Brevity Rule: Cut AI Costs ~10% With One Config Change</h1>
<p>Six lines of instructions, zero information loss, and nearly 10% off my AI bill. The exact rule, the numbers behind it, and why verbose responses bleed budget twice.</p>
<p>Ralph Duin · March 20, 2026 · 3 min read</p>
<p>My AI bill was climbing every month. Not because I was using more AI — because every response was longer than it needed to be. One six-line config change cut my spend by nearly 10% without losing a single piece of information. Here's exactly what I did.</p>
<h2>The rule</h2>
<p>I dropped this into my project's system instructions:</p>
<pre><code class="language-markdown">1. Default to minimum viable response. Bullets over paragraphs.
Tables over lists when there are 3+ items.
2. Never drop information — compress, don't omit.
3. If the user asks "why" or "explain", expand only that part.
Stay terse everywhere else.
4. No filler phrases: "Let me", "Here's what happened", "Great question".
5. Never restate the user's question.
6. For code references: path + line range only, no re-quoting.</code></pre>
<p>Six lines. No other changes. I let it run for three weeks and measured the delta.</p>
<h2>What the numbers looked like</h2>
<p>Baseline sample from my Cursor usage export:</p>
<ul>
<li><strong>Requests:</strong> 5,479</li>
<li><strong>Total tokens:</strong> 5.05 billion</li>
<li><strong>Avg output tokens/request:</strong> 3,949</li>
<li><strong>Avg cost/request:</strong> $0.48</li>
</ul>
<p>After applying the brevity rule:</p>
<ul>
<li><strong>Output tokens saved:</strong> 10.8 million</li>
<li><strong>Compounded input savings:</strong> ~54 million (shorter history re-read on every subsequent turn)</li>
<li><strong>Total tokens saved:</strong> ~65 million</li>
<li><strong>Cost saved (21 days):</strong> ~$203 → ~$290/month projected</li>
<li><strong>Percentage of total spend:</strong> 7.8%</li>
</ul>
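<p>The bullet figures hang together. Here's a quick back-of-the-envelope check in Python, using only the numbers above (nothing here is measured fresh):</p>
<pre><code class="language-python"># Sanity-check the savings math against the usage export figures.
output_saved = 10_800_000       # output tokens saved over 21 days
input_saved = 54_000_000        # compounded input savings (history re-reads)
total_saved = output_saved + input_saved

cost_saved_21d = 203.0          # dollars, over the 21-day window
monthly = cost_saved_21d / 21 * 30

baseline_spend = 5_479 * 0.48   # requests x avg cost/request

print(f"total tokens saved: {total_saved / 1e6:.1f}M")  # ~64.8M
print(f"projected monthly: ${monthly:.0f}")             # ~$290
print(f"share of spend: {cost_saved_21d / baseline_spend:.1%}")</code></pre>
<p>The share works out to roughly 7.7–7.8% depending on rounding, matching the headline "~10% with rounding up" framing.</p>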
<h2>Why the savings compound</h2>
<p>This is the part people miss. Output tokens are expensive on their own — roughly $0.015 per 1K vs. $0.00075 per 1K for cached input reads. But the real bleed is that every verbose response becomes part of the conversation history, which gets re-read on every subsequent turn.</p>
<p>A 500-token response that could have been 250 tokens doesn't just cost you 250 extra output tokens once. It costs you 250 <em>input</em> tokens on every turn that follows, multiplied by however long the conversation runs. A ten-turn conversation pays for that verbosity ten times.</p>
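<p>The double billing is easy to put in numbers. A minimal sketch, using the approximate per-1K rates quoted above (your provider's exact rates will differ):</p>
<pre><code class="language-python"># How 250 unnecessary output tokens bill you across a conversation.
OUTPUT_PER_1K = 0.015           # $ per 1K output tokens (approx.)
CACHED_INPUT_PER_1K = 0.00075   # $ per 1K cached input reads (approx.)

extra_tokens = 250  # a 500-token answer that could have been 250
turns_after = 9     # remaining turns in a 10-turn conversation

once = extra_tokens / 1000 * OUTPUT_PER_1K                       # paid at generation
rereads = extra_tokens / 1000 * CACHED_INPUT_PER_1K * turns_after  # paid on every re-read

print(f"output cost (once):      ${once:.5f}")
print(f"re-read cost (x{turns_after} turns): ${rereads:.5f}")</code></pre>
<p>Per conversation the amounts look tiny. But every verbose response in every conversation pays this tax, which is how the re-read side compounds into the ~54 million input tokens in the numbers above.</p>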
<h2>Same information, 71% fewer tokens</h2>
<p>I ran the same question through the model before and after, to prove the rule wasn't losing information:</p>
<ul>
<li><strong>Before (450 tokens):</strong> Full paragraphs explaining what tokens are, how the context window works, what changed after cleanup, and separate sections on speed, cost, quality, and headroom impacts. Multiple headings.</li>
<li><strong>After (130 tokens):</strong> "Tokens = text chunks. Before: 25K loaded. After: 1,270. Speed / cost / quality / headroom — one line each." Same information.</li>
</ul>
<p>Reduction: 71%. Zero information loss.</p>
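<p>The reduction figure is just the ratio of the two token counts:</p>
<pre><code class="language-python">before, after = 450, 130  # token counts from the before/after comparison
reduction = (before - after) / before
print(f"{reduction:.0%}")  # 71%</code></pre>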
<h2>The bigger cleanup</h2>
<p>The brevity rule was part of a broader config hygiene pass. I also:</p>
<ul>
<li>Consolidated 36 AI rules into 10 (92% token reduction in the rules themselves)</li>
<li>Removed 13 redundant project docs from the context window</li>
<li>Purged 12 duplicate skill/plugin loads (~9,500 tokens per session)</li>
</ul>
<p>All-in, about 36,700 fewer tokens loaded per session <em>before</em> the model even looks at my code. Combined with the brevity rule, total savings landed around $500–600/month.</p>
<h2>The takeaway</h2>
<p>Cutting AI costs doesn't require a new provider, a smaller model, or clever prompt engineering. Most of the bloat is verbosity you never asked for. Six lines of instructions, a couple of afternoons of config cleanup, and you get most of the way there.</p>
<p>Every token your assistant writes that it didn't need to — you pay for twice. Once on output, and once every time it re-reads itself.</p>