50 First Prompts

TL;DR: LLMs do not remember anything between calls. Every “conversation” you’ve ever had with one was reconstructed from scratch by replaying history into the context window. If your architecture treats memory like a feature you turn on, you will pay for it twice: once in token spend, and once in the slow erosion of consistency that has your users playing Henry Roth, re-establishing context every morning so Lucy can function. And yes, I often use humorous analogies, so please subscribe or follow (or un-) according to your tastes.


If you have not seen 50 First Dates, the premise is that Lucy Whitmore (Drew Barrymore) wakes up every day with no memory of anything that happened the day before, and Henry Roth (Adam Sandler) has to remind her of their entire relationship, every morning, forever. Sweet movie. Terrible AI pattern (in most cases).

True story: when I went to see this one in the theater, the projector died about twenty minutes in. It was weeks before we made it back to finish it, and the second viewing had this faint déjà vu quality, the film meeting me halfway while I reconstructed the rest from a partial memory. Something humans do automatically (if unreliably) and LLMs can’t, at least on their own.

The movie plot is also a reasonable analogy for how a Large Language Model works under the hood. The LLM is Lucy. Every developer who builds on top of it is Henry. Every API call is the first call. Every conversation is reconstructed from a transcript that the application hands the model on the way in. The model itself remembers nothing. The illusion of continuity is something your application is doing on its behalf, on every turn, at your expense.

Most teams do not build for this. They build as if “the AI” remembers things, get surprised when it doesn’t, bolt on a memory layer that is tested like a deterministic automation, and then watch their token bill quietly compound. We’ve all heard some horror stories about this happening. It’s why enterprises prefer to use vendor tools and outside consultants. Which is a good way to get up and running, but has its own cost if the relationship isn’t built on trust and reciprocal ROI.

The Architecture Reality Behind the Humorous Analogy

LLMs are stateless. Full stop. The model is a function: tokens in, tokens out. Whatever “memory” you experience in ChatGPT, Claude, Gemini, or your own agent is some other system managing the flow of prior context back into the prompt before the model sees it.
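A minimal sketch of what that means in practice; `call_model` here is a hypothetical stand-in for any chat-completion API, not a real client:

```python
# Minimal sketch of statelessness: the application, not the model, carries
# the "memory". call_model is a hypothetical stand-in for any chat API.

def call_model(messages):
    # A real implementation would send `messages` to an LLM endpoint.
    # Here we only confirm what the model actually receives each call.
    return f"(model saw {len(messages)} messages)"

transcript = []  # lives in YOUR application, not in the model

def send(user_text):
    transcript.append({"role": "user", "content": user_text})
    # Every call replays the entire history back into the prompt.
    reply = call_model(list(transcript))
    transcript.append({"role": "assistant", "content": reply})
    return reply

send("My name is Lucy.")
reply = send("What is my name?")
# The model only "knows" the name because the app re-sent turn 1.
```

If the application drops `transcript`, the "relationship" is gone. Nothing on the model's side ever held it.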

This has three implications that drive everything else:

First, there is no “the conversation.” There is a transcript that gets re-sent every turn. The model is not pulling up your last message; you are handing it back, every time.

Second, the context window is the entire universe of what the model knows in that moment. Anything not in that window does not exist. Anything in that window is being paid for, in tokens, on every single call.

Third, “memory” in vendor marketing rarely means one thing. It is a category that includes at least five different mechanisms with different costs, different failure modes, and different retrieval semantics. Conflating them is how you end up with an expensive system that still forgets the user’s name. There are, however, better ways.

Memory Is a Marketing Word

When a vendor or framework says “memory,” they could mean any of the following, and the differences matter:

Conversation history replay. The full transcript, prepended on every call. Simple, perfect recall, terrible cost curve. Linear in turns, eventually crashes into your context limit.

Running summary. A compacted version of the transcript, regenerated periodically. Cheaper, lossy, drifts over time. The model is now reading its own paraphrase of what happened, with all the small infidelities that implies.

Vector retrieval (RAG over chat history). Past turns are embedded and indexed; only relevant snippets get pulled into the next prompt. Cheap, scalable, but only as good as your embeddings and your retrieval thresholds. It will confidently fail to surface the one thing the user expected it to remember.

Structured profile / entity store. Key-value or graph storage of facts about the user, product, or domain (“user’s tone preference: dry,” “preferred billing currency: USD”). Cheap to read, easy to audit, but only as good as the extraction logic that populates it.

Procedural / skill memory. Instructions, playbooks, or skills the agent loads on demand. Closer to “here is how we do things here” than “here is what you said yesterday.” Different beast entirely.

A reliable and practical AI memory architecture uses several of these in combination. A bad one picks one and pretends it covers everything. If your team is having an argument about “should we add memory,” the real argument is which of these five you are talking about, why it is the best choice in a given context, and when the context and best option change.
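As a rough illustration of that combination, here is a hedged sketch stacking three of the five tiers: a structured profile, a running summary, and a naive keyword stand-in for vector retrieval. Every name is illustrative, not any particular framework's API:

```python
import re

# Hedged sketch combining three memory tiers. The retrieval here is word
# overlap, a deliberately crude stand-in for embedding similarity.

profile = {"tone": "dry", "currency": "USD"}             # entity store
summary = "User is migrating billing to a new vendor."   # running summary
history = [
    "Turn 1: user asked about invoice formats.",
    "Turn 2: user confirmed USD as the billing currency.",
]

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, k=1):
    # Stand-in for vector retrieval: score past turns by word overlap.
    q = tokenize(query)
    ranked = sorted(history, key=lambda t: len(q & tokenize(t)), reverse=True)
    return ranked[:k]

def build_context(user_input):
    # The prompt is assembled fresh on every turn from all three tiers.
    parts = [f"Profile: {profile}", f"Summary: {summary}"]
    parts += retrieve(user_input)
    parts.append(f"User: {user_input}")
    return "\n".join(parts)

ctx = build_context("What currency are my invoices in?")
```

The point of the sketch is the shape, not the implementation: cheap structured facts on every turn, a bounded summary, and only the relevant slice of history.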

What Lost in the Middle Actually Costs You

Even if you stuff the entire history into the context window, you do not get what you think you are paying for. Liu et al. at Stanford published Lost in the Middle: How Language Models Use Long Contexts in 2023, and the finding has been replicated enough times that it should be a load-bearing assumption in any architecture: model attention is not uniform across the context window. Information at the beginning and end gets used. Information in the middle gets quietly ignored, even by models that advertise long-context support.

So the naive “just give it the whole history” approach is doubly bad. You pay for every token, and the model uses some of them less than others, and you have no easy way to tell which.

This is one of the reasons selective retrieval beats full replay almost everywhere. You are not just saving tokens. You are putting the relevant tokens in positions where the model will actually use them.
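One cheap way to act on the finding is to order retrieved snippets so the strongest land at the edges of the prompt. A sketch, assuming relevance scores come from your retriever:

```python
# Sketch: order retrieved snippets so the strongest land at the edges of
# the prompt, where models attend most reliably. Relevance scores are
# assumed to come from your retriever; the ordering trick is the point.

def edge_order(scored_snippets):
    """Best snippet first, runner-up last, weakest buried in the middle."""
    ranked = sorted(scored_snippets, key=lambda s: s[1], reverse=True)
    front, back = [], []
    for i, (text, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]

ordered = edge_order([("A", 0.9), ("B", 0.7), ("C", 0.5), ("D", 0.3)])
# Top-ranked "A" goes first; runner-up "B" goes last.
```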

The Token Bill (Yes, Again)

Here is the part that gets glossed over in the demos.

Every token in your context window is paid for, every turn. If your “memory” is “we keep prepending the full conversation,” then by turn 50 you are re-sending turns 1 through 49 on every call, turn 1 has been paid for fifty times over, and the model is working harder to find the signal each time. This is the closest thing to a structural cost trap in LLM architecture, and it is almost always invisible in development because nobody runs 50-turn conversations against the dev key.
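The arithmetic is easy to check. A back-of-envelope comparison, with per-turn and summary-size numbers chosen purely as illustrative assumptions:

```python
# Back-of-envelope: cumulative input tokens for full-transcript replay vs
# a capped running summary. Both constants are assumptions chosen only to
# show the shape of the curves.

TOKENS_PER_TURN = 200   # assumed average tokens per exchange
SUMMARY_CAP = 800       # assumed fixed summary budget

def replay_cost(turns):
    # On turn n you resend all n turns so far: 1 + 2 + ... + n exchanges.
    return sum(n * TOKENS_PER_TURN for n in range(1, turns + 1))

def summary_cost(turns):
    # Each turn pays only the summary cap plus the new turn.
    return turns * (SUMMARY_CAP + TOKENS_PER_TURN)

print(replay_cost(50))   # 255000 input tokens over 50 turns
print(summary_cost(50))  # 50000 input tokens over the same 50 turns
```

Five-to-one at turn 50, and the replay curve is quadratic, so the gap only widens from there.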

Anthropic’s prompt caching, introduced in August 2024, helps for the parts of your context that genuinely repeat (system prompts, fixed instructions, large reference documents): cached read tokens cost about 10% of the standard input price. That is real money saved on the parts that don’t change. But caching is not memory. It does not summarize, retrieve, or forget. It just makes paying for the same prefix cheaper. Use it where it fits, but do not let “we turned on caching” stand in for an actual memory strategy.
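For reference, caching in the Anthropic Messages API amounts to marking the stable prefix with a cache_control block. This sketch only builds the request payload; sending it requires the anthropic SDK and an API key, and the model name is illustrative:

```python
# Sketch of an Anthropic Messages API request payload using prompt
# caching. The cache_control marker on the system block tells the API to
# cache that prefix so repeat calls read it at a reduced input price.

payload = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a support agent. <large reference doc here>",
            # Mark the stable, repeated prefix for caching.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my invoice?"}],
}
```

Note what is cached: the fixed system prompt and reference document, not the growing conversation. That is exactly why caching complements a memory strategy rather than replacing one.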

Memory architecture is cost architecture. They are the same conversation. Any team treating them separately is going to be surprised by one of them.

Patterns That Actually Earn Their Keep

A few that hold up in production (as of this writing, a caveat I’m guilty of not always stating, and one you should apply to everything you read about AI):

Hierarchical / paged memory. MemGPT (Packer et al., 2023) is the canonical paper here: a small “main context” of hot facts plus a larger “external context” the model can page in and out, modeled on operating-system virtual memory. Even if you never use the framework (now continued as Letta), the mental model is the right one. Most context is cold most of the time. Stop paying to keep it warm.

Compaction at boundaries. Summarize aggressively at natural breakpoints (session end, topic change, day rollover). Throw away the verbatim transcript once the structured summary is written. Track what got compacted so you can audit later if a user complains the model “forgot.”

Structured extraction over raw recall. Pull stable facts (preferences, identifiers, decisions) out of conversation into a structured store. Read those on every turn. Let the conversational history age out. The user’s preferred tone of voice does not need to live in 12,000 tokens of transcript.

Retrieval over replay. Index past turns, retrieve only what is relevant to the current input, accept the occasional miss as a cost of doing business. Tune your retrieval thresholds with the same seriousness you tune any other production query.

Skills and procedural memory as a separate tier. “How we do things” is not the same as “what we said.” Keep them in separate stores with separate update rules. Skills change rarely; episodic facts change constantly.
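Compaction at a boundary can be sketched in a few lines. Here `summarize` is a stand-in for an LLM summarization call, and the audit log is the part teams usually forget:

```python
# Sketch of compaction at a session boundary. summarize is a stand-in for
# an LLM summarization call; the audit log records what was thrown away
# so "the model forgot" complaints can be investigated later.

def summarize(turns):
    # A real system would call a model here.
    return f"Summary of {len(turns)} turns."

audit_log = []

def compact(session):
    session["summary"] = summarize(session["turns"])
    audit_log.append({"compacted_turns": len(session["turns"])})
    session["turns"] = []   # the verbatim transcript is discarded
    return session

session = compact({"turns": ["t1", "t2", "t3"], "summary": None})
```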

A Practical Framework

Four scenarios, four answers:

A user opens the same chat tomorrow and expects continuity: structured profile plus retrieval over summarized history. Do not replay the full transcript.

An agent loops on a long-running task: hierarchical memory with compaction at step boundaries. Hot working set stays small; cold context pages out.

A system prompt or large reference document is reused on every call: prompt caching. Cheap, easy, do it today.

A model needs to “know how we do things”: procedural / skill memory in its own tier. Keep it separate from episodic memory so updating one doesn’t disturb the other.

The wrong answer in all four cases is “just send the whole history.” That is the architecture equivalent of walking Lucy through the entire relationship from scratch, every morning, in hopes that this time some of it sticks. Romantic in the movie. Expensive in production.

Paddling off into the Sunset

The model forgets. That is not a bug; it is the current limit of the art. The work is in deciding what your application remembers, where it stores it, when it retrieves it, and what it costs you per turn. Treat memory as architecture and most of the surprises go away.



If you found this interesting, please share.

© Scott S. Nelson

Markup is the New Markdown

TL;DR: “HTML is the new Markdown” is an attention-grabbing headline (for some of us), but not something to adopt at face value, or without more context. Where HTML applies, it genuinely delivers. Where it doesn’t, you’ll just be paying more per token for the privilege of being wrong.


Here’s something I catch myself doing constantly with AI content: skim the headline, fill in the details based on my own context, and run with a conclusion the original post may or may not have intended. I’m not the only one. It’s not laziness, it’s a cognitive defense mechanism in a world full of content, not to mention extra work hours keeping up with all the tools that are supposed to save us work hours.

This one is definitely that.

Earlier this month, Thariq Shihipar, an engineer at Anthropic, posted nine words on X: “HTML is the new markdown.” The post linked to a companion site with 20 self-contained .html files that an agent produced instead of the usual Markdown output. It pulled 8,600+ likes and 11,000 bookmarks. Simon Willison publicly reconsidered his three-year Markdown default. The Hacker News thread was climbing past 30 points an hour.

The reaction was big and fast, which is a sign that many people didn’t read the whole thing, or understand the whole context (ahem, “vibe coding”).

So. Let’s look at what this shift actually means, where it applies, how to apply it, and what it does to your token bill.

My HTML Baggage (Relevant, I Promise)

I mastered HTML in the early 2000s. Semantic structure, tag vocabulary, clean markup from scratch. The whole deal. So when Markdown started getting traction among developers in the 2010s, my first reaction was skepticism: why learn a format whose entire job is to produce a subset of what I can already write directly?

The pragmatic case eventually won me over, and it wasn’t even close. By the early 2020s, documentation had moved decisively into Git. Design docs, specs, ADRs, READMEs, changelogs: all .md. GitHub renders it natively. It reads as plain text, commits cleanly, and it became the shared syntax of developer collaboration. Fighting it meant fighting a current that wasn’t going to reverse, so I stopped. The ubiquity was the feature.

I tell you that because when I saw Thariq’s post, I had already settled in my mind that Markdown is how to communicate with AI, and this made no sense to me. Back to that bad habit of skimming headlines. What I should have done first was ask: back for what?

What Thariq Was Actually Pointing At

The argument isn’t that Markdown is dead or that you should rewrite your documentation in HTML. It’s narrower and more specific than that.

Thariq’s 20 examples grouped HTML wins into categories of LLM output: project status reports, code reviews, diagnostic summaries, data comparisons. The things an agent produces that a human then has to read, navigate, and act on. When one researcher ran all 20 prompts through Claude in both formats, HTML won 17 of the 20 head-to-head comparisons. The 3 cases where Markdown held its own were tasks where the output stays internal to an agent’s loop and never reaches a human at all. (Source.)

Once a person is the end consumer, HTML’s richer vocabulary starts earning its overhead. Collapsible sections. Semantic structure. Tabbed layouts. Inline labels. Color-coded status. Things Markdown has no syntax for, because Markdown was never designed to produce navigable deliverables. It was designed to produce readable plain text.

LLMs have also been trained on billions of HTML pages, so the semantics of those tags are deeply embedded in how these models understand and produce structure. That doesn’t go away just because Markdown became the default output convention.

For human-readable LLM output, HTML deserves a serious look. That part of the headline holds up.

Where It Does Not Apply

This is where the skimming gets expensive.

For input to an LLM, Markdown is still the right default, and by a wide margin. Markdown uses dramatically fewer tokens than HTML for equivalent content. A Cloudflare analysis found that the Markdown version of a typical blog post used 80% fewer tokens than its HTML counterpart. In RAG pipelines, Markdown-formatted inputs have been shown to boost accuracy by up to 35% while cutting token costs by 20 to 30%. On structured tasks like table extraction, Markdown outperforms HTML at roughly 60.7% accuracy versus 53.6% in GPT-based evaluations. (Source.)

Worth noting: Profound ran a controlled experiment across 381 pages on 6 websites to test whether serving Markdown to AI crawlers versus HTML made a meaningful difference in bot traffic. The result was a marginal directional advantage for Markdown (~16% mean lift) that wasn’t statistically significant. (Source.) Which is to say, well-formed HTML isn’t incomprehensible to LLMs. But when you’re paying per token, the math still favors Markdown clearly.

For documentation in repositories, nothing about Thariq’s observation changes the picture. Markdown’s native rendering in GitHub and GitLab, its readability as plain text, and its role as the standard syntax of developer documentation are not touched by this argument. If your docs live in Git and humans need to read and edit them, Markdown is still the answer. Full stop.

The Token Bill Reality

This deserves its own section because it’s where the “HTML is back!” take gets most dangerous most quickly.

The token efficiency gap between Markdown and HTML is real and large. 80% fewer tokens for equivalent content isn’t a rounding error. At any meaningful scale, that’s a direct line to your API costs. HTML earns that overhead only when the output is rich enough, and human-facing enough, to justify it.

If your workflow involves long context windows, high-volume RAG retrieval, or large amounts of text being ingested or passed between agents, the format you choose for that content has a real cost consequence. Thariq’s post is not an argument for switching to HTML across the board. Applied without that nuance, it’s an expensive misread.

The Framework That Actually Helps

Four scenarios, four answers:

A human writes context and feeds it to a model: Markdown.

A model produces output that stays inside an agent loop: Markdown.

A model produces a deliverable a human will read, navigate, and act on: HTML is worth the token cost.

Documentation lives in a repository: Markdown, full stop.

The headline “HTML is the new Markdown” is accurate for exactly one of those four. The other three haven’t changed.

Thariq’s post isn’t a verdict. It’s a recalibration for a specific use case. The fact that it spread the way it did says less about the content and more about how hungry people are for permission to do the thing they already half-wanted to do.

I’m not pointing fingers. I had the same instinct.



Attitudes About AI Adoption and Acceleration

Much of the misplaced fear and distrust surrounding AI adoption traces back to a single omission in how people are often introduced to its use. Businesses and the media have fixated on the intelligence aspect while often ignoring the behavioral framework required to make it work in the real world.

The early representation of Generative AI suggested it was a shortcut that required very little effort. If users were told upfront about the level of detail, context-setting, and iterative refinement required to get a usable result, the hype might have been quieter (look how long Anthropic was off the radar of the general public), but the real work with these powerful tools might have started sooner for the average person and business (AI Adoption Puzzle: Why Usage Is Up But Impact Is Not, BCG, 2025).

We are essentially trading traditional coding hours for what some call vibe coding: throwing natural language at a problem and hoping the model catches the intent. Vibe coding is a legitimate way to prototype, but it becomes technical debt if you do not eventually solidify the logic. Replacing a clean specification with an open-ended series of guesses is how projects lose their shape before they find their footing.

The most effective approach is not simply plugging a model into an existing process because it looks like it might help. Genuine acceleration comes from a willingness to rethink how things get done, then determining how AI can facilitate those better ways. It is the difference between automating a flawed process and designing a new one.

The success stories often come from teams who looked at a failed output and wondered what specific lever they forgot to pull. They treat the model as a mirror. If the output is off-base, it usually means the instructions provided were incomplete or lacked the necessary constraints. It is an objective way to see where our own requirements are fuzzy.

This is particularly evident in workflow automation. Earlier automation projects often failed because they only mapped the mechanics. We drew boxes and arrows to show what happened next, but we ignored the intent.

AI-driven automation is succeeding where those attempts fell short because the machine requires the reasoning, not just the step. To make an agent navigate a workflow, you have to document why each step exists. This forces organizations to complete their process definitions rather than paper over the gaps. If you cannot explain the logic behind a decision point, the machine cannot execute it. This forced clarity is the real process improvement.

The Double Standard

There is a noticeable double standard in the modern workplace. When an LLM returns a hallucinated mess or fails a logic branch, we iterate. We refine the prompt. We provide more context. We give the machine a level of professional grace and patience that we rarely extend to our human peers.

Think about what that looks like in practice. A new team member submits work that misses the mark, and the first instinct is to question their judgment or capability. The same output from a model and the instinct is to wonder what context was missing from the prompt. One is treated as a character flaw; the other as a specification problem. They are often the same problem.

If organizations applied that same diagnostic instinct to people, treating an incomplete first draft as a gap in the brief rather than a gap in the person, productivity would likely increase. Instead, we frequently demand accuracy on the first pass from humans while subsidizing the machine’s learning curve with endless retry clicks. (The Human Side of AI Adoption: Lessons From the Field, MIT Sloan Management Review, 2025)

The Same Loop Applies to Both

Closing that gap is not primarily a technology problem. It is a management problem, and the same loop applies whether you are working with a model or a person.

Start by acknowledging that a wrong answer is often a sign of a logic path being tested; it is data, not a failure. Reward the attempt at solving the problem; in early iterations, the goal is narrowing the scope, not delivering the final answer. And when the output is off-base, assume the cause is a lack of clear boundaries before assuming incompetence. These are not novel management principles. They are just easier to see when the thing being managed cannot take it personally.

The teams getting real value out of these tools are not looking for a magic button. They treat the AI as a diagnostic tool for their own process gaps. They do not just want the answer; they want to see where the system broke so they can fix the underlying logic.

The One Attribute That Survives

This brings us to the attribute that determines whether a tool gets abandoned or mastered.

Curiosity is the only attribute and attitude that survives the hype cycle.

Expectations without curiosity lead directly to disappointment. If you aren’t wondering why the model failed, you will just conclude the tool is broken and move on. In a technical context, curiosity is the bridge between a strategy and a result. It leads to both the perseverance and the openness to changing the way we think about how things get done. It forces us to reprioritize the work based on what the machine reveals about our own internal logic.

Proficiency in this landscape is not about mastering a specific toolset, because those change every few weeks. It is about an underlying hunger to understand the mechanics of the work. If you have that curiosity, you will find the ROI because you will keep digging until the logic is sound.

Until next time…




Efficient Claude Usage with Microsoft Word Documents

The recent increase in usage limits has done what scarcity always does: it has made people pay attention to process. Suddenly, everyone wants to know how to get more out of Claude without burning through context on avoidable nonsense. The answer, at least for Word-heavy work, is simple enough: do the real work in Markdown, then convert to Word at the end.

That is not a trendy preference. It is basic resource management.

Start in Markdown

The common mistake is to treat Word as the working format just because the deliverable has to be a Word file. That feels tidy right up until Claude spends time and tokens dealing with DOCX structure instead of the actual writing. Claude’s own guidance is plain about the underlying constraint: “Claude’s context window fills up fast,” which is polite model-speak for “stop wasting it” (Best Practices for Claude Code, Anthropic).

Markdown avoids most of that overhead because it is plain text. That matters more than people want to admit. If you want Claude to help you think, draft, revise, and restructure, give it the cleanest working copy possible. Save Word for the part where human beings still insist on headers, footers, styles, and other ceremonial labor.

DOCX Costs More

A DOCX upload is not the same thing as a Markdown file with a fancier extension. It has to be unpacked and interpreted, and that processing overhead shows up in usage. Perplexity’s file-upload guidance makes the point indirectly but clearly enough: large files are handled by extracting the useful content, not by preserving some magical full-fidelity document soul (What We Can’t Do, Perplexity).

The practical difference is not subtle. Published guidance on Claude usage reports that Markdown can use roughly 65 to 90 percent fewer tokens than DOCX for the same content, and that converting to Markdown can reduce token usage by up to 90 percent (How to Convert Files to Markdown to Reduce AI Token Usage by Up To 90%, MindStudio). That is the sort of delta that turns a usable workflow into a quota sink. If you are doing any amount of iterative editing, those savings compound quickly.

The Plugin Assumption

It is natural to assume the Word plugin should be more efficient than uploading a Word file. It lives closer to the document, it feels integrated, and it avoids the mental friction of exporting and reimporting. That is a comforting theory. It is also the kind of theory people keep right up until the bill arrives.

The problem is that convenience does not eliminate document complexity. Claude still has to work within the same context constraints, and Word still produces the same heavy DOCX baggage underneath the friendly interface. The plugin may be better ergonomically for some editing tasks, but it is not inherently cheaper in usage terms (Best Practices for Claude Code, Anthropic). Word is very good at making labor look civilized. It is less good at making it light.

The Practical Workflow

The better approach is boring, which usually means it will actually survive contact with production. Draft in Markdown. Edit in Markdown. Use Claude on the Markdown version while the content is still moving. When the document is stable, export to DOCX and do the final polish in Word.

Pandoc is the clean bridge between the two. It can convert Markdown to DOCX and back again, and its --wrap=none option is useful when converting Word to Markdown because it avoids unnecessary line wrapping noise in the source text (Pandoc User’s Guide, John MacFarlane). That makes the round-trip easier to read, easier to diff, and easier to hand back to Claude without feeding it a bunch of formatting clutter.

A sane process looks like this:

  1. Convert the existing Word document to Markdown with Pandoc.
  2. Use Claude to edit the Markdown version.
  3. Convert the finished Markdown back to DOCX with Pandoc.
  4. Open the DOCX in Word and apply the styling, branding, and layout cleanup that Word insists on being asked for last.

For existing documents, the same rule applies. Convert the Word file to Markdown using Pandoc with --wrap=none, make the changes in Markdown, then convert back when you are done. If the edits are small, copy the updates back into the styled Word file. If the edits are substantial, rebuild the document from the Markdown and stop pretending the old layout deserves to survive unchallenged.
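The round trip is scriptable. A sketch that builds the Pandoc commands without executing them; file names are illustrative, and each command can be run with subprocess.run(cmd, check=True) once pandoc is installed:

```python
# Sketch of the Pandoc round trip. Commands are built but not executed
# here; run each with subprocess.run(cmd, check=True) once pandoc is on
# the PATH. File names are illustrative.

def docx_to_md(src, dst):
    # --wrap=none avoids hard line wraps that add diff noise in Git.
    return ["pandoc", src, "--wrap=none", "-o", dst]

def md_to_docx(src, dst):
    return ["pandoc", src, "-o", dst]

step1 = docx_to_md("report.docx", "report.md")
# ... edit report.md with Claude ...
step3 = md_to_docx("report.md", "report-final.docx")
```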

Where Pandoc Belongs

My earlier post on Pandoc is the natural starting point for anyone who needs the setup and wants the shortest path to something useful: Boost Your GenAI Results with One Simple (and Free) Tool (me). That is the right companion piece here, because the real point is not “use Word better.” The real point is “stop making Claude pay the Word tax until you absolutely have to.”

The rule of thumb is simple. Do the thinking in Markdown, use Pandoc to convert at the edges, and let Word handle the final cosmetic work. If you start with DOCX, you are usually paying more than you need to. And Word, as always, will be happy to take the fee.


Perspectives in Spec Driven Development

There’s this great weekly online morning meetup I join when I can called “The Secrets of Product Management”, led by Nils Davis. Recently the topic of Spec Driven Development came up.

Full disclosure: I didn’t take notes in the meeting, and there were a lot of concepts and thoughts shared verbally and in the chat. Some of what I recall may be off, and I hope that if anyone present reads this and has better recall, they will share their thoughts in the comments.

Some thought it was about a Product Manager gathering all of the specifications for a product in advance, and that it led back to waterfall style processes.

Some thought it was building a Proof-of-Concept to serve as the specification.

By the end of the discussion, the one thing everyone (mostly) agreed on is that it works much better when done iteratively, and includes direct references to standards.

As an architect who still codes, my understanding of SDD is that it is about the spec files that are carefully crafted to direct generative AI in how to write the code. It is a way to get better code from the AI that will require less refactoring after the first results.

The different perspectives made me think it was worth doing a little research and summing it up for my reader here. I admit to mentally vacuuming up a lot of content about AI in order to feed my own synthesis on its use, and the key thing that I saw differently was the ownership of the specification used for SDD.

When I presented the question “Who owns the spec in spec driven development?” to an AI, it responded with “…humans own the spec…”, which points out a whole new perspective.

So, that’s what drove me to dig in a little bit to improve my own understanding and share the results.

A Quick History Lesson

Like most things in IT, the earliest signals appear long before what we later label as “modern” computing (a term that conveniently tends to align with when each of us personally got excited about technology). As far back as 1970, Winston Royce’s Managing the Development of Large Software Systems: Concepts and Techniques outlined ideas that closely resemble what many now think of as Specification-Driven Development (SDD). Interestingly, its diagrams reflect structures similar to waterfall methodologies (an ironic reminder that many “new” ideas are refinements of older patterns rather than entirely novel inventions).

These concepts did not evolve in isolation. Over the following decades, they were reinforced by related disciplines such as formal methods and API design principles like Design by Contract* (which emphasized precision, verifiability, and clearly defined interfaces). Later, approaches like Behavior-Driven Development (BDD) carried some of this thinking forward, framing specifications as shared artifacts between humans and systems (but still largely as guidance rather than execution).

What has changed more recently is the role of AI in making specifications actionable. Around 2025, tools began to emerge that transformed specs from passive documentation into active drivers of implementation. Projects like AWS Kiro and GitHub’s spec-kit marked a shift. Specifications became executable guides for coding agents (not just references for developers). In this sense, “modern” has continued to compress (moving from spanning decades to evolving almost in real time), as specs shift from descriptive artifacts to operational components of the development process.

Opinions Still Differ

I don’t think my input in the recent conversation changed anyone’s mind about how they define SDD. And people will definitely have strong opinions on the value of SDD.

In a recent post, Allen Holub said:

“People talk about spec-driven design, but the best spec you can have is a test—a test you write before you write the code. You don’t write a test to see if the code adheres to a spec. The test IS the spec. Don’t write specs. Write tests.”

I agree with TDD proponents, because it is part of a Continuous Testing cycle, a process that was just starting to really catch on before GenAI went GA, and is even more important since. That said, tests are part of the spec, they are just managed a little differently because the developer doesn’t happen to be human. That’s the whole point of SDD. It is how developers work with agents through clear communication. Because, let’s face it, the Agile approaches of sitting with a user won’t work with AI until after the code has been written, and pair-programming with an AI was only modern for a moment.

Helpful Tools to Try

Tools make this less painful than it sounds.

GitHub’s spec-kit is a good entry point. It gives you Markdown templates for a “constitution” file with principles, then spec.md, plan.md, tasks.md. You slash-command it in your IDE, and AI fills in the gaps. They put it well: “The specification captures intent clearly, the plan translates it into technical decisions.” ([GitHub Blog], Spec-driven development with AI: Get started with a new open-source toolkit) Amazon’s Kiro does staged workflows; Tessl flips code into a byproduct. Red Hat talks up “lessons learned” files to feed back into future specs, cutting errors over time ([Red Hat Developers], How spec-driven development improves AI coding quality).

Wrapping Up

All in all, my sense is to treat specs like your IaC or database schemas. Human owned from the start, iterated carefully, governed with some structure. Reference standards to ground it. Try it small, on a utility script maybe, and see how it holds up in real work.

If it fits your flow, it can add real velocity with AI. If not, no big loss; plenty of paths forward.

*Side note: Yes, I usually have these inline in parentheses (a habit my AI editors hate), but this one seemed too long for that, so… I did some research with Gemini where it insisted on a correlation between design by contract and spec driven development, which at first I took to mean it prefers its training data rather than current information, so I switched to my usual research LLM wrapper, Perplexity. After some hind-brain thinking, it occurred to me that Gemini may have semantically equated specification with contract, which is another quirk of AI: it is so darn literal!
