Team AI

Thu, 2 Apr 2026

I’ve joined team AI (LLM coding agents). I’m by no means an AI evangelist. I’ve just decided that they provide productivity and quality improvements and plan to use them as the tools that they are. This post was fully written by me, a human.

For the past two months I’ve been trying out various things. Some worked, some didn’t. This is part one of two documenting my experiences and discoveries. This post is a listing of how I’ve used the bot and what I think works and doesn’t. The next post will be a collection of specific sections, phrases, and stanzas that I’ve added to my various AGENTS.md files that I think help wrangle improved behavior from a coding agent.

Ways to use a coding agent

In my experience, I’ve found the following,

Good uses:

  • Code review
  • Implementation planning
  • Heisenbugs
  • Semantic patching
  • Busywork refactoring
  • Non-core competencies
  • Derivative code generation

Bad uses:

  • Cold code generation
  • Strategization and design

Code review

I’m a big fan of linters, so using a coding agent for code review was always going to be high on my list of good uses. Linters vary considerably in usefulness and relevance, and I’d call the coding agent slightly better than average, or 7/10 if you want a score. Its hit rate on actual bugs is generally good, but it can fixate on certain things and be extraordinarily pedantic at times.

Implementation planning

By “Implementation planning”, I do not mean allowing the agent to autonomously design an implementation. Instead, I mean letting the agent proofread and critique a design that I made. This generally looks something like,

I want to do BLAH, I plan to add a struct Foo { .... some fields ... } and wire it in via BLAH BLAH…. Does this sound reasonable? Do the proposed structures cover all reasonable situations? Any further recommendations?

The bot will look at the current code base and the plan and comment on it, usually with a specific list of issues it sees. Going through that list is generally a useful exercise before I start to code (without the use of AI). The process ends up saving time and making better code. It is like prototyping without actually writing the prototype.

Heisenbugs

The bot is particularly pedantic. This is useful for locating bugs. If you can give it details about exactly when a bug does and does not occur, the bot can find things you wouldn’t have noticed. I have not encountered any true Heisenbugs during my time with the agent so far, but it was particularly good at identifying what was going wrong in some odd cases. Its choice of fixes in these cases was almost always wrong, but fixing the bugs is generally the easy part.

Semantic patching

This is sort of the light version of the next item, but worth breaking out separately even if the boundary between them ends up being a bit vague. By “semantic patching”, I mean the times when you decide you want to change a function to take Option&lt;T&gt; instead of just T. Or change the order of function arguments. Or decide that calling bar(foo(x)) is dumb and one should just call blop(x) instead. The agent can find all the places these things happen and fix them for you. It will find all the ways these things are done (e.g., x = foo(x); ... other stuff ...; y = bar(x)) and make a replacement that is most likely correct and probably the same change you would have made. Obviously you need to carefully read the diff, but now you are proofreading and fixing the one or two places where things are particularly odd rather than trying to come up with a useful sed command and failing.
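
To make the Option&lt;T&gt; case concrete, here is a minimal Rust sketch (all names are hypothetical illustrations, not from any real code base) of the before and after that such a semantic patch produces. The agent rewrites the signature once and then adjusts every call site:

```rust
// Before: `fn describe(label: &str) -> String` forced callers to always
// supply a label. After the semantic patch, the argument is optional.
fn describe(label: Option<&str>) -> String {
    match label {
        Some(l) => format!("[{l}]"),
        None => String::from("[unlabeled]"),
    }
}

fn main() {
    // Old call sites become Some(...); new code can now pass None.
    assert_eq!(describe(Some("db")), "[db]");
    assert_eq!(describe(None), "[unlabeled]");
    println!("ok");
}
```

The mechanical part (wrapping every existing argument in `Some(...)`) is exactly the kind of change a sed command gets wrong and an agent gets right.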

Busywork refactoring

Moving code around without breaking it is one of the things the coding agent seems to be quite good at. When you find some bit of common code that should be factored out into a general function with a few arguments the agent can do this, replacing many distinct call sites with the correct function call and arguments, customized for each call site. It’s really nice.
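
As an illustration of the kind of extraction described above (the helper and its constants are hypothetical, invented for this sketch): several call sites repeat the same inline clamp-and-scale logic, and the agent pulls it into one general function, rewriting each site with the right arguments:

```rust
// Extracted helper replacing duplicated inline code at many call sites.
fn clamp_scale(v: f64, max: f64, factor: f64) -> f64 {
    if v > max { max * factor } else { v * factor }
}

fn main() {
    // Formerly: `let a = if x > 10.0 { 10.0 * 2.0 } else { x * 2.0 };`
    let a = clamp_scale(12.0, 10.0, 2.0);
    // Formerly a near-identical inline copy with different constants.
    let b = clamp_scale(3.0, 10.0, 0.5);
    assert_eq!(a, 20.0);
    assert_eq!(b, 1.5);
    println!("ok");
}
```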

When you do more complex refactoring like this, I highly recommend you take the “Implementation planning” step above with the agent. You will want to address all the items it decides are issues before setting it loose. The process I found works well was,

  1. Specify the goal and the plan to achieve it.

  2. Ask the agent for issues and concerns it sees (just like above). It can help to explicitly tell it to list all issues it finds not just the top or most important ones.

  3. Ask it for clarifications to make sure you understand what its concerns are. If your plan changes in response, ask/tell it about the change and get its comments on the change. This can be done a point at a time, but some thought to the order in which you discuss things with the bot might be needed so that it doesn’t fixate on something you plan to change.

  4. Manually compact the context window. If you haven’t used AI agents, this is a command telling the agent to internally summarize what it knows so far. This frees up more room in its context window (its “attention”) for other work.

  5. In one big copy/paste prompt, explicitly give the command you want,

    Create a function foo with arguments … replace all sites doing XXX with calls to this function ….

    Decisions

    • Do not introduce a function that does WWW
    • When updating code that does YYY, do ZZZ

    In the decisions list, address every concern the agent had. It doesn’t matter whether you accepted or rejected the agent’s concern; you just have to address it so the agent knows what to do when it sees what it thinks is a problem. In this prompt, make everything clear and explicit and leave little room for bot creativity.

Non-core competencies

The AI agent is very useful in non-core competencies. The agent can help me do things that would otherwise take a lot more time because I am not sufficiently familiar with the area. Since starting I have,

These are tasks where I am competent enough to tell when the fix is correct, but not familiar enough with the area to get to a useful answer easily, or in finite time. The bot is able to propose ideas customized to my situation in ways that web searches can’t. It is also better than a search engine at understanding that “no, turning the mouse off and back on didn’t help”.

Derivative code generation

Second verse, same as the first… It certainly isn’t the case that most programming is just variations on a theme, but there is quite a bit of rhyming. A coding agent can be trusted with tasks like “implement a data object for this new database table” when the project already has an existing example or two. Similarly, creating components similar to, combinations of, or minor extensions of existing components are generally within reach of the agent.

Cold code generation

Where the coding agent breaks down is generating original new code. Unfortunately (or fortunately?) for Internet discussion forums, it is very hard to pin down “original” so people disagree over what this means. The agent appears to make original code when the task is a common pattern or mashup of patterns available in its corpus of training material. It also appears original when the task is such that the description to the agent is sufficiently revealing or detailed that it effectively tells the agent how to do it. The agents are very good at inference so this can sometimes appear magical or like the agent has more originality than it, in fact, does.

When given a task with sufficient implementation freedom, it tends to make very bad choices in those gaps. The agent breaks things down into smaller goals and then implements them. It has a memory of what it was working on recently but doesn’t generally scan the code base to make sure it isn’t re-implementing a function that exists elsewhere (though some AGENTS.md hacks might help a bit here). In one particularly bad session it was creating helper functions called format_finalize_date_for_payroll_report(). You can imagine how out of hand such a scheme might become.

The agent is generally overly cautious and afraid to break API. This is a great default, but it means the agent will almost always choose to write more code rather than modify or improve existing code. Specifically, I’ve seen no evidence that it would even consider renaming and reusing the above date formatting function if other uses arose calling for the same format. This means that if you set it loose on writing new features it will just add code for what it needs. If the agent is told about or happens to notice a related function, it will use that function when the function’s behavior is an exact match, but any deviation between what that function does and what the bot wants means the bot will write something from scratch. In my opinion, you simply cannot set a coding agent loose on code you care about. This is the source of the term “AI slop”, and it is real.
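
The duplication pattern looks something like the following sketch (the general helper and its body are my hypothetical reconstruction; only the over-specific name comes from the session described above). Rather than rename and reuse one formatter, the agent mints a new narrowly named helper per feature, even when the behavior is identical:

```rust
// What the agent tends to produce: one helper per feature, named for
// the single place it is used.
fn format_finalize_date_for_payroll_report(y: u32, m: u32, d: u32) -> String {
    format!("{y:04}-{m:02}-{d:02}")
}

// What a human would converge on: one general helper, reused everywhere.
fn format_iso_date(y: u32, m: u32, d: u32) -> String {
    format!("{y:04}-{m:02}-{d:02}")
}

fn main() {
    // Identical behavior; only the name and the reuse story differ.
    assert_eq!(
        format_finalize_date_for_payroll_report(2026, 4, 2),
        format_iso_date(2026, 4, 2)
    );
    println!("ok");
}
```

Each copy is harmless on its own; the slop is the accumulation of dozens of them that no one dares consolidate.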

One strategy for helping an agent perform more complicated actions is to have it maintain an “ExecPlan”. This is a document which spells out specific milestones and implementation plans, for example, “### Milestone 2: Payroll schema design” followed by a planned schema, list of specific files to modify, meanings of the fields, …. The agent stores design decisions and reasons for decisions. It also maintains a self-modified journal of things it decides should be remembered. It will keep notes like,

Observation: there is already a canonical interval walker in punchclock/class.Timecard.php, but it is week-oriented and includes pseudo close-out handling for still-open punches.

Observation: root config settings can be used by punchclock code even though punchclock has its own config file.

It is rather impressive to watch it operate, and the trick seems to work as intended. However, I have used ExecPlans in many ways (purely generated by AI, generated then modified by me, fully written by me, written then tweaked and improved between milestones). They do let the bot write larger bodies of slightly better code, but they don’t significantly move the needle in cohesion or design capability; ExecPlans just let the agent do more than would fit in a single context window.

Strategization and design

The bot is an autocomplete. It can’t think for itself; all it can do is spit out code like other code it has seen before. In truth a human acts much the same way, but humans have a spidey-sense that guides them in design. Any designs a coding agent comes up with will not be great – average is the best they can do, and all of our children are supposed to be above average.

An interesting corollary of the autocomplete property and lack of individual personality is that the quality of code can vary considerably depending on what you ask it to do and what language you are coding in. Specifically, I’ve found it to be a much better Rust programmer than a Perl or PHP programmer. Clearly this is due to the quality of its available training material. But also, if given a blank slate, its style will change depending on the starting prompt – one prompt may trigger it to think about one example it found on the net while a different prompt might trigger it to think of another, and then the autocomplete runs off and produces results in a different style. You can’t always tell how it will lean, but starting with leading questions can certainly help. E.g., ask some questions about secure parsing of untrusted data and then prompt for the task you want.

Conclusion

These are, of course, my experiences (hi!) at a particular moment in time (Feb/Mar 2026) with a specific coding agent (codex). I did work on several different projects with diverse tasks in quite a few programming languages (Rust, PHP, Perl, Python, JavaScript, and I’m embarrassed to admit, VB6). It took me several weeks to become sufficiently convinced of the utility before I moved from the web interface to running the CLI client (sandboxed – but that’s yet another post).

I have successfully let the model write some code designed during “Implementation planning” sessions. I’ve also unsuccessfully let it write code. At this time, I will only allow it when I decide that the implementation will be small and localized, and I have a pretty good idea what the AI will actually do. My suspicion is that this is the area where I personally will see progress as better models come out. It is the area where the bot is clearly able to do something, but isn’t good enough. I expect to be able to allow it to make larger and larger changes, after having specced out the planned code structure. In my opinion, open code generation of the form “write an app or feature that does BLAH” is so far from competency that incremental improvements won’t get us there. Yes, the bot can implement that app or feature, but the code will degrade into (if it doesn’t already start out as) a mess of unmaintainable slop as the code is updated and more features get added.

Code generation gets a lot of the AI agent press, but it is the other uses, before and after the code is written, where these bots actually earn their keep and add meaningful value.
