By Pavel Panchekha

Shared under CC-BY-SA.

How to Start AI Coding

On the one hand, this is the most hyped topic ever, so you'd think I have nothing to add. I don't feel that I do. But I'm known as the "AI coding hype man" in my department and among many friends, and they keep asking me how they're supposed to do AI coding. This blog post is an attempt to save time explaining.

If you are happy with your AI coding setup, please just skip this blog post. This is an opinionated guide to AI coding, optimized toward getting skeptics to try it. I don't claim this is the "right way" to do AI coding, just one that I've found convinces fence-sitters. The opinionatedness is part of that: it cuts through confusion by purposefully oversimplifying.

Types of AI Coding

Autocomplete: One way to do AI coding is to use GitHub Copilot or Cursor Tab or one of the other features where AI-generated completions appear in your text editor.

The big challenge is that auto-complete has absolutely brutal latency requirements, tens of milliseconds at most. You just can't run one of the big modern smart models in that time, nor can you make a network request, so you're stuck with small dumb local models.

So basically, it's fast but dumb. I guess that's to taste. I turn it off, but some friends like it. It's great at adding print statements when debugging; it's less useful for logic and can make subtle mistakes. It might be more useful in common and more boiler-plate-y languages.

Chat: Another option is to copy and paste code into the ChatGPT UI. You may laugh but 1) a lot of people do this, and 2) it works better than auto-complete. Here you are using the large modern smart models and can get very complex code written to reasonable quality. You can do this with any chatbot: GPT, Claude, Gemini, and so on.

The big challenge is that, in a chat interface, the AI has the ability to write code but minimal ability to run it.1 [1 This depends on the chat UI; ChatGPT for example seems to be able to run a few languages sandboxed in the background but only in paid accounts. (Maybe?)]

I think of it as "whiteboard coding". You ever do an interview somewhere like Google, and they asked you to write A* on the whiteboard? You probably did a decent job but that code might not actually run if you transcribe it literally. Just like you would do better if you could actually run the code, read error messages, and so on—well, same for the AI.

Agents: An AI "agent" is just a for loop. There are two parts to the loop: the AI model and the "agent harness" that it talks to.

In each iteration of the loop, the AI generates a "tool call" to one of a number of tools it has access to. More specifically, it generates a token naming the tool and then more tokens describing the arguments to the tool, usually in JSON format. Common tools are things like "edit file" (arguments: a file name and a patch file) and "run command" (argument: a shell command). Then the "agent harness" executes the tool, produces output, and that output becomes part of the AI's context when it starts the next iteration of the loop.
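To make the loop concrete, here's a toy version in Python. The "model" here is a hard-coded stub (a real harness would call an LLM API at that point): it asks to run one shell command, and once it sees the result, says it's done.

```python
import subprocess

# Stub standing in for the AI model: given the context so far, emit a tool call.
def model(context):
    if any(m.startswith("TOOL RESULT") for m in context):
        return {"tool": "done", "args": {}}
    return {"tool": "run_command", "args": {"command": "echo hello"}}

# One of the harness's tools: run a shell command and capture its output.
def run_command(args):
    out = subprocess.run(args["command"], shell=True,
                         capture_output=True, text=True)
    return out.stdout

TOOLS = {"run_command": run_command}

def agent(prompt):
    context = [prompt]
    while True:
        call = model(context)          # model emits a tool call (name + args)
        if call["tool"] == "done":
            return context
        result = TOOLS[call["tool"]](call["args"])
        context.append("TOOL RESULT: " + result)  # output feeds the next iteration

print(agent("say hello")[-1])
```

That's the whole trick: the model only ever emits tool calls, and the harness keeps executing them and feeding the results back until the model decides it's finished.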

Agents work well because they can mostly develop code the way people (Unix greybeards) develop code: by writing code, compiling it, running it, observing the output, and making adjustments. Modern AIs are surprisingly capable with the shell, which gives them a lot of capabilities right off the bat (they can grep through code bases, examine output files, tweak compiler flags, run Make, analyze JSON files with jq or just by writing Python scripts, and so on). But the AI can also do stuff like add print statements, run the code, look at what was printed, and repeat. Just like this helps you develop—it's easier to add print statements than accurately simulate program execution in your head—it also helps the AI.

Anyway, across these three options, you want the good shit: you want an AI agent.

Choosing an AI agent

First, pick a vendor. Pay them money. If you've finished college, can program, and are in the US, you probably have $20.2 [2 If that's not you, try Amp Free, which is ad-supported and uses weaker models but mostly does work.] Just pay them the money. This is a huge hang-up people have but get over it.

You probably know all the vendors' names, but just to put them in one place: Claude Code (Anthropic), Codex (OpenAI), Gemini CLI (Google), and Amp.

They're all fine. People have preferences, and sometimes one or another pulls ahead a bit, but you're not missing out on much by picking the "wrong" one. I use one called Amp,3 [3 It currently uses Anthropic models.] which I do like, but I wouldn't be put out if I had to switch. I often use Codex in demos because my university offers OpenAI.

Note that you want the CLI version. Some of these also have "Cloud" versions; avoid.4 [4 They're fine but quite fiddly, and I find it's valuable when you're still getting the hang of it to watch what the agent is actually doing. Most people find it significantly increases their confidence in the agent.] The CLI version can run commands on your code base, directly on your computer, and directly observe the results. It's the closest match to how you work, and it's also the easiest to collaborate with.5 [5 For example, maybe you want to clean up some of the variable names before you commit or something; that's often easier to just do than to explain to it.] You can also use these tools from VS Code, by the way; they work just as well there. The important thing is that you're using 1) a full-fat frontier coding model, and 2) it runs on your machine. To be clear, the model doesn't run on your machine (your GPU is too puny). The agent harness runs on your machine and can interact with it. That's what matters.

Is running the agent on your computer safe? What if it executes bad commands? Is it a security risk? Nah. I mean, in theory, I don't know, but I've never had it do something bad. Or, just try it! Run one of these agents and try to convince it to run rm -rf /. Maybe it's because I use the Anthropic models, but I haven't succeeded. Also, the agent harness will ask your permission before running commands that aren't whitelisted. I usually turn this off (the models are safe), but it's there too.

What about privacy? Will it exfiltrate your precious data? Ok, first of all, you probably work on open source stuff anyway, or if not open source probably not national secrets at least. Second, some of the paid versions promise not to train on your data, whatever that's worth. Would local models be better? In theory, but in practice they're not very good.

Using it

Ok, so now you've paid your vendor and downloaded the CLI version. Let's get started using it.

Pick a coding project you work on. They're great at everything. Front-end, back-end, C++, Racket, Haskell, data analysis, etc. Don't worry about it. Any language, any goal, whatever. Pick a coding project.

Make sure you have the code in git or similar and make sure you have a clean checkout. While the agents are smart, they do sometimes make ill-considered changes, and you'll want to review the changes and possibly roll them back. Some people make a separate copy of their repo for the AI agent to play in; you can do that too, but make sure that copy is ready for work: it compiles, the tests run, and so on. For a first attempt, it's easier to just use the checkout where you normally work, because you know that one is in working order.
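Concretely, the pre-flight check is just (the branch name here is only an example):

```shell
git status --short            # no output means a clean working tree
git switch -c ai-experiment   # optional: a scratch branch for the agent's changes
```

With a clean tree, a plain `git diff` afterward shows you exactly what the agent did, and `git restore .` throws it all away if you don't like it.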

Now start the agent, usually by opening your terminal, changing to your project's directory, and typing the name of the tool, like codex or claude or gemini. Do the login flow. Now you should see a box that you can type into.

Type into it and tell it what to do. "FooBarFactory seems excessive, see if you can refactor the code so we can delete it." Or, "try to run gprof foo bar baz; it'll run the project and print a profile. dothing is a big fraction of the runtime, optimize it without breaking anything." Or, "where's the code that does the data aggregation". You can ask it to do stuff or you can ask it questions. Hit enter.

It'll churn for a bit. Watch it go (and permit it to run commands if it asks). You'll probably see it move around the directory tree, run grep a whole lot, edit some files, and maybe run some commands. If it tells you it's done, maybe inform it of how to run your project's tests, or how to build it, or whatever. Eventually these kinds of common instructions can go in an AGENTS.md file but you can also just experiment. Eventually when it does look like it might have succeeded (say, you have some reasonable looking output, or it passed the tests, or whatever) go review the changes in git and make a PR if you're happy with them.
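Those standing instructions don't need to be fancy. A hypothetical AGENTS.md might look like this (the commands here are made up; use whatever your project actually uses):

```markdown
# AGENTS.md

## Build
Run `make` from the repository root.

## Tests
Run `make test`. All tests should pass before you call a task done.

## Conventions
Follow the existing code style; don't reformat files you aren't otherwise editing.
```

Most agents read this file automatically at startup, so anything you find yourself repeating in the chat box belongs here.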

What can you use it for?

Lots!!! Over time you'll get the hang of how "big" a task you can expect it to take on and succeed at. Just like humans, they might ace a small refactor and fail a bigger one. Or, if you've got a really tricky algorithm, well, it's not clear it'll be able to figure it out. But maybe there are variations or generalizations you'd like to make but have been putting off; try it. There's a lot of tedium in programming, and it's great at that.

Another great use for the AIs is building infrastructure. Add a parser and a pretty-printer, make it generate charts or JSON dumps, add a debug mode or logging or profiling. That stuff is valuable but annoying to do; with AIs you get the value with a lot less of the annoyance. Some people also get a lot of value from having the AI write tests.

Another great use is scripts. For years I've had random shell scripts lying around that I use for course admin stuff. Create Github repositories based on a CSV and share them with accounts named in the CSV. Clone every repo and build it. That kind of thing. I got the AI to read all the shell scripts and combine them into a single Python file that's more consistent, has a single readable config file, keeps all the data in nice formats, etc. There's no way in the world I would have done that by hand, but it really is nice.
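As a sketch of what that kind of script looks like: here's a hypothetical course-admin helper that reads a roster CSV and prints the GitHub CLI commands it would run. The column names and the org name are made up, and a real script would execute the commands rather than print them.

```python
import csv
import io

# Made-up roster; a real script would open a file instead.
ROSTER = io.StringIO("repo,account\nhw1-alice,alice\nhw1-bob,bob\n")
ORG = "my-course-org"  # hypothetical GitHub organization

commands = []
for row in csv.DictReader(ROSTER):
    # Create a private repo per row, then invite the named account as a
    # collaborator via the GitHub REST API (`gh api` wraps it).
    commands.append(f"gh repo create {ORG}/{row['repo']} --private")
    commands.append(
        f"gh api -X PUT repos/{ORG}/{row['repo']}/collaborators/{row['account']}"
    )

print("\n".join(commands))
```

This is exactly the shape of chore an agent will happily consolidate for you: one config, one loop, one consistent data format.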

Wait so does this put us all out of jobs?

No. Set aside the fact that my job is "professor" and I don't even have to code for work, and the meetings and so on. Even just focusing on coding time, the AIs just aren't that powerful.

Take one of my main research projects, Herbie. It's about a decade old and actively used and maintained. I use the AI extensively when working on it, and the AI really helps. When I started using AI, I ripped through a years-old backlog of refactors, crufty parts to modernize, tests and observability tools to add, and so on. That was a lot of fun.

But then once that's done you've just got the same slow grind of experimenting with different features, trying to find stuff that fits the product and improves results and actually works. In the past I would come up with things worth doing maybe twice as fast as I could do them. Now I can do them quite a bit faster—it varies but refactors and modernization for example are way faster—so I just end up limited by the other thing, coming up with things worth doing / experiments worth running / features worth adding.

So I think my basic experience of programming with AI is just that it's like programming without AI, but with about half the tedium.
