How to Start AI Coding
On the one hand, this is the most hyped topic ever, so you'd think I have nothing to add. I don't feel that I do. But I'm known as the "AI Coding hype man" in my department and among many friends, and they keep asking me how they're supposed to do AI coding. This blog post is an attempt to save time explaining.
If you are happy with your AI coding setup, please just skip this blog post. This is an opinionated guide to AI coding, optimized toward getting skeptics to try it; I don't claim this is the "right way" to do AI coding, just one that I've found convinces fence sitters. The opinionatedness is part of that: it cuts through confusion by purposefully oversimplifying.
Types of AI Coding
Autocomplete: One way to do AI coding is to use GitHub Copilot or Cursor Tab or one of these other features where AI-generated completions appear in your text editor.
The big challenge is that auto-complete has absolutely brutal latency requirements, tens of milliseconds at most. You just can't run one of the big modern smart models in that time, nor can you make a network request, so you're stuck with small dumb local models.
So basically, it's fast but dumb. I guess that's to taste. I turn it off, but some friends like it. It's great at adding print statements when debugging; it's less useful for logic and can make subtle mistakes. It might be more useful in common and more boilerplate-y languages.
Chat: Another option is to copy and paste code into the ChatGPT UI. You may laugh, but 1) a lot of people do this, and 2) it works better than auto-complete. Here you are using the large modern smart models and can get very complex code written to reasonable quality. You can do this with any chatbot: ChatGPT, Claude, Gemini, and so on.
The big challenge is that, in a chat interface, the AI has the ability to write code but minimal ability to run it.1 [1 This depends on the chat UI; ChatGPT for example seems to be able to run a few languages sandboxed in the background but only in paid accounts. (Maybe?)]
I think of it as "whiteboard coding". Ever do an interview somewhere like Google, where they asked you to write A* on the whiteboard? You probably did a decent job, but that code might not actually run if you transcribe it literally. Just like you would do better if you could actually run the code, read error messages, and so on—well, same for the AI.
Agents: An AI "agent" is just a for loop. There are two parts to the loop: the AI model and the "agent harness" that it talks to.
In each iteration of the loop, the AI generates a "tool call" to one of a number of tools it has access to. More specifically, it generates a token naming the tool and then more tokens describing the arguments to the tool, usually in JSON format. Common tools are things like "edit file" (arguments: a file name and a patch file) and "run command" (argument: a shell command). Then the "agent harness" executes the tool, produces output, and that output becomes part of the AI's context when it starts the next iteration of the loop.
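To make that concrete, here's a minimal sketch of an agent loop in Python. Everything here is invented for illustration: the call_model stub stands in for the vendor's API, and real harnesses have their own tool names, wire formats, and patch-based file edits.

```python
import json
import subprocess

def run_command(cmd):
    """Tool: run a shell command and return its output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def edit_file(path, new_text):
    """Tool: rewrite a file. (Real harnesses apply patches; this just overwrites.)"""
    with open(path, "w") as f:
        f.write(new_text)
    return f"wrote {path}"

TOOLS = {"run_command": run_command, "edit_file": edit_file}

def call_model(context):
    """Placeholder for the AI model. A real harness sends the context to the
    vendor's API and gets back either a tool call or a final answer."""
    if len(context) == 1:
        return {"tool": "run_command", "args": json.dumps({"cmd": "ls"})}
    return {"tool": None, "text": "Done looking around."}

# The loop itself: ask the model for a tool call, execute it, feed the
# output back into the context, repeat until the model stops calling tools.
context = [{"role": "user", "content": "What files are in this project?"}]
while True:
    reply = call_model(context)
    if reply["tool"] is None:
        print(reply["text"])
        break
    args = json.loads(reply["args"])        # the model emits arguments as JSON
    output = TOOLS[reply["tool"]](**args)   # the harness actually runs the tool
    context.append({"role": "tool", "content": output})
```

The real versions add streaming, permission checks, and context management, but the core really is just this loop.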
Agents work well because they can mostly develop code the way people (Unix greybeards) develop code: by writing code, compiling it, running it, observing the output, and making adjustments. Modern AIs are surprisingly capable with the shell, which gives them a lot of power right off the bat (they can grep through code bases, examine output files, tweak compiler flags, run Make, analyze JSON files with jq or just by writing Python scripts, and so on). But the AI can also do stuff like add print statements, run the code, look at what was printed, and repeat. Just like this helps you develop—it's easier to add print statements than to accurately simulate program execution in your head—it also helps the AI.
Anyway, across these three options, you want the good shit: you want an AI agent.
Choosing an AI agent
First, pick a vendor. Pay them money. If you've finished college, can program, and are in the US, you probably have $20.2 [2 If that's not you, try Amp Free, which is ad-supported and uses weaker models but mostly does work.] Just pay them the money. This is a huge hang-up people have, but get over it.
You probably know all the vendors' names, but just to put them in one place:
- Anthropic makes Claude Code
- Google makes Gemini CLI
- OpenAI makes Codex CLI
They're all fine. People have preferences, and sometimes one or the other pokes ahead a bit, but you're not missing out on much by picking the "wrong" one. I use one called Amp,3 [3 It currently uses Anthropic models.] which I do like, but I wouldn't be too put out if I had to switch. I often use Codex in demos because my university provides OpenAI access.
Note that you want the CLI version. Some of these also have "Cloud" versions; avoid.4 [4 They're fine but quite fiddly, and I find it's valuable when you're still getting the hang of it to watch what the agent is actually doing. Most people find it significantly increases their confidence in the agent.] The CLI version can run commands on your code base, directly on your computer, and directly observe the results. It's the closest match to how you work, and it's the easiest to collaborate with.5 [5 For example, maybe you want to clean up some of the variable names before you commit or something; that's often easier to just do yourself than to explain to it.] You can also use these tools from VS Code, by the way; they work just as well there. The important thing is that you're using 1) a full-fat frontier coding model, and 2) it runs on your machine. To be clear, the model doesn't run on your machine (your GPU is too puny). The agent harness runs on your machine and can interact with it. That's what matters.
Is running the agent on your computer safe? What if it executes bad commands? Is it a security risk? Nah. I mean, in theory, I don't know, but I've never had it do something bad. Or, just try it! Run one of these agents and try to convince it to run rm -rf /. Maybe it's because I use the Anthropic models, but I haven't succeeded. Also, the agent harness will ask your permission before running commands that aren't whitelisted. I usually turn this off (the models are safe), but it's there too.
What about privacy? Will it exfiltrate your precious data? Ok, first of all, you probably work on open source stuff anyway, or if not open source, at least probably not national secrets. Second, some of the paid versions promise not to train on your data, for whatever that's worth. Would local models be better? In theory, but in practice they're not very good.
Using it
Ok, so now you've paid your vendor and downloaded the CLI version. Let's get started using it.
Pick a coding project you work on. The agents are great at everything: front-end, back-end, C++, Racket, Haskell, data analysis, etc. Don't worry about it. Any language, any goal, whatever. Pick a coding project.
Make sure you have the code in git or similar, and make sure you have a clean checkout. While the agents are smart, they do sometimes make ill-considered changes, and you'll want to review the changes and possibly roll them back. Some people make a separate copy of their repo for the AI agent to play in; you can do that too, but make sure the copy is ready for work: it compiles, the tests run, and so on. For a first attempt, it's easier to just use the checkout you actually work in, because you know that it's working.
Now start the agent, usually by opening your terminal, changing to your project's directory, and typing the name of the tool, like codex or claude or gemini. Do the login flow. Now you should see a box that you can type into.
Type into it and tell it what to do. "FooBarFactory seems excessive, see if you can refactor the code so we can delete it." Or, "try to run gprof foo bar baz; it'll run the project and print a profile. dothing is a big fraction of the runtime, optimize it without breaking anything." Or, "where's the code that does the data aggregation". You can ask it to do stuff or you can ask it questions. Hit enter.
It'll churn for a bit. Watch it go (and permit it to run commands if it asks). You'll probably see it move around the directory tree, run grep a whole lot, edit some files, and maybe run some commands. If it tells you it's done, maybe inform it of how to run your project's tests, or how to build it, or whatever. Eventually these kinds of common instructions can go in an AGENTS.md file (see the sketch below), but you can also just experiment. When it does look like it might have succeeded (say, you have some reasonable-looking output, or it passed the tests, or whatever), go review the changes in git and make a PR if you're happy with them.
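To give a flavor, here's what a starter AGENTS.md might look like. The project details are invented for illustration; the file is just plain instructions the agent reads when it starts.

```
# AGENTS.md (hypothetical example)

Build with `make`; run the tests with `make test`.
The tests need Python 3.11+; activate the venv in .venv/ first.
Files under vendor/ are generated; don't edit them by hand.
Run `make lint` and fix any warnings before calling a task done.
```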
What can you use it for?
Lots!!! Over time you'll get the hang of how "big" a task you can expect it to take on and succeed at. Just like humans, they might ace a small refactor and fail a bigger one. Or, if you've got a really tricky algorithm, well, not clear it'll be able to figure it out. But maybe there are variations or generalizations you'd like to make but have been putting off; try it. There's a lot of tedium in programming and it's great at that.
Another great use for the AIs is building infrastructure. Add a parser and a pretty-printer, make it generate charts or JSON dumps, add a debug mode or logging or profiling. That stuff is valuable but annoying to do; with AIs you get the value with a lot less of the annoying. Some people also get value from writing tests.
Another great use is scripts. For years I've had random shell scripts lying around that I use for course admin stuff. Create GitHub repositories based on a CSV and share them with accounts named in the CSV. Clone every repo and build it. That kind of thing. I got the AI to read all the shell scripts and combine them into a single Python file that's more consistent, has a single readable config file, keeps all the data in nice formats, etc. There's no way in the world I would have done that by hand, but it really is nice to have.
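For scale, the CSV task above is maybe a dozen lines. Here's a sketch, assuming a hypothetical students.csv with repo and github_user columns, an invented org name, and the GitHub CLI (gh) installed and authenticated.

```python
import csv
import subprocess

ORG = "my-course"  # invented org name

with open("students.csv") as f:  # hypothetical filename and columns
    for row in csv.DictReader(f):
        repo, user = row["repo"], row["github_user"]
        # Create a private repo under the org.
        subprocess.run(["gh", "repo", "create", f"{ORG}/{repo}", "--private"],
                       check=True)
        # Add the student as a collaborator via GitHub's REST API.
        subprocess.run(["gh", "api", "-X", "PUT",
                        f"repos/{ORG}/{repo}/collaborators/{user}"],
                       check=True)
```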
Wait so does this put us all out of jobs?
No. Set aside the fact that my job is "professor", so I don't even have to code for work, and set aside the meetings and so on. Just focusing on coding time—the AIs just aren't that powerful.
Take one of my main research projects, Herbie. It's about a decade old and actively used and maintained. I use the AI extensively when working on it, and the AI really helps. When I started using AI, I ripped through a years-old backlog of refactors, crufty parts to modernize, tests and observability tools to add, and so on. That was a lot of fun.
But then once that's done, you've just got the same slow grind of experimenting with different features, trying to find stuff that fits the product, improves results, and actually works. In the past, I would come up with things worth doing maybe twice as fast as I could actually do them. Now I can do them quite a bit faster—it varies, but refactors and modernization, for example, are way faster—so I just end up limited by the other thing: coming up with things worth doing / experiments worth running / features worth adding.
So I think my basic experience of programming with AI is just that it's like programming without AI, but with about half the tedium.