How I Think About Research Projects

Now that I have advised a few research projects, I've begun to notice similarities, common errors, and inflection points. For the benefit of my students, here's a quick summary of how I approach projects. Naturally, this is specific to my research, and might not apply to any other advisor.

Projects

I divide my students' research into "projects". Usually a project begins with some vague idea, like "balance multiple garbage collectors" or "synthesize math function implementations" or "optimize rendering commands" or something like that. A project ends when it publishes a top-tier paper somewhere like PLDI, POPL, OOPSLA, SC, ARITH, or similar. This should take about a year, though of course research is uncertain. Maybe after that there's a follow-up project.

Each project is owned by one student. Other PhD students, MS students, undergrads, faculty, whatever may work on it—but that one owner will be first author and lead the work, the evaluation, the writing, and give the presentation. Generally a student focuses on one project at a time, though they might still be, say, working on implementation or deployment of a previous one, or brainstorming a future one.

A student should have something like 3-5 successful projects over the course of their PhD. Publishing a top-tier paper is definitely success, but there are less orthodox forms of success. For example, I count deployment at a company like Mozilla as success. There are likewise many kinds of failure, discussed below.

Projects proceed in four phases, described below. Of course you don't strictly do one after another. The boundaries are vague. And it's good to plan ahead a bit—you want to have a working paper story in place while you're evaluating, for example. But I find the phases helpful for diagnosing issues in a research project, especially issues of why a project feels like it's not making progress.

For each phase, I focus on how it can fail and why students get stuck there, because ultimately not getting stuck is the key skill one learns in a PhD. Typically it's because of an unwillingness to move from one phase to the next. Sometimes it's because that phase is a natural fit for the student; for example, some students just like writing cool software and can get stuck doing that. Coding is fun but you can't code your way out of a lack of evaluation—so it's best to keep this a side project, collaboration, or slow-burn maintenance task. (This is what I do with Herbie.)

Phase 1: exploration

Projects start with a vague idea, and the first phase is about trying to make the project something concrete. For example, perhaps the idea is synthesizing math function implementations. Then this phase involves reading some math function implementations, or some papers about synthesis, and trying to come up with some ideas for how one might go about synthesizing math function implementations. It should also plan out, in general terms, why the work might matter, how one might find impressive applications, what metrics you might evaluate it on, and so on.

This phase can fail when the project ends up hard or impossible. In grad school, I had this idea of a program logic for shell scripts. The program logic assertions would reason about files existing or not existing, and you could generate weakest preconditions for a complicated shell script and then identify false preconditions like "the file must not have spaces in the file name". I still think it's a cool idea, but during the exploration phase I downloaded every shell script on GitHub (about 12M at the time), ran some general statistics on them, and determined that common commands (mv, rm, file, and so on) make up a tiny minority of shell commands, so doing this meant writing specs for hundreds of commands. I didn't want to do that, so I abandoned the project.

This phase can also fail when you make a plan, but it's not very good. Once you start building your idea (Phase 2) you discover that you can't or won't finish it. So moving on from this phase too early is bad, and it takes some skill and experience to know whether an idea is good enough. In grad school, I had an idea I called "deep synthesis" which generalized DPLL/T, with several domain-specific solvers hooked together and passing sketches / counterexamples back and forth. I got it kind of working for synthesizing CSS stylesheets, with separate solvers for CSS values and CSS selectors, but I never got it to scale acceptably. It took a few months to accept that it didn't work.

One can also get stuck in this phase, especially students who like reading and brainstorming. This phase is all about reading papers, learning stuff, and building toy prototypes. But if you keep at it too long, the project settles into a kind of groundlessness: there are all these ideas, all these things you could do, but nothing is actually being built. Time is wasted thinking about problems that may not occur in practice. It is necessary to move on. An idea doesn't need to be perfect before we move on to making it real.

Phase 2: development

This second phase is about trying to prove that the idea could work. This typically means building a prototype. I say this phase is about "something that does something": the prototype can be bad, it might only work in some cases, it might have bugs, but there have to exist inputs where it does something impressive.

This phase can fail when the plan from phase 1 is not very good, or just not the right plan. When you've got the wrong plan, you go back to the drawing board to make a new plan or you abandon the project. For example, in MemBalancer, Marisa and I spent way too long chasing this idea that we could assign each heap a "balance factor" and then try to adjust garbage collection frequency until each heap had the same balance factor. We constantly had either steadily growing memory usage or steadily shrinking memory usage. This was the wrong idea; the right idea, which we eventually figured out, is about splitting a fixed pool of memory across multiple heaps.

Sometimes students leave this phase too early due to deadline pressure. It can be the advisor's fault. For example, in OpTuner, I pushed for submitting to POPL before the project was ready. This ultimately meant time wasted doing the wrong evaluation and polishing the wrong paper. It took another six months before OpTuner was ready to move on to the next phase. A counterpoint was Herbie's story, where Zach stopped us from submitting a paper too early and we went back to improve the tool further.

One can get stuck here, especially students who like to hack. Often research software is interesting! I got stuck here for almost a year with Cassius, writing a web browser layout engine for fun. But researchers are here to develop and evaluate ideas, not to build products. If you stay in this phase too long, the project feels aimless: we're implementing feature after feature but it's not clear why, and the features seem chosen arbitrarily.

Phase 3: evaluation

The third phase is about proving the idea by evaluating it. Typically this is a set of experiments comparing the prototype to a baseline. At first, the prototype is usually worse, so this phase is about rapidly iterating between experiments and software. The end state should be certainty about the prototype and how well it works.

This phase can fail if the main leverage over your metric is something other than what the project is focused on. For example, the first two Cassius papers were about formalizing browser layout, but by the third Cassius paper I wanted to focus on rely/guarantee reasoning and yet I was constantly fixing bugs in my formalization (and if you read the paper, it mentions a couple of things we had to assume, which we spun as a strength of the approach). Performance metrics are often like this, especially if you're trying to compare to some industry tool like an SMT solver that has had years of performance tuning. Low-level performance tuning can net small integer multiple speedups, and that might matter more than whatever your research idea is doing.

This phase can also fail if you don't know how to evaluate the work. You need a metric that is mainly determined by whatever you did, not something incidental. For example, in Herbgrind, we were trying to produce better bug reports than alternative tools, but were evaluating it on overhead. No surprise: doing a more detailed analysis took more time! You have to change the metric and devise new experiments.

One can get stuck here trying to make the results perfect. Software is hard to evaluate, and no proof is perfectly convincing. This can feel very uncertain and insecure. I try to remind students that the goal isn't to prove that their software is good—it is to prove some idea embodied by the software. Say a paper says so and so kind of user would use the research prototype. That's good writing, but it's not essential that the prototype actually be a joy to use! In a decade of Herbie work, the most important improvements for users have been UI improvements. But that wasn't even mentioned in the paper!

Phase 4: writing

Now that you've got a prototype and have proven that it works, you write a paper. This means you have to tell a story about the problem that this research addresses. At the highest level, the key to writing a good paper is to have the pieces interlock. You need the problems described in the intro to feel important, but you also need the technical sections to address those problems in some direct way, and you need your evaluation to test against those problems. Some stories are easy to write: some task is slow, we make it faster, here's a cool algorithm, evaluation shows we are 3x faster. Other stories are harder to write: if you developed a DSL, who will use it, what benefits does it provide, how do we know if those benefits are achieved?

Obviously the easiest way to fail this phase is to be a bad writer. But a more subtle way is to write a fine paper, technically and grammatically and scientifically correct, but just not particularly exciting to any reviewers at any venue you can find. This happened with Rival—I'm still looking for a venue. This can be maddening to fix, because reviewers might just flatly misunderstand your work and give you useless feedback, or they might understand your work but not care about it, and give you reviews of basically "meh". To fix this in a resubmission, you need to take reviews seriously but not literally. You, the author, understand your paper and your work best.

This phase ends with a deadline, so one can't get stuck here, but that makes planning important. I budget a month for writing, or two months if it's the student's first paper, in which case it's overlapped with phase 3. This gives plenty of time for revisions, especially important since a lot of my work (numerics, browsers) is less mainstream and requires a clearer story. I say that if you try to write a paper in a week, you may find yourself writing it in a month, split over four revisions. In my experience, better papers get more helpful, less frustrating reviews.

That said, if you mess up the planning, you might come up with a story before you know if you can achieve those results. Then you'll have to rewrite everything in a hurry, or abandon the deadline. Or, even worse, you might realize you pulled results from different runs and actually your results are fake and unreproducible. That would be a disaster, and the way to avoid it is to avoid unrealistic deadlines.

After the project

Once the paper is accepted, you have to make a good presentation for the conference. Hopefully, the paper describes the work concisely and motivates it well, but in most cases I've found that presentations and elevator pitches need to be much clearer, crisper, and shorter than what the paper said, and besides that the paper was way too defensive in its writing.

It's also a good idea to brainstorm: who in the world would be interested in this work, what applications it might have, and what follow-on collaborations you would want to have. Reaching out to folks pre-conference will make the conference itself more useful for you. Plus, talking to people will refine your pitch into something clear and crisp that you can put into the presentation.

It's also worth asking whether you plan to abandon the prototype or maintain it. A prototype can become a platform for further work, or a tool that people in industry use, or a demo that you use in the classroom, or any of the above. That can scratch the software development itch for some students, while they move on to further research projects; it's what I did with Herbie. But all of this carries costs, especially for students who are worse at multitasking. Maintenance is a many-year process, and wasting six months only to give up is not worth it. I don't maintain Cassius now, for example, and feel like that's the right answer.

I also make sure to have a "debrief" with students after each project. What went well and what went poorly? What questions should we have asked ourselves earlier? Did we spend too long in some phase, or not long enough? Reflecting develops higher-level research skills and gives students an intuitive feeling for the kind of mistakes they can make in leading a research project. Eventually, when they advise their own students, they'll need to reflect on their own experiences to help their students avoid problems.

By Pavel Panchekha

09 March 2023