Designing an AI-Native Technical Screen
When I joined Pocus, we ran a typical technical screen: two coding interviews back to back. As LLMs became central to how code gets written and understood, it became clear that our screening process was not filtering for the right candidates. Over and over, I'd watch strong candidates clear both rounds and still have no read on whether they could survive in code they didn't write. Everything around us was new and moving fast, so we went back to first principles.
Technical screens have a constraint the rest of the loop doesn't: almost everyone goes through them, so they have to be cheap to run and high signal per minute. That also caps how open-ended they can be. The more a screen depends on interpretation, the less consistent it is across candidates and interviewers, and consistency is most of what a screen is for.
First Principles
When AI writes most of the code, it's worth asking from scratch what changes for the person doing the engineering. A few things kept standing out.
Engineers live in unfamiliar code now
The model makes it cheap to step outside your lane. A backend engineer ships a frontend change, a project needs a one-line fix in a service another team owns, something in a language you've never written needs a small adjustment and you do it anyway. The codebase you're fluent in is a shrinking fraction of the code you touch.
Reading matters more than writing
The model generates faster than any human types, and most of what it produces is plausible. If you can't read code fluently enough to keep pace with what's being generated, you can't keep up with where the work is going. You're just approving things.
Communication stopped being optional
As an industry, we spent a long time treating communication as the soft skill you could skip if someone could really code. That trade-off doesn't exist anymore. Telling a model what you want in plain language is not a different skill from telling a human.
None of these is what a blank editor measures.
The Interview
Here’s the shape it settled into.
Candidates knew the terms up front: they'd work in a TypeScript codebase, in their own dev environment, and the code would be unfamiliar. The interview lasted one hour. At the start we shared a repository, a moderately sized server that was more than a toy, and walked them through how it behaved when nothing was wrong, plus the scripts to compile and run it. Then they went bug-hunting.
There were three bugs, increasing in difficulty. The first required almost no reading, a warm-up to get them into the repo. The second needed a small, localized change. The third needed real intuition about how the system fit together. None of them asked for a complex feature. They could use a model freely: ask it about the codebase, about TypeScript, about the error in front of them. The one thing they couldn't do was ask it to find the bug.
The moment you let someone hand the diagnosis to the model, you stop learning whether they can read and reason; you just watch the model work. Each constraint had a reason: an unfamiliar repo because that's the actual job now, their own environment because that's the real condition the job runs in, and no asking for the bug because the reading is the point. Run that in an hour, and the onsite is freed up to be the expensive part: actually building something with a model.
Candidate Patterns
We rolled this out in late 2024, and I ran more than a hundred of them myself. A handful of patterns showed up across almost every candidate.
The Skeptic
The interesting thing about the skeptic was never their opinion of AI. It was that their approach wouldn't bend when the problem outgrew it. They'd commit to reading every line by hand, which is admirable until the third bug, when the strategy that got them through the first two simply stops scaling and they keep running it anyway. Very few of them reached the end.
The Executor
The executor used the model the way a harness uses a model: asking it for a fix, pasting the fix in, running it, and asking again when it didn't work. The trouble is that a harness usually has a way to verify the result, and this one didn't. They became the model's hands without being its judgment, churning through suggestions they couldn't evaluate, getting further from the bug with every confident wrong turn.
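To make the harness comparison concrete, here is a minimal sketch of the loop I have in mind, not anything we ran in the interview. Every name in it (`runHarness`, `propose`, `verify`, `Verdict`) is hypothetical. The point is only that a proposed fix is accepted when an independent check passes, and that each failure is fed back into the next attempt.

```typescript
// Hypothetical sketch: all names here are illustrative, not from a real library.
type Verdict = { ok: boolean; detail: string };

// Propose a candidate, verify it independently, and only accept on a pass.
// The failure detail from the last attempt is handed back to the proposer.
function runHarness<T>(
  propose: (attempt: number, lastFailure: string | null) => T,
  verify: (candidate: T) => Verdict,
  maxAttempts: number,
): { accepted: T | null; attempts: number; failures: string[] } {
  const failures: string[] = [];
  let lastFailure: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = propose(attempt, lastFailure);
    const verdict = verify(candidate);
    if (verdict.ok) {
      return { accepted: candidate, attempts: attempt, failures };
    }
    failures.push(verdict.detail);
    lastFailure = verdict.detail;
  }

  // Nothing passed verification, so nothing is accepted.
  return { accepted: null, attempts: maxAttempts, failures };
}

// Toy usage: the "model" first proposes a wrong square function, then a right one.
const result = runHarness<(n: number) => number>(
  (attempt) => (attempt === 1 ? (n) => n + n : (n) => n * n),
  (square) => {
    const got = square(3);
    return got === 9
      ? { ok: true, detail: "" }
      : { ok: false, detail: `square(3) returned ${got}` };
  },
  3,
);

console.log(result.attempts, result.failures);
```

The executor was running this loop with the `verify` step missing: a person who can't evaluate the suggestion has no check to apply, so every attempt looks as good as the last.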
The Deflector
Some candidates spent the interview litigating the interview itself. The question was unrealistic, the bugs were contrived, real work doesn't look like this, and here are the reasons, delivered at length, in the time that could have been spent finding the bug. Attacking the question instead of the problem is itself the signal.
The Leetcoder
A surprisingly large number of people had drilled the traditional loop so thoroughly that anything slightly out of band knocked them over. They were strong on a clean algorithmic prompt and lost the moment the problem had the texture of real software.
The Collaborator
The people who did well looked different from everyone else and a lot like each other. They got nerd-sniped: the bug became a thing they needed to solve for its own sake, which is one of the most reliable engineer traits I know. They reasoned from first principles. They moved through strange code without needing permission to be there. They used the model as a collaborator, neither deferring to it nor trying to control it. The people who did well here tended to do well at the onsite and after they joined, a correlation I felt across the reps rather than measured, but a strong one.
The Final Stage
There was a final stage almost nobody reached, and that was by design. If a candidate cleared all three bugs, the last task was to file a support ticket for a bug in a service whose code they couldn’t see at all. No repo, no reading their way to the answer, just reasoning about a black box from its behavior, and communicating clearly enough that someone on the other side could act on it. It was the third signal, communication, isolated from everything else and made to stand on its own.
Very few candidates got there, so I have less to say about it than I’d like. I haven’t run this interview in over four months, and the models have moved meaningfully in that window. If I were building it today, I think I’d put far more of the weight right here, on the part I barely got to test.
Today
This interview isn’t perfect. It rewards engineers who are fast, exploratory, and comfortable in chaos, and underrates the careful, deliberate ones who are unremarkable in a sixty-minute sprint but excellent over a quarter. A timed bug hunt is a particular lens, not a universal one, which makes it a better positive signal than a negative one: someone who did well almost certainly had what we were looking for, while someone who didn’t might just be slow to warm to a stranger’s code. I almost certainly passed on people I shouldn’t have, and the reason I could live with that is that this was one of two screens, not the whole bar. A screen is allowed false negatives when it isn’t the only gate.
The patterns that matter have moved: living in unfamiliar code, reading faster than you write, working with a model instead of around it, communicating clearly under pressure. Most screens were built before any of that was true and still measure as if it weren't. We caught the shift early enough to build an interview that screens for it. The interview will keep changing. Noticing what to screen for is the part that lasts.
