At the end of this essay to CS grad students, Kristopher Micinski makes an interesting observation:
there are plenty of domains where Claude Code completely fails right now–paren matching in Racket is only one example.
I had noticed this exact phenomenon many times: it would struggle with balancing parentheses and get stuck for quite a while. It baffled me. After all, how hard is balancing parentheses?
When I posed this question to Gemini, this is the response I got:
The Issue: Most training data (GitHub, etc.) contains code with shallow nesting (3–5 levels).
The Result: If a Racket function requires 8 or 10 levels of nesting, the model enters "out-of-distribution" territory.
Also, these models don't use a stack and don't have a reliable way to count.
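For contrast, here is a minimal sketch of how a conventional program handles this with an explicit stack. It is only an illustration, not how any editor or tool mentioned here actually works. The function name `check_parens` is made up for this example, and it assumes the input has no strings, comments, or character literals such as Racket's `#\(`, which a real reader would have to skip.

```python
from typing import Tuple


def check_parens(source: str) -> Tuple[bool, int]:
    """Return (balanced, max_depth) for the brackets found in source.

    Simplifying assumption: every bracket character counts, so brackets
    inside strings, comments, or character literals are not skipped.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []      # currently open brackets, innermost last
    max_depth = 0   # deepest nesting seen so far

    for ch in source:
        if ch in openers:
            stack.append(ch)
            max_depth = max(max_depth, len(stack))
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False, max_depth

    # Anything still on the stack was never closed.
    return not stack, max_depth


if __name__ == "__main__":
    print(check_parens("(define (f x) (if (> x 0) x (- x)))"))  # (True, 3)
    print(check_parens("(let ([x 1]) (+ x 1)"))                 # (False, 3)
```

The stack gives the program exact memory of every open bracket at any depth, so 10 levels of nesting are no harder than 3. That is the kind of bookkeeping the response above says the models lack.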
Hopefully, going forward, projects like Calva Backseat Driver will solve this problem.