Always Be Blaming
A few tips on 4D-ing your code comprehension skills.
I wrote on the importance of reading code before: Look Out For Bugs. My default approach to reading is “predictive”: I don’t actually read the code line by line. Rather, I try to understand the problem that it wants to solve, then imagine my own solution, and read the “diff” between what I have in my mind and what I see in the editor. A non-empty “diff” signifies either a bug in my understanding, or an opportunity to improve the code.
This is 2D reading, understanding a snapshot of code, frozen in time. This is usually enough to spot “this feels odd” anomalies, worthy of further investigation.
Ideal code is memoryless — it precisely solves the problem at hand.
Most real code is Markov: the shape of the code at time T depends not only on the problem statement, but also on the shape of the code at time T - 1. The 3D step is to trace the evolution of code over time: Where Do We Come From? What Are We? Where Are We Going?
The step after that is to understand the why. What were we thinking back then, when we wrote this code? It’s useful to have the “theory of mind” concept ready here. I personally learned the term way too late in my life, so let me give a short intro for today’s lucky 10 000. Theory of mind is the ability to imagine yourself in someone else’s skin. Not just in their shoes (“I certainly would have acted differently in that situation”), but with their mind (“I wouldn’t have acted that way, but I get why they did”). This is something people learn. The experimental setup here is to have a child in a room with toys, with a doll sitting near the opposite end of the room, and to ask the child “what does the doll see?”. Younger children describe the room from their own perspective; older ones begin to intuit that the doll’s perspective is different.
So this is the goal of reading code — understanding what the original author was thinking, and why.
End of the mumbo-jumbo, now some practical advice. First, read Every line of code is always documented; it is very good.
Second, make sure it is effortless for you to find out how a given snippet of code evolved. This is harder than it seems! Just git blame isn’t an answer: mind the gap between the problem that’s easy to solve, and the problem in need of solving. git blame answers the spatial question of “how each line appeared in this file”, because there’s a relatively straightforward UI for this: annotate each line with a commit hash. But this is not the question you are asking most of the time! You don’t care about the file! There’s a small snippet of code in the middle, and you want a temporal history of that.
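For the temporal question, git itself has a line-range log, `git log -L <start>,<end>:<file>`, which follows a range of lines through history instead of annotating a whole file. Here is a minimal sketch of building that invocation; the helper names are made up for illustration and are not part of any existing tool:

```python
import subprocess


def line_history_command(path: str, start: int, end: int) -> list[str]:
    """Build a `git log -L` invocation that follows lines start..end of path."""
    if start < 1 or end < start:
        raise ValueError(f"bad line range: {start},{end}")
    return ["git", "log", f"-L{start},{end}:{path}"]


def show_line_history(path: str, start: int, end: int) -> str:
    """Run the command in the current repository and return its output."""
    cmd = line_history_command(path, start, end)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

This prints a sequence of commits with the diff restricted to the snippet, which is closer to "how did this snippet evolve" than a per-line annotation, though it still shows diffs rather than whole-repository snapshots.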
As much as I don’t like working in the browser, GitHub’s web interface for blaming is probably better than what you get locally by default. It starts with the y shortcut, which resolves a symbolic reference like
https://github.com/tigerbeetle/tigerbeetle/blob/main/src/vsr/replica.zig
into the one which has a commit hash in the URL:
https://github.com/tigerbeetle/tigerbeetle/blob/c54f613a2eb2a127a0ba212704e3fa988c42e5cb/src/vsr/replica.zig
This commit hash is critical, because it anchors the entire repository — if you open a different file from the web UI, it will be shown as of that commit. This enables you to not myopically focus on just the diff in question, but to absorb the entire context at that point in time.
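The rewrite that y performs can be sketched as a pure function over the URL. This is only an illustration of the URL shape shown above; `pin_blob_url` is a hypothetical helper, and it assumes the ref is a single path segment such as `main`:

```python
from urllib.parse import urlsplit, urlunsplit


def pin_blob_url(url: str, commit: str) -> str:
    """Replace the ref in a GitHub /blob/<ref>/<path> URL with a commit hash.

    Assumption: the ref is one path segment (e.g. `main`). Branch names
    that contain slashes would need extra information to split correctly.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected shape: ['', owner, repo, 'blob', ref, *path]
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    segments[4] = commit
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Locally, the hash for a branch can be obtained with `git rev-parse main`.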
So my usual web workflow is:
- ctrl+f to find the line I am interested in
- b to toggle blame
- Click “blame prior to change” a couple of times, repeating ctrl+f to go back to the snippet I am curious about.
- cmd-click on the commits that are potentially relevant, pinning their commit hashes in the URL in new tabs.
- Then, from the commit page, use the “Browse files” button and then t to go to other files. Or, cmd+l to focus the browser’s address bar, and s/commit/tree/ (or back!) as needed, to switch between diff and snapshot views.
Again, my goal here is not to annotate a diff on a file but rather to get a “virtual checkout” as of the interesting commit.
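The s/commit/tree/ trick from the workflow above is mechanical enough to script. A small sketch, assuming URLs of the form `https://github.com/<owner>/<repo>/commit/<sha>` and `.../tree/<sha>`; the function name is hypothetical:

```python
import re

_COMMIT_OR_TREE = re.compile(
    r"(https://github\.com/[^/]+/[^/]+)/(commit|tree)/([0-9a-f]{7,40})$"
)


def toggle_commit_tree(url: str) -> str:
    """Swap the diff view (/commit/<sha>) for the snapshot view (/tree/<sha>), or back."""
    match = _COMMIT_OR_TREE.match(url)
    if match is None:
        raise ValueError(f"not a GitHub commit or tree URL: {url}")
    base, kind, sha = match.groups()
    other = "tree" if kind == "commit" else "commit"
    return f"{base}/{other}/{sha}"
```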
This web approach is what I was using throughout most of my career, but I’ve finally found a way to replicate it locally. The idea is to make blaming “in-place”. Instead of git blame annotating lines of code, I directly switch to a historical commit. I have the following devil hydra of shortcuts:
, b l blames line. It notes the $line the cursor is at, runs git blame -L $line,$line to find the $commit that introduced the line, and then runs git switch --detach $commit to check it out. I have a dedicated worktree for code archeology, so I don’t worry about trashing my work. There’s also a half-hearted attempt to maintain “logical” cursor position, but it doesn’t work very well. Is there some git command that tells me directly “what’s the equivalent of $file:$line:$column in $sha-A for $sha-B?”
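The two git commands behind this shortcut can be sketched as follows. This is not the actual implementation, just an illustration under stated assumptions: it uses `git blame --porcelain`, whose first output line has the form `<sha> <original-line> <final-line> <count>` and whose `filename` header gives the path in the blamed commit. That original line number helps with cursor position for this one hop, but it does not answer the general $sha-A to $sha-B question. The names `BlameHit`, `parse_porcelain`, and `blame_line` are hypothetical.

```python
import subprocess
from typing import NamedTuple


class BlameHit(NamedTuple):
    commit: str     # hash of the commit the line is attributed to
    orig_line: int  # line number inside that commit's version of the file
    orig_path: str  # path of the file in that commit (may differ after renames)


def parse_porcelain(output: str, fallback_path: str) -> BlameHit:
    """Parse `git blame --porcelain` output for a single blamed line."""
    lines = output.splitlines()
    # First line: "<sha> <orig-line> <final-line> <group-size>"
    sha, orig_line, *_ = lines[0].split()
    path = fallback_path
    for line in lines[1:]:
        if line.startswith("\t"):  # the source line itself; headers are over
            break
        if line.startswith("filename "):
            path = line[len("filename "):]
    return BlameHit(sha, int(orig_line), path)


def blame_line(path: str, line: int) -> BlameHit:
    """Find the commit that introduced `line` of `path` and check it out detached."""
    out = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{line},{line}", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    hit = parse_porcelain(out, path)
    if set(hit.commit) == {"0"}:
        # git reports an all-zero hash for lines that are not committed yet
        raise RuntimeError("line is not committed yet")
    subprocess.run(["git", "switch", "--detach", hit.commit], check=True)
    return hit
```

After the switch, an editor integration could reopen `hit.orig_path` at `hit.orig_line`.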
, b p blames parent. This just switches to the parent commit of the current HEAD, which is what “blame before this change” does on GitHub (GitHub’s version works slightly differently, because it assumes that , b l was the previous command).
, b u undoes the last blaming operation, switching to the previous point. I really love that, on the web, I can cmd-click to create an alternative branch of exploration. In theory, this is replicable locally, but I prefer to destructively mutate a single working tree on disk. A big reason for preferring in-place blame is that LSP, ./zig/zig build test, rg and the like just work. That’s more important for me than the garden of forking paths, and undo is an acceptable work-around.
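Stepping to the parent and undoing fit naturally into a small stack of visited commits. A sketch, assuming `HEAD~1` (the first parent) is the parent of interest; the class and its method names are hypothetical, and the `git` callable is injectable so the logic can be exercised without a repository:

```python
import subprocess
from typing import Callable


def run_git(*args: str) -> str:
    """Run git in the current directory and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


class BlameSession:
    """Detached-HEAD navigation with a stack of visited commits for undo."""

    def __init__(self, git: Callable[..., str] = run_git) -> None:
        self._git = git
        self._undo: list[str] = []

    def goto(self, rev: str) -> None:
        """Remember the current commit, then switch to `rev` detached."""
        self._undo.append(self._git("rev-parse", "HEAD"))
        self._git("switch", "--detach", rev)

    def blame_parent(self) -> None:
        """Step to the first parent of the current HEAD."""
        self.goto("HEAD~1")

    def undo(self) -> bool:
        """Return to the commit we were on before the last jump, if any."""
        if not self._undo:
            return False
        self._git("switch", "--detach", self._undo.pop())
        return True
```

Because every jump mutates the one working tree, tools that read files from disk keep working; the stack is what stands in for browser tabs.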
Finally, , b w copies a GitHub link to the current commit and line, which I can paste into the browser. An enormous problem with the modern version control landscape is that absolutely critical information in the form of code review comments is not a part of the git repository, and is locked in someone else’s proprietary database. I failed to solve this problem in one weekend, and had to begrudgingly adapt. Opening the commit in a browser links you to the PR and its discussion as well.
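One plausible shape for such a link is a blob permalink with a `#L<line>` anchor, built from the repository’s remote URL. This is an assumption about the link format, not a description of the actual shortcut, and `github_line_url` is a hypothetical helper:

```python
import re
from urllib.parse import quote

_REMOTE = re.compile(
    r"(?:https://github\.com/|git@github\.com:)([^/]+)/(.+?)(?:\.git)?/?$"
)


def github_line_url(remote: str, commit: str, path: str, line: int) -> str:
    """Build a permalink to `path` at `commit`, anchored at `line`.

    `remote` is the output of `git remote get-url origin`, in either the
    https or the ssh form.
    """
    match = _REMOTE.match(remote)
    if match is None:
        raise ValueError(f"not a GitHub remote: {remote}")
    owner, repo = match.groups()
    return f"https://github.com/{owner}/{repo}/blob/{commit}/{quote(path)}#L{line}"
```

The inputs are available locally: `git rev-parse HEAD` for the commit and `git remote get-url origin` for the remote.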
Implementing this blame workflow required a bit of custom code. Feel free to use it, but beware that it’s somewhat crufty, especially around maintaining current cursor position. Making a production-ready version of this sounds like a fun project ;-)