Why Linux Troubleshooting Advice Sucks
A short post on how to create better troubleshooting documentation, prompted by me spending last evening trying to get the built-in display of my laptop working with Linux.
What finally fixed the blank screen for me was this advice from the NixOS wiki:
While this particular approach worked, in contrast to a dozen different ones I tried before, I think it shares a very common flaw, which is endemic to troubleshooting documentation. Can you spot it?
The advice tells you the remedy (“add this kernel parameter”), but it doesn’t explain how to verify that this is indeed the problem. That is, if the potential problem is a kernel driver that isn’t loaded, it would really help me to know how to check which kernel driver is in use, so that I can do both:
- Before adding the parameter, check that 46a6 doesn’t have a driver.
- After the fix, verify that i915 is indeed used.
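Here is a minimal sketch of what such a diagnostic could look like. It assumes that 46a6 is the PCI device ID of the GPU, and it relies on the sysfs layout where each entry under /sys/bus/pci/devices has a `device` file with the ID and a `driver` symlink that exists only while a driver is bound. The function name `bound_drivers` is made up for this post.

```python
import os
from pathlib import Path


def bound_drivers(device_id, sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> bound driver name (or None) for a PCI device ID.

    `device_id` is a hex string such as "46a6" or "0x46a6".
    `sysfs_root` is a parameter only so the function can be tried
    against a fake directory tree.
    """
    wanted = int(device_id, 16)
    result = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        id_file = dev / "device"
        if not id_file.is_file():
            continue
        if int(id_file.read_text().strip(), 16) != wanted:
            continue
        driver_link = dev / "driver"
        if driver_link.is_symlink():
            # The symlink target ends with the driver's name, e.g. ".../drivers/i915".
            result[dev.name] = os.path.basename(os.readlink(driver_link))
        else:
            # No "driver" symlink means no driver is bound to this device.
            result[dev.name] = None
    return result


if __name__ == "__main__":
    for address, driver in bound_drivers("46a6").items():
        print(f"{address}: {driver or 'no driver bound'}")
```

Run it once before touching the kernel parameters and once after rebooting with the fix. If the answer doesn't change from "no driver bound" to i915, the fix either wasn't applied or wasn't the right one.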
If a “fix” doesn’t come with a linked “diagnostic”, a very common outcome is:
- Apply some random fix from the Internet
- Observe that the final problem (blank screen) isn’t fixed
- Wonder which of the two is the case:
  - the fix is not relevant for the problem,
  - the fix is relevant, but is applied wrong.
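For a fix that takes the form of a kernel parameter, the "applied wrong" case can at least be ruled out by looking at what the running kernel was actually booted with, which Linux exposes in /proc/cmdline. A rough sketch follows; `kernel_params` is a helper invented here, and `example.param` is a placeholder name, not the parameter from the wiki.

```python
import shlex
from pathlib import Path


def kernel_params(cmdline):
    """Parse a kernel command line into {name: value}.

    Flags without "=" (such as "quiet") map to None. Only the first "="
    splits name from value, so "root=UUID=abc" keeps "UUID=abc" as value.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    params = kernel_params(Path("/proc/cmdline").read_text())
    name = "example.param"  # placeholder: substitute the parameter you added
    if name in params:
        print(f"{name} is set to {params[name]!r}")
    else:
        print(f"{name} is missing from the running kernel's command line")
```

If the parameter is missing here, the problem lies in how the fix was applied (bootloader config, a forgotten rebuild or reboot), not in the fix itself.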
So, call to action: if you are writing any kind of documentation, before explaining how to fix the problem, teach the user how to diagnose it.
When helping with git, start with explaining git log and git status, not with git reset or git reflog.
While the post might come across as just a tiny bit angry, I want to explicitly mention that I am eternally grateful to all the people who write any kind of docs for using Linux on desktop. I’ve been running it for more than 10 years at this point, and I am still completely clueless as to how to debug issues from first principles. If not for all of the wikis, stackoverflows and random forum posts out there, I wouldn’t be able to use the OS, so thank you all!