BSOD
In Windows, the blue screen of death (BSOD) is what users see when the operating system hits an error so severe that it stops entirely rather than attempting to continue operating.
This is different from a single process failing, like Windows Explorer. If that dies, you lose the ability to interact with windows, but you can restart the process using keyboard shortcuts.
The behavior
My Windows 11 desktop had always been a little flaky, hanging every few weeks. It was annoying, but not enough to actually go through the effort of diagnosing it or re-installing everything. But after a recent Patch Tuesday, the behavior changed. Rather than hanging, it started displaying a BSOD, rebooting, and then running just fine.
I also have three different accounts on my desktop:
- My standard, low-permission account that I use for day-to-day activities: development, games, etc.
- An Administrator account that is linked to my Microsoft account
- An Administrator account that is not linked to anything
I have these accounts because they have saved me multiple times in the past. If something in the auto-start of one account is causing issues, you can use a different account to research what is going on, rather than being stuck in a reboot loop.
The analysis
Changing from a hang to a BSOD was a vast improvement. One great thing about a BSOD is that Windows, if it can, will dump the OS memory (and possibly the application memory too) to a file that you can examine.
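If you are not sure whether Windows actually wrote a dump, a small script can check the usual places. This is a minimal sketch, assuming the default locations (`C:\Windows\MEMORY.DMP` for the full or kernel dump and `C:\Windows\Minidump` for small dumps). Both are configurable under "Startup and Recovery", so your machine may differ, and the `find_dumps` helper name is just something I made up for illustration.

```python
from datetime import datetime
from pathlib import Path

# Assumed default dump locations; the real paths are configurable in
# Windows under "Startup and Recovery", so adjust these as needed.
DEFAULT_DUMP_LOCATIONS = [
    Path(r"C:\Windows\MEMORY.DMP"),
    Path(r"C:\Windows\Minidump"),
]


def find_dumps(locations=DEFAULT_DUMP_LOCATIONS):
    """Return (path, size_bytes, modified) tuples for .dmp files, newest first."""
    found = []
    for loc in locations:
        try:
            if loc.is_file():
                candidates = [loc]
            elif loc.is_dir():
                candidates = [p for p in loc.iterdir()
                              if p.suffix.lower() == ".dmp"]
            else:
                candidates = []
            for path in candidates:
                info = path.stat()
                modified = datetime.fromtimestamp(info.st_mtime)
                found.append((path, info.st_size, modified))
        except OSError:
            # Unreadable location (e.g. missing permissions): skip it.
            continue
    return sorted(found, key=lambda entry: entry[2], reverse=True)


if __name__ == "__main__":
    for path, size, modified in find_dumps():
        print(f"{modified:%Y-%m-%d %H:%M}  {size / 2**20:8.1f} MiB  {path}")
```

Reading these folders may require an elevated prompt. A location the script cannot read is silently skipped, so an empty result does not necessarily mean no dump exists.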
WinDbg to the rescue
WinDbg lets you look at the memory dump. When I loaded up the dump, here is what I saw:
What stood out to me is that the kernel was freaking out while accessing and trying to lock a USB drive. Now, I had some older USB drives attached to my PC (all of them much older than the 3-year-old Windows machine), so I had a few initial thoughts (none of them mutually exclusive):
- Since I hadn't updated the firmware on any of them, it's possible that they were hitting a bug and giving Windows a response it wasn't ready for.
- It's possible that they were just failing.
- It's possible that a Windows update had made the OS more picky about what USB behavior it would accept.
Regardless of the reason, the first step was to disconnect the USB drives.
Success!
Bingo: after removing the drives, the system became stable again (it's been running for over a week without issue).
Lessons learned:
- If your OS is giving you error messages, it's probably not going to get better if you ignore them.
- Microsoft has some really good free tools that allow you to delve into the operation of your PC.
- Computer components do wear out and you shouldn't discount the need to prepare for failures.
- I'm glad I have backups :-)