I once “biased for action” and, without telling anyone, removed some “unused” NS records to “fix” a flaky DNS resolution issue on a Friday afternoon before going out to dinner with family.
Turns out my fix did not work and those DNS records were actually important. Checked on the website halfway into the meal and freaked the fuck out once I realized the site had gone from resolving 90% of the time to not resolving at all. The worst part was that when I finally got up the guts to report my mistake in the group channel, DNS was somehow still resolving for both our internal monitoring and everyone else who tried manually. My issue got shooed away, and I was left there not even sure what to do next.
I spent the rest of the dinner on my phone, refreshing the website and resolving domain names in an online dig tool over and over again, anxiety growing, knowing I couldn’t do anything to fix my “fix” while I was out.
Once I got home I reverted everything I had done, which seemed to bring it back to the original flaky state. Learned the value of SOPs and taking things slow after that (and also to not screw with DNS).
If this story has a happy ending, it’s that we did eventually fix the flaky DNS issue, going through a more rigorous review this time. On the other hand, how and why I, a junior at the time, became the de facto owner of an entire product’s DNS infra remains a big mystery to me.
Hopefully you learned a rule I try to live by despite not listing it: “no significant changes on Friday, no changes at all on Friday afternoon”.
"Man who deployed Friday, works Saturday."