When building systems that require monitoring and alerting, a false alarm should be treated as a bug. If it’s not resolved quickly, it diminishes the usefulness of the alerts and trains responders to ignore it. When alerts are ignored, the potential for an actual incident increases along with the severity due to time to detection.
Thinking about false alarms as bugs is a handy idiom for defining the problem and the solution in a way that is easy teams to understand.
From my collegue, Eric Berglund.
See also:
- Analgous to “there is no software maintenance”, there are no false alarms
- Closing the gap between previous technical decisions and the current quality bar
- Talk then code