Problem reports
Production incidents
In the spirit of continuous improvement, write a post-mortem for every issue that occurs in Production, including root cause details.
At a minimum, the post-mortem needs to include:
- A summary of the issue
- The dates and times of the key events related to the issue (first notification of the issue, root cause identified, issue mitigated, fix delivered to prod)
- Business impact of the issue
- Engineering Root Cause
- Identification / Issue signature (what does the issue look like in logs, behavior, etc.?)
- Resolution
- Next Steps
- Learnings and Mitigation (how have we changed our processes to prevent the issue in the future?)
- Link to IMOP incident
- Link to Jira issue for defect
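As a rough illustration, the checklist above could be mirrored in a small structure that flags sections still missing from a draft and derives durations from the timeline. This is only a sketch: the `PostMortem` class, its field names, and both helper methods are hypothetical and are not part of any existing tooling.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PostMortem:
    """Hypothetical record mirroring the minimum sections listed above."""
    summary: str = ""
    # Timeline of key events
    first_notified: Optional[datetime] = None
    root_cause_identified: Optional[datetime] = None
    mitigated: Optional[datetime] = None
    fix_delivered: Optional[datetime] = None
    # Narrative sections
    business_impact: str = ""
    engineering_root_cause: str = ""
    issue_signature: str = ""
    resolution: str = ""
    next_steps: str = ""
    learnings_and_mitigation: str = ""
    # Links (field names are assumptions)
    imop_link: str = ""
    jira_link: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty or unset."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing.append(f.name)
        return missing

    def time_to_mitigate(self) -> Optional[timedelta]:
        """Time from first notification to mitigation, if both are known."""
        if self.first_notified is None or self.mitigated is None:
            return None
        return self.mitigated - self.first_notified
```

A draft could then be checked before it is published, for example by refusing to mark it complete while `missing_sections()` returns a non-empty list.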
Reading material
https://microservices.io/post/microservices/2022/01/04/writing-better-problem-reports.html