Incident Management in an Agile Setup
Agile development and the adoption of DevOps have posed new challenges for ITSM teams. With its focus on continuous deployment, agile development can strain incident management teams, who rely on quality development, documentation, and release planning to do their job well.
Here are four areas of traditional ITSM that can be adapted to make DevOps successful:
1. Service Objectives
Because of the role they play, incident management teams focus heavily on service-oriented goals and metrics. In the chase for faster, more frequent deployments, DevOps teams can lose sight of these objectives. Ultimately, a service is only good if it is functional and meets its objectives. Agile teams must set the right objectives, then set up monitoring and alerting to keep the focus on service delivery.
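One way to make a service objective concrete is to express it as a success-rate target and alert when too much of the allowed failure margin has been used. The sketch below is only an illustration: the `ServiceObjective` class, the function names, and the 25% alert threshold are assumptions, not something prescribed by any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ServiceObjective:
    """Hypothetical objective: the share of requests that must succeed."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests should succeed


def error_budget_remaining(slo, total_requests, failed_requests):
    """Return the fraction of the failure allowance still unused.

    1.0 means no failures yet; 0 or below means the objective is missed.
    """
    allowed_failures = (1 - slo.target) * total_requests
    if allowed_failures == 0:
        # No failures are allowed (or no traffic was seen).
        return 0.0 if failed_requests else 1.0
    return 1 - failed_requests / allowed_failures


def should_alert(slo, total_requests, failed_requests, threshold=0.25):
    """Alert once less than `threshold` of the allowance remains (assumed policy)."""
    return error_budget_remaining(slo, total_requests, failed_requests) < threshold
```

With a 99% target and 1,000 requests, roughly 10 failures are allowed, so 9 failures would trigger an alert under this policy while 5 would not.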
2. Change Management
Change management seems antithetical to the agile philosophy. Submitting detailed change requests and waiting for reviews and approval slows things down. But without proper change management, incident responders are unaware of undocumented changes, the service desk and other stakeholders are out of the loop, and preventable issues slip through. With a few steps, change management can be made agile too:
- Pre-approve low-risk changes and empower SREs to automate them or run them in response to an incident.
- Automate the documentation of changes - integrate your tools so that ticketing and service desk software is updated automatically when changes occur.
- Modernize the Change Advisory Board by replacing the meeting with a chat channel. Again, integrate your tools so that Slack or Teams is updated with change details automatically.
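The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `PRE_APPROVED` change types, the `ChangeRequest` fields, and the message format are all hypothetical, and each notifier is a plain callable standing in for a real ticketing or chat integration.

```python
from dataclasses import dataclass

# Hypothetical list of low-risk change types agreed on in advance.
PRE_APPROVED = {"restart_service", "scale_out", "rotate_logs"}


@dataclass
class ChangeRequest:
    kind: str      # type of change, e.g. "scale_out"
    service: str   # service being changed
    author: str    # who made the change


def review(change):
    """Pre-approved kinds skip manual review; everything else is queued."""
    return "auto-approved" if change.kind in PRE_APPROVED else "needs-review"


def record_change(change, notifiers):
    """Document the change by sending one message to every integration."""
    status = review(change)
    message = f"[{status}] {change.author} -> {change.kind} on {change.service}"
    for notify in notifiers:
        # Each notifier stands in for a ticketing, service desk, or chat hook.
        notify(message)
    return status
```

In practice each notifier would wrap a call to the service desk or chat tool, so that documenting a change costs the engineer nothing extra.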
3. Incident Management Workflows
Incident response processes need to be well documented. Otherwise, teams tend to get bogged down in firefighting, with responders jumping from one alert to the next trying to keep up. Being on call is strenuous enough, and it gets even more stressful when every event is raised as an alert with no oversight on priority. Events are often alerts from monitoring tools that make you aware of something important. Whether or not an event constitutes an incident depends on whether a service is being disrupted or degraded in performance. On-call engineers should have a playbook for how to respond to different types of incidents, ensuring consistent, repeatable responses no matter who is on call.
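The distinction between an event and an incident can be captured as a simple triage rule. The sketch below assumes each event carries an `impact` label of "none", "degraded", or "disrupted"; the priority names and playbook steps are hypothetical placeholders, not a recommended runbook.

```python
from dataclasses import dataclass

# Hypothetical playbooks, keyed by the kind of service impact.
PLAYBOOKS = {
    "disrupted": [
        "Page the on-call engineer",
        "Update the status page",
        "Roll back the latest release if implicated",
    ],
    "degraded": [
        "Open a ticket",
        "Check recent changes",
        "Watch the service dashboards",
    ],
}


@dataclass
class Event:
    source: str   # monitoring tool that raised the event
    service: str  # affected service
    impact: str   # "none", "degraded", or "disrupted" (assumed labels)


def triage(event):
    """An event is an incident only if the service is disrupted or degraded."""
    if event.impact == "disrupted":
        return ("incident", "P1")
    if event.impact == "degraded":
        return ("incident", "P2")
    return ("log-only", None)


def respond(event):
    """Attach the matching playbook so every responder follows the same steps."""
    kind, priority = triage(event)
    steps = PLAYBOOKS.get(event.impact, []) if kind == "incident" else []
    return {"kind": kind, "priority": priority, "steps": steps}
```

Events that do not affect the service are logged rather than paged, which keeps the on-call queue limited to real incidents.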
4. Stakeholder Communication
Getting the right stakeholders involved and keeping them in the loop through planning, deploying, and operating services is key to maintaining a great service, and integrating your DevOps tools is how you keep them involved. For every new release, your deployment tools should automatically update service desk tools so that your first-line responders know about all changes. During incidents, your alerting and incident management tools should automatically update status pages, ticketing systems, and other stakeholder communications.
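The integration pattern described here amounts to routing each kind of update to the tools that need it. The following is a minimal in-memory sketch: the `UpdateRouter` class and the event names "release" and "incident" are assumptions, and the handlers are plain callables standing in for real service desk, status page, or ticketing integrations.

```python
from collections import defaultdict


class UpdateRouter:
    """Hypothetical router that fans updates out to subscribed tools."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a callable to receive updates of the given type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, summary):
        """Send the summary to every subscriber and return how many got it."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(summary)
        return len(handlers)


# Usage: lists stand in for the service desk and the status page.
service_desk, status_page = [], []
router = UpdateRouter()
router.subscribe("release", service_desk.append)
router.subscribe("incident", service_desk.append)
router.subscribe("incident", status_page.append)
router.publish("release", "checkout v2 deployed")
router.publish("incident", "checkout is degraded")
```

After these calls the service desk has seen both updates, while the status page has seen only the incident.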
Alka Gupta
Lover of all things organic - digitally and otherwise! Founding team @Zenduty. Taekwondo Black Belt, Potter, and a coffee connoisseur.