Time to restore service is a DORA metric that shows how long it takes to restore service after an incident that impacts users. Use this metric to monitor the stability of your service and platform.
Which reports use Time to restore service?
Time to restore service information is available in Home, Retrospective, and Team health insights.
Note: Time to restore service requires an incident configuration. If you don’t configure incidents, you won’t see Time to restore service data.
Home
Customize your selected metrics to include Time to restore service in Home. Use this report as a high-level summary of your team’s average Time to restore service. Learn more about Home.
Retrospective report
Details about Time to restore service are available in the Retrospective report.
Managers can use this report for a detailed analysis of Time to restore service and answer the following questions:
- What incidents had to be restored?
- Which incidents took the longest to restore?
Team health insights
Team health insights provides a team-level view of Time to restore service. Use it to see how your teams and nested teams are doing across a certain time period.
What does Time to restore service measure?
Time to restore service measures the time between an incident occurring and the incident being resolved.
To help minimize restore times:
- Use rollback procedures.
- Integrate feature flags and test new features by toggling them on for different groups of users at a time.
How is Time to restore service calculated?
Time to restore service = Total time spent on incidents / Number of incidents
Flow uses incident creation time and incident resolution time to calculate an average Time to restore service for the date range selected on Home. Incidents are defined by a user’s custom configurations. Learn more about incident configurations.
Incidents are attributed to a team if anyone on the team was ever assigned to the ticket.
Flow uses ticket data from your integration to track when an incident occurs. Flow uses the ingested ticket timestamps to determine when a ticket was opened and when it was closed.
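The formula above can be sketched in a few lines of Python. This is an illustration only, not Flow's implementation: the function name `average_time_to_restore` and the sample timestamps are hypothetical, and each incident is assumed to be a pair of creation and resolution times.

```python
from datetime import datetime, timedelta

def average_time_to_restore(incidents):
    """Return the mean of (resolved - created) as a timedelta.

    `incidents` is a list of (created, resolved) datetime pairs.
    Returns None when there are no incidents to average.
    """
    durations = [resolved - created for created, resolved in incidents]
    if not durations:
        return None
    # Total time spent on incidents / number of incidents
    return sum(durations, timedelta()) / len(durations)

# Hypothetical sample data: one 4-hour and one 2-hour incident.
incidents = [
    (datetime(2022, 1, 5, 9, 0), datetime(2022, 1, 5, 13, 0)),
    (datetime(2022, 1, 6, 10, 0), datetime(2022, 1, 6, 12, 0)),
]

print(average_time_to_restore(incidents))  # 3:00:00
```

With these two incidents, the total time is 6 hours across 2 incidents, so the average is 3 hours.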
What settings can affect Time to restore service?
The following settings can affect how Flow calculates Time to restore service:
- Incident configurations
- Whether a project is imported
- Date range
How do date ranges affect Time to restore service?
The date range affects the average Time to restore service calculation. Use Flow’s date range filter to view the incidents within the selected date range. An incident is included in the calculation for a date range only if the incident's closed date falls within that range.
Below is an example of when an incident is captured in Time to restore service.
Incident start date | Incident end date | Date range filter | Does it show as an incident | Which day |
---|---|---|---|---|
1/5/2022 | 1/8/2022 | 1/2 - 1/5 | No | |
1/5/2022 | 1/8/2022 | 1/6 - 1/7 | No | |
1/5/2022 | 1/8/2022 | 1/2 - 1/8 | Yes | 1/8 |
1/5/2022 | 1/8/2022 | 1/7 - 1/15 | Yes | 1/8 |
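The table's rule can be checked with a short Python sketch. The helper `shows_in_range` is a hypothetical name, the ranges are assumed to fall in 2022, and the range end is treated as inclusive because the 1/2 - 1/8 filter captures an incident closed on 1/8.

```python
from datetime import date

def shows_in_range(closed_on, range_start, range_end):
    """An incident counts when its closed date falls inside the range (inclusive)."""
    return range_start <= closed_on <= range_end

# The example incident opens on 1/5/2022 and closes on 1/8/2022.
closed_on = date(2022, 1, 8)

# Date range filters from the table (year 2022 is an assumption).
filters = [
    (date(2022, 1, 2), date(2022, 1, 5)),
    (date(2022, 1, 6), date(2022, 1, 7)),
    (date(2022, 1, 2), date(2022, 1, 8)),
    (date(2022, 1, 7), date(2022, 1, 15)),
]

results = [shows_in_range(closed_on, start, end) for start, end in filters]
print(results)  # [False, False, True, True]
```

Only the incident's closed date is consulted, so the first filter excludes the incident even though it started on 1/5.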
Note: Change failure rate uses the created date of an incident to select whether to show an incident. Because of this, the number of incidents for Change failure rate and Time to restore service for a given time period may not match.
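To see why the two counts can differ, here is a small Python sketch that counts the same hypothetical incidents once by created date and once by closed date. The `count_by` helper and the sample data are illustrative, and the created-date check is assumed to be inclusive in the same way as the closed-date check.

```python
from datetime import date

# Hypothetical incidents as (created, closed) date pairs.
incidents = [
    (date(2022, 1, 5), date(2022, 1, 8)),  # closes after the range ends
    (date(2022, 1, 6), date(2022, 1, 6)),  # created and closed inside the range
]

def count_by(incidents, field, start, end):
    """Count incidents whose chosen date ("created" or "closed") is in range."""
    index = {"created": 0, "closed": 1}[field]
    return sum(1 for incident in incidents if start <= incident[index] <= end)

start, end = date(2022, 1, 2), date(2022, 1, 7)

created_count = count_by(incidents, "created", start, end)  # Change failure rate style
closed_count = count_by(incidents, "closed", start, end)    # Time to restore service style

print(created_count, closed_count)  # 2 1
```

Both incidents were created inside the range, but only one closed inside it, so the two metrics report different incident counts for the same filter.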
Time to restore service benchmarks
Flow uses four benchmarks to categorize the status of Time to restore service. An elite benchmark indicates a healthy Time to restore service, while a low benchmark indicates potential platform instability. The benchmarks displayed for this DORA metric are partially based on the 2021 State of DevOps Report, and customized for Flow. Flow compares the value of the metric against these benchmarks.
Not all processes and organizations allow for meeting the elite rating. Use the metric as a signal of where your team can focus its efforts to remove friction. The goal is continuous improvement rather than meeting a benchmark.