Calculate Burn Rate Alert

PromQL

package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// promDuration formats a Go duration in compact PromQL range syntax such as
// "5m" or "30d". Go's Duration.String() would give "5m0s" or "720h0m0s".
// Precision below one millisecond is dropped.
func promDuration(d time.Duration) string {
	units := []struct {
		suffix string
		size   time.Duration
	}{
		{"d", 24 * time.Hour},
		{"h", time.Hour},
		{"m", time.Minute},
		{"s", time.Second},
	}
	for _, u := range units {
		if d%u.size == 0 {
			return strconv.FormatInt(int64(d/u.size), 10) + u.suffix
		}
	}
	return strconv.FormatInt(d.Milliseconds(), 10) + "ms"
}

// calculateBurnRate builds a PromQL burn-rate expression for a counter of
// 'bad' events (errors, latency violations). Burn rate is the ratio of the
// actual rate of bad events to the allowed rate. Here the allowed rate is
// approximated by the metric's own average rate over budgetDuration, so a
// value above 1 means bad events are arriving faster now than on average.
//
// A stricter SLO-based variant compares the error ratio with the error
// budget, e.g. for a 99.9% SLO:
//
//	(rate(http_errors_total[w]) / rate(http_requests_total[w])) / 0.001
func calculateBurnRate(metricName string, budgetDuration, alertWindow time.Duration) (string, error) {
	if metricName == "" {
		return "", errors.New("metric name cannot be empty")
	}
	if budgetDuration < time.Millisecond {
		return "", errors.New("budget duration must be at least 1ms")
	}
	if alertWindow < time.Millisecond {
		return "", errors.New("alert window must be at least 1ms")
	}

	// rate() converts the counter into a per-second rate of bad events over
	// the short alert window: the current burn.
	currentRate := fmt.Sprintf("rate(%s[%s])", metricName, promDuration(alertWindow))

	// rate() over the whole budget window is already the average per-second
	// rate for that period. avg_over_time() takes a range vector, so it
	// cannot wrap rate(...) directly without a subquery.
	averageRate := fmt.Sprintf("rate(%s[%s])", metricName, promDuration(budgetDuration))

	return currentRate + " / " + averageRate, nil
}

// calculateBurnRateWithFixedBudget compares the current per-second rate of a
// counter with a fixed allowed rate, also expressed in events per second.
func calculateBurnRateWithFixedBudget(metricName string, allowedRate float64, alertWindow time.Duration) (string, error) {
	if metricName == "" {
		return "", errors.New("metric name cannot be empty")
	}
	if allowedRate <= 0 {
		return "", errors.New("allowed rate must be positive")
	}
	if alertWindow < time.Millisecond {
		return "", errors.New("alert window must be at least 1ms")
	}

	currentRate := fmt.Sprintf("rate(%s[%s])", metricName, promDuration(alertWindow))

	// %g prints 0.001 as "0.001" rather than "0.001000".
	return fmt.Sprintf("%s / %g", currentRate, allowedRate), nil
}

func main() {
	metric := "http_errors_total"
	budgetPeriod := 30 * 24 * time.Hour // 30 days
	alertPeriod := 5 * time.Minute

	burnRateAlertQuery, err := calculateBurnRate(metric, budgetPeriod, alertPeriod)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("Burn rate alert query: %s > 1\n", burnRateAlertQuery)

	// Example with a fixed allowed rate: 0.001 bad events per second (about
	// 86 per day). rate() returns events per second, so this is an absolute
	// rate, not a 0.1% error ratio.
	fixedAllowedRate := 0.001
	fixedBurnRateQuery, err := calculateBurnRateWithFixedBudget(metric, fixedAllowedRate, alertPeriod)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("Fixed budget burn rate query: %s > 1\n", fixedBurnRateQuery)
}
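The burn-rate functions above compare a counter's rate with either its own long-term average or a fixed per-second rate. When a counter of all requests is also available, a common SLO-based alternative divides the observed error ratio by the error budget, that is 1 minus the SLO target (0.001 for a 99.9% SLO). The sketch below builds such a query; the function name `errorRatioBurnRate`, the `sum()` aggregation across series, and the metric `http_requests_total` are assumptions for illustration, not part of the exercise.

```go
package main

import (
	"errors"
	"fmt"
)

// errorRatioBurnRate (hypothetical helper) builds:
//   (bad events / all events over window) / (1 - sloTarget)
// errorsMetric and totalMetric are assumed to be counters such as
// http_errors_total and http_requests_total; window is a PromQL
// duration string such as "5m" or "1h".
func errorRatioBurnRate(errorsMetric, totalMetric, window string, sloTarget float64) (string, error) {
	if errorsMetric == "" || totalMetric == "" || window == "" {
		return "", errors.New("metric names and window must be non-empty")
	}
	if sloTarget <= 0 || sloTarget >= 1 {
		return "", errors.New("SLO target must be strictly between 0 and 1")
	}
	// The error budget is the fraction of events allowed to be bad.
	budget := 1 - sloTarget
	// %.6g hides float noise: 1 - 0.999 is not exactly 0.001 in binary.
	return fmt.Sprintf("(sum(rate(%s[%s])) / sum(rate(%s[%s]))) / %.6g",
		errorsMetric, window, totalMetric, window, budget), nil
}

func main() {
	q, err := errorRatioBurnRate("http_errors_total", "http_requests_total", "1h", 0.999)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(q + " > 1")
}
```

With a 99.9% target, a value of 1 means errors are arriving exactly fast enough to use up the budget by the end of the SLO period.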

Algorithm description:

This scenario focuses on calculating the 'burn rate' of a Service Level Objective (SLO) error budget. The burn rate measures how quickly 'bad' events (violations) occur relative to the rate the SLO allows. A burn rate greater than 1 means the system is consuming its budget of acceptable errors or downtime faster than planned, which triggers an alert.

Algorithm explanation:

Burn rate is a key metric for SLO management. It is the ratio of the current rate of violations to the rate the SLO allows. For instance, if an SLO allows 100 errors over 30 days, the allowed rate is `100 / (30 days)`, roughly 3.86e-5 errors per second. If the error rate over a short window (e.g., 5 minutes) is higher than this allowed rate, the burn rate is greater than 1. In PromQL, `rate()` over the short alert window gives the current violation rate. The allowed rate is either a fixed number (as in `calculateBurnRateWithFixedBudget`) or approximated by the metric's own long-term average, which `rate()` over the full budget window already computes. `avg_over_time()` expects a range vector, so wrapping `rate(...)` in it only works as a subquery, e.g. `avg_over_time(rate(http_errors_total[5m])[30d:5m])`. Query cost grows with the number of matched series N and the number of samples S per series inside the range, roughly O(N × S) time, and memory grows with the samples loaded; subqueries over long windows can be considerably more expensive because the inner expression is evaluated at every step. Edge cases include zero or negative durations and metrics that are not counters.

Pseudocode:

function calculateBurnRate(metricName, budgetDuration, alertWindow):
  if metricName or durations are invalid:
    return error message
  
  currentRate = "rate(" + metricName + "[" + alertWindow + "])"
  averageRate = "rate(" + metricName + "[" + budgetDuration + "])" // average rate over the budget window, or a derived allowed rate
  
  burnRate = currentRate + " / " + averageRate
  return burnRate

function calculateBurnRateWithFixedBudget(metricName, allowedRate, alertWindow):
  if metricName is empty, allowedRate <= 0, or alertWindow is invalid:
    return error message
  currentRate = "rate(" + metricName + "[" + alertWindow + "])"
  burnRate = currentRate + " / " + allowedRate
  return burnRate
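Both pseudocode functions rely on `rate()`, which only makes sense for counters: it measures the per-second increase and treats any drop in value as a counter reset. The sketch below is a simplified model of that behaviour over raw samples; Prometheus additionally extrapolates to the window boundaries, so its results differ slightly. The `sample` type and `simpleRate` function are hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	t float64
	v float64
}

// simpleRate returns the per-second increase across the samples, treating
// any decrease as a counter reset (the counter restarted from zero).
// Unlike Prometheus's rate(), it does not extrapolate to the window edges.
func simpleRate(samples []sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// Reset: the whole current value accrued since the restart.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0, false
	}
	return increase / elapsed, true
}

func main() {
	// A counter that resets to 0 between t=120 and t=180.
	s := []sample{{0, 10}, {60, 16}, {120, 22}, {180, 2}, {240, 8}}
	r, _ := simpleRate(s)
	fmt.Printf("%.3f per second\n", r) // 20 increases over 240 s
}
```

Feeding a gauge (a value that legitimately goes up and down) into this logic would misread every decrease as a reset, which is why non-counter metrics are listed as an edge case.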