Observability Strategies

USE

USE stands for:

  • Utilization - Percent time the resource is busy, such as node CPU usage

  • Saturation - Amount of work a resource has to do, often queue length or node load

  • Errors - Count of error events

This method is best for hardware resources in infrastructure, such as CPU, memory, and network devices. For more information, refer to The USE Method.
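As a rough illustration, the three USE values can be derived from periodic samples of a resource. The sketch below is only one possible shape: the `ResourceSample` fields and the `use_summary` helper are hypothetical names chosen for this example, and average queue length is used as one common stand-in for saturation.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource over a sampling interval (hypothetical shape)."""
    busy_seconds: float      # time the resource spent doing work
    interval_seconds: float  # length of the sampling interval
    queue_length: int        # work waiting at the end of the interval
    error_count: int         # error events seen during the interval


def use_summary(samples):
    """Aggregate samples into utilization, saturation, and errors."""
    total_busy = sum(s.busy_seconds for s in samples)
    total_time = sum(s.interval_seconds for s in samples)
    return {
        # Utilization: percent of time the resource was busy.
        "utilization_pct": 100.0 * total_busy / total_time if total_time else 0.0,
        # Saturation: average queued work (an assumption; node load is another option).
        "saturation_avg_queue": (
            sum(s.queue_length for s in samples) / len(samples) if samples else 0.0
        ),
        # Errors: plain count of error events.
        "errors": sum(s.error_count for s in samples),
    }


samples = [
    ResourceSample(busy_seconds=6.0, interval_seconds=10.0, queue_length=2, error_count=0),
    ResourceSample(busy_seconds=9.0, interval_seconds=10.0, queue_length=4, error_count=1),
]
summary = use_summary(samples)
print(summary)
```

In practice these numbers usually come from an existing metrics agent rather than hand-built samples; the point here is only how each of the three values is defined.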

RED

RED stands for:

  • Rate - Requests per second

  • Errors - Number of requests that are failing

  • Duration - Amount of time these requests take, distribution of latency measurements

This method is most applicable to services, especially a microservices environment. For each of your services, instrument the code to expose these metrics for each component. RED dashboards are good for alerting and SLAs. A well-designed RED dashboard is a proxy for user experience.
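A minimal sketch of the RED values, computed from a batch of request records observed in a fixed window. The `Request` record, the `red_summary` helper, and the choice of the 50th and 95th percentiles (nearest-rank) are assumptions made for illustration; a real service would more likely expose these through its metrics library.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """A completed request (hypothetical record shape)."""
    duration_seconds: float
    failed: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def red_summary(requests, window_seconds):
    """Compute rate, errors, and duration percentiles for one window."""
    durations = [r.duration_seconds for r in requests]
    return {
        # Rate: requests per second over the window.
        "rate_per_second": len(requests) / window_seconds,
        # Errors: number of failing requests.
        "errors": sum(1 for r in requests if r.failed),
        # Duration: summarized as points on the latency distribution.
        "duration_p50": percentile(durations, 50) if durations else None,
        "duration_p95": percentile(durations, 95) if durations else None,
    }


requests = [
    Request(0.10, False),
    Request(0.20, False),
    Request(0.30, True),
    Request(1.50, False),
]
red = red_summary(requests, window_seconds=2.0)
print(red)
```

Reporting percentiles rather than a single average reflects the "distribution of latency measurements" wording: one slow request (1.50 s here) shows up in the 95th percentile even though the median stays low.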

The Four Golden Signals

According to the Google SRE handbook, if you can measure only four metrics of your user-facing system, focus on these four.

This method is similar to the RED method, but it includes saturation.

  • Latency - Time taken to serve a request

  • Traffic - How much demand is placed on your system

  • Errors - Rate of requests that are failing

  • Saturation - How “full” your system is
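The four signals can be sketched the same way. The example below assumes saturation is expressed as in-flight requests divided by a known capacity, and latency as a simple average; both are illustrative choices, and `golden_signals` and its parameters are hypothetical names, not part of any particular library.

```python
def golden_signals(durations_seconds, failed_count, window_seconds, in_flight, capacity):
    """Compute latency, traffic, errors, and saturation for one window."""
    total = len(durations_seconds)
    return {
        # Latency: average time taken to serve a request.
        "latency_avg_seconds": sum(durations_seconds) / total if total else 0.0,
        # Traffic: demand on the system, here requests per second.
        "traffic_per_second": total / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": failed_count / total if total else 0.0,
        # Saturation: how "full" the system is (assumed: in-flight / capacity).
        "saturation": in_flight / capacity,
    }


signals = golden_signals(
    durations_seconds=[0.5, 0.5, 1.0, 2.0],
    failed_count=1,
    window_seconds=2.0,
    in_flight=45,
    capacity=60,
)
print(signals)
```

The first three values overlap with RED; the saturation entry is the addition, and the right measure for it depends on which resource constrains the system.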