Observability Strategies
USE
USE stands for:
Utilization - Percent time the resource is busy, such as node CPU usage Saturation - Amount of work a resource has to do, often queue length or node load Errors - Count of error events This method is best for hardware resources in infrastructure, such as CPU, memory, and network devices. For more information, refer to The USE Method.
RED
RED stands for:
Rate - Requests per second Errors - Number of requests that are failing Duration - Amount of time these requests take, distribution of latency measurements This method is most applicable to services, especially a microservices environment. For each of your services, instrument the code to expose these metrics for each component. RED dashboards are good for alerting and SLAs. A well-designed RED dashboard is a proxy for user experience.
For more information, refer to Tom Wilkie’s blog post The RED method: How to instrument your services.
The Four Golden Signals
According to the Google SRE handbook, if you can only measure four metrics of your user-facing system, focus on these four.
This method is similar to the RED method, but it includes saturation.
-
Latency - Time taken to serve a request
-
Traffic - How much demand is placed on your system
-
Errors - Rate of requests that are failing
-
Saturation - How “full” your system is