Distributed Tracing and Observability
TODO
How are distributed logging and tracing supposed to work for asynchronous systems, e.g. applications that use messaging or streaming products with patterns such as pub-sub?
https://stackify.com/what-is-observability-everything-a-beginner-needs-to-know/
- How to set up notifications and observability alerts?
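On the asynchronous question above: one common approach is to carry the trace context inside the message itself, so that the consumer can continue the producer's trace even though there is no direct call between them. The sketch below is a minimal illustration, not the OpenTelemetry API. It assumes a hypothetical `Message` type with string headers and uses the W3C Trace Context `traceparent` header layout (`version-traceid-parentid-flags`). The names `SpanContext`, `inject`, `extract`, `child_of`, and `new_root_context` are made up for this example.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Optional

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass
class SpanContext:
    trace_id: str  # 32 lowercase hex characters, shared by the whole trace
    span_id: str   # 16 lowercase hex characters, unique per span


@dataclass
class Message:
    """Hypothetical broker message: a payload plus string headers."""
    payload: str
    headers: dict = field(default_factory=dict)


def new_root_context() -> SpanContext:
    # Start a brand-new trace with a random trace id and span id.
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    # Stay in the same trace, but give the new span its own id.
    return SpanContext(parent.trace_id, secrets.token_hex(8))


def inject(ctx: SpanContext, msg: Message) -> None:
    # Producer side: write the current span context into the headers.
    msg.headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(msg: Message) -> Optional[SpanContext]:
    # Consumer side: recover the producer's context, or None if the
    # header is missing or malformed.
    match = _TRACEPARENT.match(msg.headers.get("traceparent", ""))
    if match is None:
        return None
    return SpanContext(match.group(1), match.group(2))


# Producer publishes a message from inside its own span.
producer_ctx = new_root_context()
msg = Message(payload="order-created")
inject(producer_ctx, msg)

# Later, possibly in another process, a subscriber picks the message up.
parent_ctx = extract(msg)
consumer_ctx = child_of(parent_ctx) if parent_ctx else new_root_context()
```

Because the context travels with the message, it does not matter how long the message waits in the queue or how many subscribers receive it: each consumer span ends up with the same trace id as the producer.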
OpenTelemetry
Traces: https://opentelemetry.io/docs/concepts/signals/traces/
As requests flow through distributed systems, it’s important to keep track of how they travel, as this is useful for monitoring and troubleshooting.
Tracing allows you to track the journey of a request as it moves through different services in a distributed environment. It provides a way to understand the flow of operations across these services, making it easier to pinpoint performance issues or errors.
Using tracing, you can break an operation down into smaller parts by identifying what happened, where, when, and how, along with any other relevant information. This structured approach makes debugging more effective and efficient.
Tracing is a fundamental aspect of observability. A trace is a collection of spans, providing a high-level view of how a specific request or transaction moves through various services within a distributed environment. Imagine a trace as a comprehensive map that outlines the path a request takes through the system.
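To make the "trace is a collection of spans" idea concrete, here is a small sketch in plain Python. It is not how OpenTelemetry stores traces internally. It assumes a hypothetical `SpanRecord` with a trace id, a span id, and an optional parent id, and shows how spans that share a trace id can be assembled into the "map" of a single request. The span names and ids are invented sample data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    trace_id: str             # identifies the request the span belongs to
    span_id: str              # identifies this unit of work
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str


def render_trace(spans, trace_id):
    """Return indented lines showing the span tree of one trace."""
    # Group the spans of the requested trace by their parent span id.
    children = defaultdict(list)
    for s in spans:
        if s.trace_id == trace_id:
            children[s.parent_id].append(s)

    lines = []

    def walk(parent_id, depth):
        # Depth-first walk: print a span, then everything it caused.
        for s in children.get(parent_id, []):
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


# Invented sample data: two traces mixed together, as a backend would see them.
spans = [
    SpanRecord("t1", "a", None, "GET /checkout"),
    SpanRecord("t1", "b", "a", "inventory.reserve"),
    SpanRecord("t1", "c", "a", "payment.charge"),
    SpanRecord("t1", "d", "c", "db.insert"),
    SpanRecord("t2", "e", None, "GET /health"),
]
trace_lines = render_trace(spans, "t1")
print("\n".join(trace_lines))
```

Filtering by trace id is what separates one request from all the others, and the parent ids are what turn a flat list of spans into a call tree.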
Spans: https://signoz.io/blog/opentelemetry-spans/
Useful for understanding performance issues within a single service, e.g. which functions are taking too long to complete?
An OpenTelemetry span represents a single unit of work within a system. It encapsulates information about a specific operation, including its start time, duration, associated attributes, and any events or errors that occur during its execution.
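The fields listed above can be pictured with a small stand-in written in plain Python. This is a sketch for intuition only, not the OpenTelemetry SDK: the `Span` class and the `start_span` helper are hypothetical, and the `"UNSET"` / `"ERROR"` strings merely imitate OpenTelemetry's span status codes.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    start_ns: int
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    status: str = "UNSET"

    def add_event(self, name: str) -> None:
        # An event is a timestamped note attached to the span.
        self.events.append((time.monotonic_ns(), name))

    @property
    def duration_ns(self) -> Optional[int]:
        # Duration is only known once the span has ended.
        return None if self.end_ns is None else self.end_ns - self.start_ns


@contextmanager
def start_span(name, **attributes):
    span = Span(name, time.monotonic_ns(), attributes=dict(attributes))
    try:
        yield span
    except Exception as exc:
        # Record the failure on the span, then let the error propagate.
        span.status = "ERROR"
        span.add_event(f"exception: {exc!r}")
        raise
    finally:
        # The end time is recorded whether or not the work succeeded.
        span.end_ns = time.monotonic_ns()


# A successful unit of work with one attribute and one event.
with start_span("load_user", user_id=42) as span:
    span.add_event("cache miss")
    time.sleep(0.01)

# A failing unit of work: the span still ends and is marked as an error.
try:
    with start_span("parse_config") as failed:
        raise ValueError("bad config")
except ValueError:
    pass
```

Wrapping each function of interest in a span like this is what lets you answer "which functions are taking too long?": compare `duration_ns` across spans in the same trace.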
Spring Cloud Sleuth
TODO
https://www.baeldung.com/spring-cloud-sleuth-single-application
Zipkin
TODO