Trace your microservices with ZIO

A simple guide on plugging in tracing to your services

Tracing your distributed system is important for bringing in some clarity to an otherwise tumultuous environment. The goal of this blog is to promote the tracing of your services and to show how simple it is to get it working with Scala and ZIO. I wish tracing was more popular — I’ve had my fair share of difficulties debugging, and answering “what went wrong across this set of services for this customer and how can we mitigate it in the future?”

In this guide, I’ll provide a brief explanation of some tracing topics on the following example:

Services diagram

We’ll have two services. When Foo service is queried it will issue an HTTP request to Bar service, which will do a bunch of queries in ElasticSearch.

Example of a distributed trace

Contents

You can find the source code of the example using OpenTelemetry here. The same example based on OpenTracing is here.

Why tracing?

Tracing gives you insight into the flow of requests in your distributed system. Traditionally, engineers were searching through logs and building this context up themselves. It’s not unusual for some logging patterns to emerge — whether it’s fully structured logging (e.g. some JSON payload), or semi-structured such as that it contains various searchable patterns such [tenantId=$tenantId]. This may become problematic when a new error occurs on production and you need to quickly parse it and figure out precisely what happened on which service. Difficulty rises with the number of services, databases, and so. The goal of tracing is to help with these pain points.

If you feel like tracing would be useful, but your services are under heavy load and you think it’s unnecessary for each inbound request, you can reduce the number of requests you trace by sampling. For example, you might want to trace only with some probability (and likely on failure).

Even if you trace 10% of all traces, you still get a full distributed trace with every one of the spans —complete information. In our example it would mean looking at the trace of we would see all spans of a request across Foo -> Bar -> ElasticSearch. We would never see incomplete information, such as Foo -> [Missing span due to sampler] -> ElasticSerach.

This can be very useful in practice — you can observe performance characteristics at the fraction of the cost of tracing every request.

Setting up the project

We need to run ElasticSearch (Bar will make a request to it) and Jaeger (so you can see your traces). We can start them with docker:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.15.0docker run --rm -it \
-p 16686:16686 \
-p 14250:14250 \
jaegertracing/all-in-one:1.16

Create a new project via template fromsbt new scala/scala3.g8. You need to have SBT installed.

Add dependencies to your build.sbt file.

val zioHttpV = "1.0.0.0-RC17"
val openTracingV = "0.33.0"
val opentelemetryV = "1.6.0"
val zioTelemetryV = "0.8.2"
val log4jV = "2.14.1"
val sttpV = "3.3.14"
val elasticV = "7.15.0"
val grpcNettyShadedV = "1.40.1
libraryDependencies ++= Seq(
"org.apache.logging.log4j" % "log4j-slf4j-impl" % log4jV,
"io.d11" %% "zhttp" % zioHttpV, // 1)
"dev.zio" %% "zio-opentelemetry" % zioTelemetryV, // 2)

"io.opentelemetry" % "opentelemetry-exporter-jaeger" % opentelemetryV,
"io.opentelemetry" % "opentelemetry-sdk" % opentelemetryV,
"io.grpc" % "grpc-netty-shaded" % grpcNettyShadedV,

"com.softwaremill.sttp.client3" %% "core" % sttpV, // 3)
"com.softwaremill.sttp.client3" %% "async-http-client-backend-zio" % sttpV,
"org.elasticsearch.client" % "elasticsearch-rest-high-level-client" % elasticV // 4)

(1) Both services Foo and Bar will be built with a simple ZIO-HTTP webserver.
(2) Telemetry implementation will be backed by Jaeger, which implements the OpenTracing API. To get OpenTracing API to our ZIO application, we’ll use the library from the ZIO ecosystem called ZIO-Telemetry. It exposes three modules based on three standards — OpenTracing, OpenCensus, and OpenTelemetry.
(3) To make an HTTP request from Foo to Bar service, we’ll use STTP.
(4) ElasticSearch Java client will be used to send a request from Bar to our ElasticSearch instance.

OpenTelemetry is the newest kid in the town. It is a merge of OpenTracing (which focuses solely on tracing) and OpenCensus (tracing and metrics). We won’t be needing metrics in this post, we’ll still use it as it is the latest promising standard.

If you are on the OpenTracing stack and are interested in following this blog, you can find the OpenTracing version of the code here.

Jaeger Tracer

We’ll start by creating a layer that exports our traces via GRPC to Jaeger. Open tracing.scala file.

import zio.*
import io.opentelemetry.exporter.jaeger.JaegerGrpcSpanExporter
import io.opentelemetry.api.trace.*
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.*
import io.opentelemetry.sdk.trace.`export`.SimpleSpanProcessor
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes

def serviceTracer(serviceName: String): TaskLayer[Has[Tracer]] =
val serviceNameResource =
Resource.create(// 1)
Attributes.of(ResourceAttributes.SERVICE_NAME, serviceName)
)
(for {
spanExporter <- Task(
JaegerGrpcSpanExporter
.builder()
.setEndpoint("http://127.0.0.1:14250") // 2)
.build()
)
spanProcessor <- UIO(SimpleSpanProcessor.create(spanExporter))
tracerProvider <- UIO(
SdkTracerProvider
.builder()
.addSpanProcessor(spanProcessor)
.setResource(serviceNameResource)
.build()
)
openTelemetry <- UIO(
OpenTelemetrySdk
.builder()
.setTracerProvider(tracerProvider)
.build()
)
tracer <- UIO(
openTelemetry.getTracer("zio.telemetry.opentelemetry")
)
} yield tracer).toLayer

(1) Setup the attribute of service name — this makes it clear in our tracing where is the actual span coming (e. g. Foo or Bar service).
(2) We point it to the instance that is provisioned by the docker we ran earlier. Many of these could come from config of some sort — but are omitted for brevity.

In production code I recommend actually managing resources via ZManaged, so everything gets properly acquired and released at end of the lifecycle. See the example.

The remainder of the code is just plumbing of Java libraries. I encourage you to check the examples on OpenTelemetry Java SDK. As the last step, we convert the tracer to a Layer so we can inject it into our services.

For more options on the telemetry configuration, I recommend checking their documentation.

Foo Service

Foo Service will receive requests from you and send requests to Bar. It will set up the root of the tracing graph, and inject the context into the dispatching HTTP request.

Start by creating FooServer.scala.

object FooServer:
// 1)
val propagator: TextMapPropagator =
W3CTraceContextPropagator.getInstance()
val setter: TextMapSetter[mutable.Map[String, String]] =
(carrier, key, value) => carrier.update(key, value)
val errorMapper: PartialFunction[Throwable, StatusCode] = {
case _ => StatusCode.UNSET
}
val api = Http.collectM[ZioRequest] {
// 2)
case req @ Method.GET -> Root / "foo" =>
val response = for
_ <- zio.console.putStrLn("foo received message ")
resp <- sendRequestToBar
yield ZioResponse.text("sent")

val span = s"${req.method.toString()} ${req.url.asString}"

Tracing
.root(span, SpanKind.SERVER, errorMapper)(response) // 3)
}

(1) Now we prepare utilities to propagate our tracing context across service boundaries — so that the parent of the span created on Bar service knows that it originates in Foo. We choose W3C propagator based on HTTP headers. Those headers will be easy to append into our HTTP Client, and we can then read them and re-construct the context on the Bar service.
(2) When a request hits /foo endpoint, we print a message to console, and send a request to a Bar service — a function we’ll implement soon.
(3) Finally we plug in the actual trace information — generated span name from the method name and URL, in this case, “GET /foo” error mapper and span kind.

Let’s implement the sendRequestToBar method.

// 1)
type RequestBackend = SttpBackend[
Task,
capabilities.zio.ZioStreams & capabilities.WebSockets
]
private def sendRequestToBar =
for
client <- ZIO.service[RequestBackend] // 2)
// 3)
carrier <- UIO(mutable.Map[String, String]().empty)
_ <- Tracing.inject(propagator, carrier, setter)
// 4)
request = basicRequest
.headers(carrier.toMap)
.post(uri"http://localhost:9000/bar")

_ <- client.send(request)
yield "ok"

(1) We start by declaring a type of our STTP backend — one that will send requests to Bar. We will provide this backend via ZLayer in our main application soon.
(2) Get the HTTP backend instance from the environment.
(3) Read the actual tracing context information, and inject it into our carrier. The mutable map will contain all the headers we want to provide in our HTTP request to Bar.
(4) Do a POST request to /bar (which will run on port 9000). Notice that we add the headers from the carrier.

Now it’s time to build our app’s main— wire it all together:

object FooServerApp extends App:
val tracer = serviceTracer("foo") // 1)

val httpClient = AsyncHttpClientZioBackend.managed().toLayer // 2)

override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] =
val api = FooServer.api
val zioTracer =
(tracer ++ ZLayer.requires[Clock])
>>> zio.telemetry.opentelemetry.Tracing.live // 3)
// 4)
Server
.start(8000, api)
.provideCustomLayer(zioTracer ++ httpClient)
.exitCode

(1) We instantiate the tracer via the function that we defined in the beginning, with the name of the service.
(2) We create a ZIO-friendly HTTP client using the STTP library
(3) Our ZIO OpenTelemetry tracer depends on the telemetry tracer that we had created in 1) and a Clock instance. We feed the dependencies using symbolic operator >>>.
(4) Start the server on port 8000 and provide both dependencies — our tracer and our HTTP client.

We have a Foo sever finished. When contacted on the right endpoint, it will make a request to -yet not existing- Bar service. It will trace it and export it via OpenTracing to Jaeger.

Before we implement Bar service, let’s implement a tiny ElasticSearch client we’ll use from the Bar service.

ElasticSearch Client

Let’s start by defining a trait in EsThinClient.scala. Our client will support only a single operation — to query a cluster for its health. Since we are using Java ElasticSearch client which makes HTTP requests — I/O operations, we will want to shift the computation to Blocking dispatcher (without looking at its source code, it’s only a guess that it actually blocks). We also want to trace the request, so we will need both Blocking and Tracing present in our ZIO environment.

This is purely for demonstration purposes — should we want to trace every operation of the ElasticSearch Client, there are much better ways than going through API one-by-one and manually specifying tracing — e.g. hooking up the implementation of org.elasticsearch.transport.client.PreBuiltTransportClient, where we’d track each operation and description.

trait EsThinClient:
def clusterHealth
: ZIO[Blocking & Tracing, Throwable, ClusterHealthResponse]

Let’s set up an instance

object EsThinClient:
type EsThinClientService = Has[EsThinClient]

def liveLocalhost: ZLayer[Any, Throwable, EsThinClientService] =
// 1)
val makeClient = UIO(
new RestHighLevelClient(
RestClient.builder(new HttpHost("localhost", 9200, "http"))
)
)
ZManaged
.fromAutoCloseable(makeClient) // 2)
.map { restClient =>
new EsThinClient:
override def clusterHealth: ZIO[
Blocking & Tracing,
Throwable,
ClusterHealthResponse
] =
effectBlockingIO( // 3)
restClient // 4)
.cluster()
.health(
new ClusterHealthRequest(),
RequestOptions.DEFAULT
)
)
.span("health request") // 5)
}
.toLayer

(1) We define a computation that creates an instance of Java ElasticSearch client that connects to our localhost.
(2) The client implements a Closable interface, and ZIO has a convenient method that we use to ensures it gets closed properly on layer shutdown.
(3) As mentioned in beginning, we shift to the blocking dispatcher, to ensure we don’t block our main ZIO scheduler with some IO operation (e.g. if ElasticSearch is overloaded and takes a long time to answer).
(4) Using the official Java client we create and send the request.
(5) We mark that this computation to fetch cluster health is a span called “health request”. That’s the span we should see in our Jaeger dashboard.

This client can now be used in Bar Service to probe for ElasticSearch cluster health.

Bar Service

The BarService.scala will be similar to the previous one, except instead of sending an HTTP request to another service it will send a request to ElasticSearch (which is well, an HTTP request :)). For that, we’ll use the small dependency that we had declared before.

object BarServer:
// 1)
val propagator: TextMapPropagator =
W3CTraceContextPropagator.getInstance()
val getter: TextMapGetter[List[Header]] =
new TextMapGetter[List[Header]]:
def keys(carrier: List[Header]): lang.Iterable[String] =
carrier.map(_.name.toString).asJava

def get(carrier: List[Header], key: String): String =
carrier
.find(_.name.toString == key)
.map(_.value.toString)
.orNull

val api = Http.collectM[ZioRequest] {
// 2)
case req @ Method.POST -> Root / "bar" =>
val headers = req.headers
.map(x => x.name.toString -> x.value.toString)
.toMap
.asJava
// 3)
val response = for
esClient <- ZIO.service[EsThinClient]
_ <- zio.console.putStrLn("bar received message ")
// 4)

_ <- ZIO
.foreachPar_(1 to 3)(_ => esClient.clusterHealth)
.span("make es requests")
yield ZioResponse.text("bar response")

val span = s"${req.method.toString()} ${req.url.asString}"

response // 5)
.spanFrom(
propagator,
req.headers,
getter,
span,
SpanKind.SERVER
)
}

(1) Similiar to Foo Service — we need a propagator, so we know how to read the tracing context from the headers.
(2) Service listens on /bar endpoint and accesses request’s headers.
(3) We build the response by accessing the ElasticSearch client we from the environment.
(4) Start three requests in parallel to the ElasticSearch using our client under the “make es requests” span.
(5) Create a span from context, based on headers.

That’s it for Bar! We just need to wire it in a runnable App, in a very similar fashion as we did with Foo.

object BarServerApp extends App:
val tracer = serviceTracer("bar")
val esClient = EsThinClient.liveLocalhost

override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] =
val zioTracer = (tracer ++ ZLayer
.requires[Clock]) >>> zio.telemetry.opentelemetry.Tracing.live

Server
.start(9000, BarServer.api)
.provideCustomLayer(zioTracer ++ esClient)
.exitCode

We create tracer and ElasticSearch client, inject them, and start Bar on port 9000.

Running the services

Start the services with and make a few requests:

sbt "runMain server.FooServer"
sbt "runMain server.BarServer"
curl localhost:8000/foo

Open your http://localhost:16686.

You can see the GET request we issued to foo-service/foo, a POST request from foo-service->bar-service, and a span containing all three cluster health requests running in parallel.

Closing remarks

We were able to build two servers with basic tracing in few tens of lines of code. We can improve on it (e.g. track every ElasticSearch request, every STTP request to a service, trace every endpoint that was hit automatically, tag everything with additional information), but that is left up to the reader :-)

ZIO-Telemetry is an incredibly useful library if you need to trace and gather metrics in your services. It is released for both Scala2 and Scala3. As always, be sure to check their documentation and examples.