Lots of things to consider:
- Automate monitoring so that it does not become a bottleneck when deploying microservices.
- The underlying data: go beyond health checks, which are more common with monoliths. Define what it means for your application to be healthy, then work out how to take this time-series data and apply it to the additional dimensions that microservices bring.
- Horizontal checks: aggregate metrics across all instances of a service.
- Vertical checks: what is going on at each layer of the stack (Java, the JVM, Docker, the Kubernetes node, and the hardware itself).
- Ensure that both new and legacy services provide tracing.
- Tie traces to error logs.
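The ideas of defining "healthy" from time-series data, aggregating horizontally across instances, and checking vertically through the stack can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the talk. The `Sample` record, the 1% error-rate threshold, and the layer names are all hypothetical. The point is that one instance can look fine while the service as a whole is unhealthy, and that a service is only as healthy as the layers beneath it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical metric sample emitted by one instance of one service."""
    service: str
    instance: str
    timestamp: float  # seconds since epoch
    requests: int
    errors: int


def aggregate_by_service(samples, window_start, window_end):
    """Horizontal aggregation: sum requests/errors across all instances of each service."""
    totals = defaultdict(lambda: [0, 0])
    for s in samples:
        if window_start <= s.timestamp < window_end:
            totals[s.service][0] += s.requests
            totals[s.service][1] += s.errors
    return {svc: (req, err) for svc, (req, err) in totals.items()}


def is_healthy(requests, errors, max_error_rate=0.01):
    """Assumed health definition: error rate at or below a threshold (1% here)."""
    if requests == 0:
        return True  # no traffic in the window; treating this as healthy is a policy choice
    return errors / requests <= max_error_rate


def vertical_health(layer_status):
    """Vertical check: healthy only if every layer of the stack reports healthy."""
    return all(layer_status.values())


samples = [
    Sample("orders", "orders-1", 100.0, 500, 2),    # 0.4% errors on its own
    Sample("orders", "orders-2", 101.0, 500, 20),   # 4% errors on its own
    Sample("payments", "payments-1", 102.0, 200, 0),
]
agg = aggregate_by_service(samples, 90.0, 120.0)
print(agg)                            # {'orders': (1000, 22), 'payments': (200, 0)}
print(is_healthy(*agg["orders"]))     # False: 2.2% across the service
print(is_healthy(*agg["payments"]))   # True
print(vertical_health({"jvm": True, "container": True, "node": False, "hardware": True}))  # False
```

Real monitoring systems run this kind of aggregation over stored time series rather than in-process lists. The threshold and window would be whatever your own definition of "healthy" calls for.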
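Tying traces to error logs usually comes down to stamping every log record with the trace ID of the request being handled, so an error can be looked up by trace (and vice versa). The sketch below does this with Python's standard `logging` and `contextvars` modules. It is an assumption-laden illustration: the trace ID is a random hex string from `uuid4`, and `CollectingHandler` is a hypothetical in-memory stand-in for a log backend. In a real system the ID would come from your tracing library or from the incoming request's trace headers (for example the W3C Trace Context `traceparent` header).

```python
import contextvars
import logging
import uuid

# Holds the trace ID for the request currently being processed.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace ID onto every record that reaches the handler."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class CollectingHandler(logging.Handler):
    """Hypothetical log backend: keeps ERROR records indexed by trace ID."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.errors_by_trace = {}

    def emit(self, record):
        self.errors_by_trace.setdefault(record.trace_id, []).append(record.getMessage())


def handle_request(logger, fail):
    """Simulates one request; returns the trace ID it ran under."""
    token = current_trace_id.set(uuid.uuid4().hex)
    try:
        logger.info("processing request")
        if fail:
            logger.error("payment declined")
        return current_trace_id.get()
    finally:
        current_trace_id.reset(token)


logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = CollectingHandler()
handler.addFilter(TraceIdFilter())  # handler-level filter sees every record it handles
logger.addHandler(handler)

ok_id = handle_request(logger, fail=False)
bad_id = handle_request(logger, fail=True)
print(handler.errors_by_trace[bad_id])     # ['payment declined']
print(ok_id in handler.errors_by_trace)    # False
```

The filter is attached to the handler rather than the logger because logger-level filters only see records logged directly on that logger, while handler-level filters see everything the handler processes.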
Fabian helpfully points out three Site Reliability Engineering books from Google that are available for free.