From Zero to Useless to Hero: Make Runtime Data Useful in Teams

By Florian Lautenschlager & Robert Hoffmann

Session summary 

They are building a GDPR-compliant voice assistant that gives users control over their personal data.

Monitoring stack slide

Their monitoring stack looked lovely, but it was not usable by their team.

How they improved adoption of their monitoring toolkit, and the culture around it, in their organisation:

For example, adding links from metrics about failing tests to the system logs related to each test.

Creating a dashboard for frequent queries, e.g. searching by trace ID or username.

They also integrated these links into commonly used tools, such as ticketing systems and chat tools.
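A minimal sketch of the deep-linking idea, assuming a log-search UI that accepts a free-text query in a `q` URL parameter. The base URL, the query syntax and the field names `trace_id`, `user` and `test` are hypothetical placeholders, not the speakers' setup. The same link can be placed on a dashboard, in a ticket or in a chat message.

```python
from urllib.parse import urlencode

# Hypothetical log-search endpoint; replace with your own tool's URL.
LOG_SEARCH_BASE = "https://logs.example.internal/search"


def log_search_link(trace_id=None, username=None, test_name=None):
    """Build a deep link into the log-search UI filtered by trace ID, username and/or test."""
    filters = []
    # Field names (trace_id, user, test) are assumed; match them to your log schema.
    if trace_id:
        filters.append(f'trace_id:"{trace_id}"')
    if username:
        filters.append(f'user:"{username}"')
    if test_name:
        filters.append(f'test:"{test_name}"')
    if not filters:
        raise ValueError("need at least one of trace_id, username, test_name")
    # urlencode percent-encodes the query so it is safe to paste into tickets or chat.
    return f"{LOG_SEARCH_BASE}?{urlencode({'q': ' AND '.join(filters)})}"


def failing_test_message(test_name, trace_id):
    """Text for a ticket or chat message that points straight at the relevant logs."""
    link = log_search_link(trace_id=trace_id, test_name=test_name)
    return f"Test {test_name} failed, logs: {link}"
```

For example, `log_search_link(trace_id="abc123")` yields `https://logs.example.internal/search?q=trace_id%3A%22abc123%22`.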

Conclusion slide

 

How to Measure Linux Performance Wrong

By Peter Zaitsev from Percona

Session summary

Intro slide

Concurrency & latency

CPU

Look up the USE method (Utilization, Saturation, Errors) by Brendan Gregg.

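Load average is interesting, but it means little without knowing the machine's resources (how many CPUs?). On Linux it also blurs I/O and CPU usage, because tasks blocked in uninterruptible sleep (typically waiting on disk I/O) are counted alongside runnable ones.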

Look at saturation metrics, normalised load (load average divided by the number of CPUs) and ??
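A small sketch of normalised load using only the standard library (`os.getloadavg`, `os.cpu_count`). Keep in mind that `os.cpu_count()` counts all logical CPUs in the system, not only those this process may use, and that on Linux the load average also includes tasks waiting on I/O, so a value above 1.0 hints at saturation without proving it is CPU saturation.

```python
import os


def normalised_load(load_avg, cpu_count):
    """Load average per CPU: around 1.0 means the demand roughly matches the CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def current_normalised_load():
    """1-, 5- and 15-minute load averages of this host, divided by its logical CPU count."""
    # os.getloadavg() is Unix-only; os.cpu_count() may return None, so fall back to 1.
    ncpu = os.cpu_count() or 1
    return tuple(normalised_load(avg, ncpu) for avg in os.getloadavg())


if __name__ == "__main__":
    one, five, fifteen = current_normalised_load()
    print(f"normalised load: 1m={one:.2f} 5m={five:.2f} 15m={fifteen:.2f}")
```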

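PSI (Pressure Stall Information): a newer Linux kernel feature (since 4.20) that reports how much time tasks spent stalled waiting for CPU, memory or I/O.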
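PSI is exposed as text files under `/proc/pressure/` (`cpu`, `memory`, `io`). Each has a `some` line (at least one task stalled) and, where available, a `full` line (all non-idle tasks stalled). `avg10`, `avg60` and `avg300` are the percentage of time stalled over 10, 60 and 300 seconds, and `total` is the cumulative stall time in microseconds. A minimal parser, assuming that format:

```python
def parse_psi(text):
    """Parse the content of /proc/pressure/{cpu,memory,io} into {kind: {field: value}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values = {}
        for field in fields:
            key, value = field.split("=", 1)
            # total is an integer count of microseconds; the averages are percentages.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


def read_psi(resource):
    """Read PSI for 'cpu', 'memory' or 'io' (needs a kernel built with PSI enabled)."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())
```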

Look at runqlat, a command-line tool from the BCC tools that measures run queue (scheduler) latency.

CPU states to note:

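I/O wait is idle time: the CPU had nothing to run while I/O was outstanding, so high iowait does not mean the CPU is busy.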

Steal is CPU time the hypervisor gave to something else, i.e. CPU that was not available to your VM.
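To see iowait and steal yourself, sample the aggregate `cpu` line of `/proc/stat` twice and compare. The sketch below assumes the usual field order (`user nice system idle iowait irq softirq steal guest guest_nice`, in clock ticks) and skips `guest`/`guest_nice` because the kernel already includes them in `user`/`nice`. Busy time excludes both `idle` and `iowait`, since iowait is a kind of idle.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into {state: clock ticks}."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest and guest_nice are already included in user and nice, so they are skipped.
    return dict(zip(FIELDS, (int(x) for x in parts[1:1 + len(FIELDS)])))


def read_cpu_sample(path="/proc/stat"):
    """The first line of /proc/stat is the aggregate over all CPUs."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def cpu_percentages(before, after):
    """Share of CPU time spent in each state between two samples, in percent."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def sample_cpu(interval=1.0):
    """Return (busy %, iowait %, steal %) over the given interval."""
    before = read_cpu_sample()
    time.sleep(interval)
    pct = cpu_percentages(before, read_cpu_sample())
    # iowait counts as idle, so busy is everything except idle and iowait.
    busy = sum(v for k, v in pct.items() if k not in ("idle", "iowait"))
    return busy, pct["iowait"], pct["steal"]
```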

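Disk space used vs file length: think sparse files. `du -sh` reports the space actually allocated, so the holes in sparse files are not counted and such files look smaller than their length.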
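A sketch that shows the difference using `os.stat`: `st_size` is the apparent length (what `ls -l` and `du --apparent-size` report) and `st_blocks * 512` is the space actually allocated (what plain `du` reports). Whether the created file is really sparse depends on the filesystem.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent length in bytes, allocated bytes) for a file."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux, regardless of filesystem block size.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file of the given length without writing any data into it."""
    with open(path, "wb") as f:
        # Extending with truncate leaves a hole on filesystems that support sparse files.
        f.truncate(size)
```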

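With `free`, look at available memory, not free memory: low free memory does not mean you are running out, because the kernel uses spare memory for page cache.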
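A sketch of the same comparison from `/proc/meminfo` (values in KiB). `MemAvailable`, present since Linux 3.14, is the kernel's estimate of memory that can be given to new work without swapping, including reclaimable page cache; it is what the `available` column of `free` shows.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}; most values are in KiB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        # Some fields (e.g. HugePages_Total) have no unit, so only the number is kept.
        info[key.strip()] = int(rest.split()[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())


def memory_report(info):
    """Free vs available memory; 'available' is the number worth alerting on."""
    return {
        "free_mib": info["MemFree"] / 1024,
        "available_mib": info["MemAvailable"] / 1024,
        "available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"],
    }
```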

ping or mtr?

Area for further research: BCC tools.


Are You Testing Your Observability? Patterns for Instrumenting Your Services

Speakers:

  • Bartek Plotka
  • Kemal Akkoyun

Session summary 

Observability, illustrated with an example load balancer app:

For example, how many requests result in an error?

Is the load balancer distributing requests fairly?

Is the load balancer introducing latency?
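The three questions map onto three signals: an error count, a per-backend request count and a latency measurement. The sketch below records them with the standard library only, for a hypothetical round-robin balancer (`InstrumentedBalancer`, `handle`, `error_ratio` and `fairness` are illustrative names, not from the talk). In a real service these would typically be Prometheus counters labelled by backend plus a histogram. The recorded duration includes the backend call, so the balancer's own overhead is the difference from the backends' latency.

```python
import itertools
import time
from collections import Counter


class InstrumentedBalancer:
    """Hypothetical round-robin balancer that records errors, per-backend counts and durations."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)
        # Start every backend at 0 so an idle backend shows up as unfair distribution.
        self.requests = Counter({b: 0 for b in backends})
        self.errors = Counter()
        self.durations = []  # seconds per request, including the backend call

    def handle(self, send):
        """Route one request; send(backend) returns True on success."""
        backend = next(self._next_backend)
        start = time.perf_counter()
        ok = send(backend)
        self.durations.append(time.perf_counter() - start)
        self.requests[backend] += 1
        if not ok:
            self.errors[backend] += 1
        return ok

    def error_ratio(self):
        """Fraction of requests that resulted in an error (0.0 to 1.0)."""
        total = sum(self.requests.values())
        return sum(self.errors.values()) / total if total else 0.0

    def fairness(self):
        """Least-loaded backend's count divided by the busiest; 1.0 is perfectly even."""
        busiest = max(self.requests.values(), default=0)
        if busiest == 0:
            return 1.0
        return min(self.requests.values()) / busiest
```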

Prometheus metrics were promoted as quick, easy and scalable.

This was then visualised using Grafana.

Do not confuse metric gathering and logging.

Unit test your metric instrumentation.
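Instrumentation is code, so it can be unit tested like any other code. A minimal stdlib sketch: `LabeledCounter` is a hypothetical stand-in for a Prometheus counter and `handle` a hypothetical request handler. The tests assert on counter deltas so they do not depend on global state. Prometheus client libraries offer helpers for the same purpose, e.g. the Go client's `testutil` package.

```python
import threading
import unittest


class LabeledCounter:
    """Hypothetical stand-in for a Prometheus-style labelled counter (not a real client API)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = {}
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        if set(labels) != set(self.label_names):
            raise ValueError(f"{self.name}: expected labels {self.label_names}")
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(labels[n] for n in self.label_names)
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def value(self, **labels):
        key = tuple(labels[n] for n in self.label_names)
        return self._values.get(key, 0.0)


# Hypothetical handler under test: counts requests by HTTP status code.
REQUESTS = LabeledCounter("http_requests_total", ["code"])


def handle(path):
    code = "200" if path == "/" else "404"
    REQUESTS.inc(code=code)
    return code


class TestHandlerMetrics(unittest.TestCase):
    def test_counts_requests_by_status_code(self):
        before_ok = REQUESTS.value(code="200")
        before_missing = REQUESTS.value(code="404")
        handle("/")
        handle("/missing")
        handle("/missing")
        # Compare deltas so the result does not depend on earlier increments.
        self.assertEqual(REQUESTS.value(code="200") - before_ok, 1)
        self.assertEqual(REQUESTS.value(code="404") - before_missing, 2)

    def test_rejects_wrong_labels(self):
        with self.assertRaises(ValueError):
            REQUESTS.inc(status="200")
```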