FOSDEM – Page 2 – Tessa Alexander's notes

From Zero to Useless to Hero: Make Runtime Data Useful in Teams

Posted on February 2, 2020 (updated May 28, 2020) by Tessa (the external)

By Florian Lautenschlager & Robert Hoffmann

Session summary

They are building a GDPR-compliant voice assistant that gives users control over their personal data.

Their monitoring setup was lovely but not usable by their team.

The talk covered how they improved the usage of, and the culture around, their monitoring toolkit in their organisation.

E.g. adding links from metrics about failing tests to the system logs related to that test.

Create a dashboard to facilitate frequent queries, e.g. search by trace id or username.

Also they integrated the links into commonly used tools, like ticket systems or chat tools.
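A rough sketch of what such a link could look like, so that a trace id or username from a ticket or chat message lands directly on the matching log search. The base URL and the query parameter names are made up for illustration; the talk did not name a specific log tool.

```python
from urllib.parse import urlencode

# Hypothetical log search endpoint; replace with whatever your log tool offers.
LOG_SEARCH_BASE = "https://logs.example.org/search"


def log_search_link(trace_id=None, username=None):
    """Build a deep link to a log search filtered by trace id and/or username."""
    params = {}
    if trace_id:
        params["trace_id"] = trace_id
    if username:
        params["username"] = username
    if not params:
        raise ValueError("need a trace id or a username")
    # urlencode takes care of escaping spaces and special characters.
    return f"{LOG_SEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(log_search_link(trace_id="abc123"))
```

A link like this can be pasted into a ticket template or posted by a chat bot, so nobody has to retype the query.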

How to measure Linux Performance Wrong

Posted on February 2, 2020 (updated May 28, 2020) by Tessa (the external)

By Peter Zaitsev from Percona

Session summary

Concurrency & latency

CPU

Look up the USE method (Utilization, Saturation, Errors), e.g. Brendan Gregg.

LoadAvg is interesting, but it has to be read against the resources of the machine (e.g. the number of CPUs). On Linux it also blurs IO and CPU usage.

Look at Saturation metrics, normalised load and ??
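A minimal sketch of normalised load, assuming it means the load average divided by the number of CPUs (my reading of the note, not a definition taken from the slides). The function name is my own.

```python
import os


def normalised_load(load_avg, cpu_count):
    """Load average per CPU; values well above 1.0 hint at saturation."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


if __name__ == "__main__":
    try:
        load_1min, _, _ = os.getloadavg()
    except OSError:
        load_1min = None  # load average not available on this platform
    if load_1min is not None:
        cpus = os.cpu_count() or 1
        print(f"1-min load {load_1min:.2f} on {cpus} CPUs "
              f"= {normalised_load(load_1min, cpus):.2f} per CPU")
```

A load of 8 means something very different on 4 CPUs than on 64, which is why the raw number alone says little.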

PSI – Pressure Stall Information, a new kernel feature

Look at runqlat, a command line tool (from the bcc tools) for looking at run queue latency

CPU states to note:
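A small sketch for reading PSI, assuming a kernel that exposes `/proc/pressure/cpu` with lines of the form `some avg10=… avg60=… avg300=… total=…`. The parser is my own helper, not a tool from the talk.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file into nested dicts."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


if __name__ == "__main__":
    try:
        with open("/proc/pressure/cpu") as f:
            print(parse_psi(f.read()))
    except OSError:
        print("PSI is not available on this system")
```

The avg10, avg60 and avg300 values are percentages of time in which tasks were stalled over the last 10, 60 and 300 seconds.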

IO wait is idle

Steal is CPU not available to your VM
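To make the two points concrete, here is a sketch that splits the aggregate `cpu` line of `/proc/stat` into busy, idle and steal, counting IO wait as idle. The counters are cumulative since boot, so a real tool would diff two samples; this only shows the bookkeeping. The function name is my own.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def cpu_breakdown(stat_line):
    """Return busy/idle/steal fractions from a /proc/stat "cpu" line."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))
    total = sum(values.values())
    idle = values["idle"] + values["iowait"]  # IO wait is idle CPU time
    steal = values["steal"]                   # taken away by the hypervisor
    busy = total - idle - steal
    return {"busy": busy / total, "idle": idle / total, "steal": steal / total}


if __name__ == "__main__":
    try:
        with open("/proc/stat") as f:
            print(cpu_breakdown(f.readline()))
    except OSError:
        print("/proc/stat is not available on this system")
```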

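Disk space used vs file length: think sparse files. du -sh reports the disk space actually allocated, so the full length of a sparse file does not show up there.

free: look at available memory, not free memory, to judge whether memory is running out.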
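A quick way to see the difference for yourself: create a sparse file and compare its length with the space allocated for it. This is a sketch that assumes a filesystem with sparse file support and 512-byte `st_blocks` units (the case on Linux).

```python
import os
import tempfile


def make_sparse_file(path, length):
    """Create a file of the given length without writing any data."""
    with open(path, "wb") as f:
        f.truncate(length)


def sizes(path):
    """Return (apparent length, bytes actually allocated on disk)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path, 100 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print(f"length: {apparent} bytes, allocated: {allocated} bytes")
```

On the command line, `du -sh --apparent-size` (GNU du) shows the file length instead of the allocated space.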
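The same numbers that `free` prints come from `/proc/meminfo`. A sketch that pulls out `MemFree` and `MemAvailable` side by side (the parser is my own helper):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("MemFree:     ", info.get("MemFree"), "kB")
        print("MemAvailable:", info.get("MemAvailable"), "kB")
    except OSError:
        print("/proc/meminfo is not available on this system")
```

MemFree is usually small on a healthy machine because the kernel uses spare memory for caches; MemAvailable estimates how much can be handed to new workloads without swapping.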

ping or mtr?

Area of research: bcc tools

Are You Testing Your Observability? Patterns for Instrumenting Your Services

Posted on February 2, 2020 (updated May 28, 2020) by Tessa (the external)

Speakers:

Bartek Plotka
Kemal Akkoyun

Session summary

Observability, using a load balancer app as the example:

Example questions: how many requests result in an error?

What about whether the load balancer is fairly distributing the requests?

Is the load balancer introducing latency?

Prometheus metrics were being promoted: quick, easy, scalable?

This was then visualised using Grafana.
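The first two questions boil down to simple arithmetic on request counters. A sketch with plain numbers standing in for whatever the metrics system returns; the function names and backend names are my own, not from the talk.

```python
def error_ratio(total_requests, error_requests):
    """Fraction of requests that ended in an error."""
    if total_requests == 0:
        return 0.0
    return error_requests / total_requests


def backend_shares(requests_per_backend):
    """Share of traffic each backend received; equal shares mean fair balancing."""
    total = sum(requests_per_backend.values())
    if total == 0:
        return {name: 0.0 for name in requests_per_backend}
    return {name: count / total for name, count in requests_per_backend.items()}


if __name__ == "__main__":
    print(error_ratio(200, 5))
    print(backend_shares({"backend-a": 120, "backend-b": 80}))
```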

Do not confuse metric gathering and logging.

Unit test your metric instruments
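A sketch of what that could look like. The `Counter` class is a tiny in-memory stand-in I made up so the example runs without dependencies; with a real metrics client library you would assert against its counters instead. `handle_request` is a hypothetical handler.

```python
import unittest


class Counter:
    """Minimal in-memory counter with labels (hypothetical stand-in)."""

    def __init__(self):
        self.values = {}

    def inc(self, **labels):
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0) + 1

    def get(self, **labels):
        return self.values.get(tuple(sorted(labels.items())), 0)


def handle_request(counter, backend, ok):
    """Hypothetical request handler that records its outcome as a metric."""
    counter.inc(backend=backend, result="ok" if ok else "error")


class HandleRequestMetricsTest(unittest.TestCase):
    def test_error_is_counted_with_the_right_labels(self):
        counter = Counter()
        handle_request(counter, backend="b1", ok=False)
        self.assertEqual(counter.get(backend="b1", result="error"), 1)
        self.assertEqual(counter.get(backend="b1", result="ok"), 0)
```

Run it with `python -m unittest`. The point is that a wrong label or a forgotten increment fails a test instead of silently breaking a dashboard.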

Sharing Reproducible Results in a Container: A container you can build anywhere

Posted on February 2, 2020 (updated May 28, 2020) by Tessa (the external)

By Efraim Flashner

Session summary

Very funny slides.

Using Guix to help package and deploy a crazy 2010 PHP web app.

Area of research: Octave

Facilitating HPC job debugging through job scripts archival

Posted on February 2, 2020 (updated May 28, 2020) by Tessa (the external)

By Andy Georges, sysadmin from Ghent University

Session summary

Goal of this project: log and capture the job script and any relevant files to assist in debugging crashed HPC jobs.
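To illustrate the idea only (this is not the speaker's tool): a sketch that copies a job script into an archive directory keyed by job id, so it is still around when the job crashes. The function name, the directory layout and the example job id are all my own assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def archive_job_script(job_id, script_path, archive_root):
    """Copy a job script to <archive_root>/<job_id>/ and return the new path."""
    dest_dir = Path(archive_root) / str(job_id)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(script_path).name
    shutil.copy2(script_path, dest)  # copy2 also keeps timestamps
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "job.sh"
        script.write_text("#!/bin/bash\necho hello\n")
        print(archive_job_script(12345, script, Path(tmp) / "archive"))
```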