Below are some things that were new to me and are worth noting:
- Perses is a dashboard tool for Prometheus metrics (with other data sources to come later)
- OpenTelemetry is a standard for metrics, traces and logs
- Beyla provides automatic application metrics using eBPF and Prometheus (more below)
- Kepler is the Kubernetes-based Efficient Power Level Exporter
- Platypus attack. A power side-channel attack on Intel CPUs. It guesses CPU instructions and operands by their power consumption, and can be used to extract crypto keys
- Cloud Carbon Footprint is a tool to estimate energy use and carbon emissions from public cloud usage
- CNCF Mentoring Initiatives, if anyone wants to contribute and needs help
- Autometrics lets you add SLOs like success rate and latency, attach them to your code and let it do the rest
- create a tool (Prometheus) and everyone will try to find an application for it, even if it is not always the right tool. Common issue.
- create a tool and everyone will build 100x more tools around it. No wonder it’s so hard to figure out what’s good
- measuring code metrics only from inside the code does not give the full picture, as it doesn’t include kernel, IO and network timing or client-side issues
Grafana’s Beyla is described as “eBPF-based auto-instrumentation of HTTP and HTTPS services”.
This is quite a novel and interesting approach: it collects metrics from code without even touching the code. It does so by using eBPF.
My understanding of eBPF is that it lets user-space applications load small sandboxed programs into the kernel, which run when various events happen inside the kernel and report back to user space. For example listen to all
It is language (and version) specific though: intercepting those calls alone is not enough, you also need to know the memory layout to get the list of arguments and the context of that call.
Beyla mainly supports Go code, plus some generic HTTP and HTTPS calls for other languages, and can provide basic RED metrics (Rate of requests, Errors, Duration).
It can then send all these metrics to Prometheus directly.
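To make the RED idea concrete, here is a minimal sketch of how a window of observed requests could be summarized as Rate, Errors and Duration. The `RequestEvent` shape and the `red_metrics` helper are my own hypothetical names for illustration, not Beyla's data model or API, and I am assuming that only 5xx responses count as errors.

```python
from dataclasses import dataclass


@dataclass
class RequestEvent:
    """One observed HTTP request (hypothetical shape, not Beyla's data model)."""
    status: int        # HTTP status code of the response
    duration_s: float  # time from request start to response end, in seconds


def red_metrics(events: list[RequestEvent], window_s: float) -> dict[str, float]:
    """Summarize a window of request events as Rate, Errors, Duration."""
    total = len(events)
    # Assumption: only server-side failures (5xx) are counted as errors.
    errors = sum(1 for e in events if e.status >= 500)
    mean_duration = sum(e.duration_s for e in events) / total if total else 0.0
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "mean_duration_s": mean_duration,
    }


events = [
    RequestEvent(200, 0.10),
    RequestEvent(200, 0.20),
    RequestEvent(500, 0.30),
    RequestEvent(404, 0.20),
]
summary = red_metrics(events, window_s=2.0)
print(summary)
```

In a real setup the duration would normally be exported as a histogram rather than a mean, so that percentiles can be computed on the Prometheus side.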
User-space monitoring program requires
It is really hard to measure the impact of data centers, and cloud services provide only limited carbon monitoring.
Kepler is the Kubernetes-based Efficient Power Level Exporter. It uses RAPL (Running Average Power Limit), an Intel CPU feature that reports energy consumption. It then uses eBPF to attribute power usage to processes and then to pods, so you can later aggregate by namespace/deployment/service.
The downside is that RAPL is not accessible in VMs and is the source of the fantastic vulnerability discovered in the Platypus attack.
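The attribution step can be sketched roughly as follows. This is a deliberately simplified model of my own, assuming node energy is split across pods in proportion to their CPU time; Kepler's actual model is more involved, and the function names and sample numbers here are hypothetical.

```python
from collections import defaultdict


def attribute_energy(node_joules, cpu_seconds_by_pod):
    """Split node energy across pods in proportion to their CPU time.

    Assumption: CPU time is a good enough proxy for energy use.
    """
    total_cpu = sum(cpu_seconds_by_pod.values())
    if total_cpu == 0:
        return {pod: 0.0 for pod in cpu_seconds_by_pod}
    return {
        pod: node_joules * cpu / total_cpu
        for pod, cpu in cpu_seconds_by_pod.items()
    }


def by_namespace(joules_by_pod):
    """Aggregate per-pod energy into per-namespace totals."""
    totals = defaultdict(float)
    for (namespace, _pod), joules in joules_by_pod.items():
        totals[namespace] += joules
    return dict(totals)


# Hypothetical sample: (namespace, pod) -> CPU seconds used in the interval.
cpu = {
    ("shop", "cart-1"): 3.0,
    ("shop", "cart-2"): 1.0,
    ("infra", "proxy-1"): 4.0,
}
per_pod = attribute_energy(80.0, cpu)  # 80 J reported for the whole node
per_ns = by_namespace(per_pod)
print(per_pod)
print(per_ns)
```

The same aggregation could be keyed by deployment or service instead of namespace.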
The Platypus attack is a brilliant and creative way to sample power usage and associate it with specific processor instructions (different instructions consume different amounts of power). It also makes it possible to infer the Hamming weight of the operands (the number of non-zero bits). With this information it is “really easy” to extract secrets like crypto keys. Crazy.
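For reference, the number of non-zero bits in a value is usually called its Hamming weight, while the Hamming distance counts the bits in which two values differ. A tiny sketch of both, assuming non-negative integers (the helper names are mine):

```python
def hamming_weight(x: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two values disagree.
    return hamming_weight(a ^ b)


print(hamming_weight(0b1011))
print(hamming_distance(0b1011, 0b0010))
```

Operands with more set bits flip more transistors, which is why their weight can leak through fine-grained power readings.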
What I find interesting about Autometrics is that it lets you define objectives (SLOs) in the form of success rate and latency. Besides that, it generates PromQL queries for those metrics, which are used in the provided Grafana dashboards.
“Auto” in the name might be misleading, because it still requires adding a library, making changes to the code and providing a config.
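To give an idea of what a generated success-rate query might look like, here is a small sketch that builds a PromQL string for one function. The metric name `function_calls_total` and the labels `function` and `result` are assumptions on my side for illustration, as is the `success_rate_query` helper; check the library's documentation for the names it actually emits.

```python
def success_rate_query(function, window="5m", metric="function_calls_total"):
    """Build a PromQL success-rate query for a single function.

    Assumption: calls are counted in `metric` with a `function` label and a
    `result` label that is "ok" for successful calls.
    """
    base = f'function="{function}"'
    ok = "sum(rate(" + metric + "{" + base + ',result="ok"}[' + window + "]))"
    total = "sum(rate(" + metric + "{" + base + "}[" + window + "]))"
    # Ratio of successful calls to all calls over the window.
    return ok + " / " + total


query = success_rate_query("get_user")
print(query)
```

An SLO such as "99% success rate" then becomes a comparison of this ratio against 0.99 in an alerting rule or dashboard panel.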