I spent a bunch of time this year working on performance. As we enter December and start winding down a bit, I thought it would be a good opportunity to catch up on Tracing Summit 2023, organized by the DiaMon workgroup, and add a few notes on the various videos.
- Introduction.
- Bit of meta about the summit and logistics, 'unconference', lunch. A very "you had to be there" type of thing.
- ThreadMonitor: Low-overhead Data Race Detection using Intel PT.
- Introduces a post-mortem data race detector for C/C++ programs that use pthreads.
- Unlike tsan, this is low overhead: it relies on Intel Processor Trace (PT) to record execution, then replays tsan-style checks after the fact. (See the racy-counter example after these notes for the kind of bug this targets.)
- I appreciate the callout that a data race is not a concurrency error if the non-determinism is a design choice.
- PTWRITE packets plus thread and process IDs are used to reconstruct the execution and feed the same state tsan would normally consume.
- Memory accesses are instrumented at compile time via an LLVM pass; details on what gets instrumented and when (mostly memory accesses and function entry/exit).
- The post-mortem analysis allows better results: allocating more shadow cells reduces the number of missed data races.
- Benchmarks show ~15% CPU overhead (vs. ~10x for tsan).
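As a refresher on the class of bug all of this catches, here is a minimal pthreads data race, the classic unsynchronized shared counter. This example is mine, not from the talk; building it with `-fsanitize=thread` makes tsan report the race at runtime, and a PT-based post-mortem detector like ThreadMonitor targets the same kind of defect.

```c
/* race.c: classic data race on a shared counter.
 * Build: gcc -g -fsanitize=thread race.c -o race -lpthread */
#include <pthread.h>
#include <stdio.h>

static long counter; /* shared, no lock, no atomics */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++; /* racy read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}
```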
- libside: Giving the preprocessor a break with a tracer-agnostic instrumentation API
- Overview of existing static instrumentation and a wishlist for the new tool (the sketch after these notes shows the macro-heavy style the title alludes to).
- The library supports registered metadata and structured data (fixed and variable sized), and expects to define an ABI for other languages.
- Integration with LTTng-UST.
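For a sense of what "giving the preprocessor a break" means: defining a single event with LTTng-UST today takes a provider header full of macros, roughly like the sketch below. This is written from memory of the lttng-ust 2.x API, so treat it as illustrative rather than copy-paste ready.

```c
/* my_app_tp.h: LTTng-UST tracepoint provider boilerplate.
 * One event requires provider/include redefinitions,
 * multi-read guards, and TP_ARGS/TP_FIELDS macro lists. */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER my_app

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./my_app_tp.h"

#if !defined(_MY_APP_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _MY_APP_TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    my_app,        /* provider name */
    request_done,  /* event name */
    TP_ARGS(int, status, const char *, path),
    TP_FIELDS(
        ctf_integer(int, status, status)
        ctf_string(path, path)
    )
)

#endif /* _MY_APP_TP_H */

#include <lttng/tracepoint-event.h>
```

Call sites then emit events with `tracepoint(my_app, request_done, 200, "/index.html");`. As I understood the talk, libside's pitch is to describe events as registered data instead of macro expansions, so any tracer can consume them.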
- Collecting telemetry data from low latency microservices
- Presents a strategy for collecting telemetry data from low-latency microservices, with contributions and benchmarks
- LTTng and OpenTelemetry as related work
- OTel provides the collector and backends, with protobuf definitions for spans
- Combining both in various ways, with proprietary or OpenTelemetry instrumentation
- Online collection and offline analysis
- LTTng: The challenges of user-space tracing
- Challenges: stable and predictable timing, reliability, adaptability to the environment, and minimizing manual configuration
- Shared resources, tracer vs. runtime: impact on memory layout (a custom memory allocator is preloaded via LD_PRELOAD; see the sketch after this list); single-threaded applications closing all file descriptors without warning (interpose close/fclose/closefrom to avoid it); signal handling (avoids usage); locks (loader/lttng deadlock); resources leaked in child processes; single-thread assumptions
- Shared resources, tracer vs. external processes: async process termination (per-uid IPC over shared memory; TLS is too verbose; recommend per-pid); per-CPU ring buffers overallocate
- Other challenges: runtimes other than C/C++
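The preloaded-allocator trick above looks roughly like this. This is my own minimal sketch of an LD_PRELOAD interposer, not LTTng's actual allocator; a production version also has to cope with dlsym allocating during lookup and must interpose the other allocation entry points (calloc, realloc, free).

```c
/* shim.c: minimal LD_PRELOAD malloc interposer.
 * Build: gcc -shared -fPIC -o shim.so shim.c -ldl
 * Run:   LD_PRELOAD=./shim.so ./some_program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

static void *(*real_malloc)(size_t);
static __thread int in_hook; /* per-thread reentrancy guard */

void *malloc(size_t size)
{
    if (!real_malloc) /* lazily resolve libc's malloc */
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    void *p = real_malloc(size);

    /* fprintf may itself allocate; guard against recursing
     * back into this hook. A tracer would record (size, p)
     * into its own buffers here instead of printing. */
    if (!in_hook) {
        in_hook = 1;
        fprintf(stderr, "malloc(%zu) = %p\n", size, p);
        in_hook = 0;
    }
    return p;
}
```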
- Tracing Heterogeneous Programming Models with LTTng & Babeltrace
- From the description: THAPI (Tracing Heterogeneous APIs) is an open-source tracing infrastructure for HPC platforms that use accelerators, developed at Argonne National Laboratory. It intercepts the low-level API calls (L0, CUDA Driver, CUDA Runtime, HIP, OpenCL, OpenMP) in order to dump their arguments and timestamps in CTF format using LTTng. The traces can be analyzed postmortem leveraging babeltrace2 and its plugin infrastructure. We've developed plugins to fulfill our multiple use cases (tally, timeline, pretty printer, validation).
- From tracing to kernel programming
- BPF started as a tracing tool, first at syscalls and then other kernel internals (a minimal tracing program sketch follows this list)
- Once BPF programs can write (e.g., modifying state for networking), they become more general-purpose, safe kernel modules
- Continuing to evolve in types and operations; innovation enabler
- Trying to use uprobes and BPF on non-C userspace
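For a taste of the tracing side, here is a minimal libbpf-style BPF program, my sketch rather than anything from the talk, that attaches to the openat syscall tracepoint and logs each call. The write-capable program types the talk describes build on the same machinery.

```c
/* open_trace.bpf.c: minimal BPF tracing program (libbpf style).
 * Attaches to the sys_enter_openat tracepoint and logs each
 * call to the kernel trace pipe via bpf_printk. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(void *ctx)
{
    bpf_printk("openat called");
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Compiled with `clang -target bpf` and loaded via a libbpf skeleton, the output shows up in /sys/kernel/debug/tracing/trace_pipe.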
Happy tracing!