Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 40 additions & 0 deletions blog/2024-10-07-plotting-memusagestat-data-points.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
---
slug: plotting-memusagestat-data-points
title: Plotting Memusagestat Data Points in Python
authors: nils
tags: [linux, glibc, python]
---

`memusage` is a great tool to track all allocations for a program,
this comes to many Linux distributions with `glibc` itself.
So if you need a quick-and-easy allocation tracker `memusage` is the tool for you!
There is a plotter, `memusagestat`,
that can draw a simple image for you.
However, it shows its age and is inconvenient to work with.
So we have create a new plotter in Python
to help you track and visualize allocations.

![Improvement](/img/memusagestat-bazel-improvement.png)

This came up during our preparations for the Bazelcon talk [Dude, where is my RAM?].
But the investigation and the graphs produced by `memusagestat` did not make the cut for the talk.
Nevertheless, it was interesting to tinker with these tools
and we hope it can be useful for you, to use the new plotter,
or better understand how to work with core Linux programs like this.

[Dude, where is my RAM?]: /blog/memory-adventure/

You can read the full document here: [Plotting memusagestat's data].
There are two take-home messages:

1. A case-study in building and modifying `glibc` tools for your computer:
here: [building glibc](/docs/practice/plotting-memusagestat-with-python#building-memusagestat-and-glibc).
This includes debugging mysterious crashes before the start instruction
in an ELF program.

2. Our code to plot allocations for your programs.
Read the [Python script](/docs/practice/plotting-memusagestat-with-python#Parse-the-binary-file-in-Python)
section, and checkout the code [from github].

[Plotting memusagestat's data]: /docs/practice/plotting-memusagestat-with-python/
[from github]: https://github.com/meroton/plot-memusage
321 changes: 321 additions & 0 deletions docs/practice/plotting-memusagestat-with-python.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,321 @@
# Plot memusagestat's data in Python

`memusage` is a great tool to track all allocations for a program,
this comes with `glibc` itself
in several Linux distributions.
If you need a quick-and-easy allocation tracker,
`memusage` is the tool for you!
There is a plotter, `memusagestat`,
that can draw a simple image for you.
However, it shows its age and is inconvenient to work with.
We have created a new plotter in Python
to help you track and visualize allocations.

![Improvement](/img/memusagestat-bazel-improvement.png)

This came up during our preparations for the Bazelcon talk [Dude, where is my RAM?].
But the investigation and the graphs produced by `memusagestat` did not make the cut for the talk.
Nevertheless, it was interesting to tinker with these tools
and we hope it can be useful for you, to use the new plotter,
or better understand how to work with core Linux programs like this.

[Dude, where is my RAM?]: /blog/memory-adventure/

## memusage & memusagestat

`memusage` is a good profiling tool bundled with `glibc` on Linux,
it tracks allocations for a program.
Its sibling `memusagestat` can plot the allocations,
which is interesting when looking at out-of-memory problems
and garbage-collection/memory thrashing.
This is good for an initial investigation,
however there are few options for how to present the data
and the graph itself does not look very nice.
<!-- You can select to plot either by sequential allocation or by linear time. -->
We want to generate new,
nicer looking, figures
for the same data.

In this example we use `memusage` to profile a `bazel` invocation that runs out of memory

```
$ memusage --data path/to/bazel.mem \
bazel build ... \
//...
```

and render a plot

```
$ memusagestat bazel.mem -T -t
```

![original memusagestat plot](/img/memusagestat-bazel.png)

We should be able to make a nicer plot with a better plotting library.
Note, the x-axis is the number of allocations.
When plotted against time the graph instead looks like a noisy square wave,
which is interesting in its own right.
As we realized in the talk,
Bazel performs a lot of allocations to analyze and build the tree,
then it struggles close to the limit of memory for a long while before crashing.

## Building memusagestat and glibc

### memusage writes a binary datafile

Glancing at [the code] it is quite straight forward
and we should be able to parse it just fine.

[the code]: https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/memusagestat.c;h=837b613c2bb759fd98572fd7984c0682bf9d43be;hb=HEAD

```
struct entry
{
uint64_t heap;
uint64_t stack;
uint32_t time_low;
uint32_t time_high;
};

...

static void
update_data (struct header *result, size_t len, size_t old_len)
/* Write out buffer if it is full. */
if (idx + 1 == buffer_size)
write (fd, buffer, buffer_size * sizeof (struct entry));
else if (idx + 1 == 2 * buffer_size)
write (fd, &buffer[buffer_size], buffer_size * sizeof (struct entry));
}
```

### Compile memusagestat

For good measure we can patch the parsing code in `memusagestat`
to create an _oracle implementation_,
which prints reference data for our parser.
This affords us to quickly verify that our parser interprets the data
just like the `memusagestat`.

We need the `glibc` source code and can be build `memusagestat` with its `make` build system.
The first step is to use the same version as the host OS.

```
$ ldd --version | head -1
ldd (Ubuntu GLIBC 2.35-0ubuntu3.3) 2.35
```

```
$ git clone https://sourceware.org/git/glibc.git
$ cd glibc
$ git checkout release/2.35/master
```

Then create a build directory and configure the build.
We use an explicit `prefix`, to safe guard against a `make install`
which could replace the host files.
A warning is raised if prefix is not configured to warn that it could replace core libraries,
and potentially corrupt the operating system.

```
glibc $ mkdir -p build/{prefix,x86_64}
glibc $ cd build/x86_64
x86_64 $ ../../configure --prefix $PWD/../prefix --disable-werror
glibc $ cd -
glibc $ export objdir=$PWD/build/x86_64
```

`configure` will warn against missing dependencies
and we also need `libgd` to build `memusagestat` for its graphical components,
which is an optional dependency, so no warning is raised.
We must also use `patchelf` later.

```
$ sudo apt install gawk libgd-dev patchelf
```

This can be adjusted to fit any package manager,
and any additional dependencies `configure` fails to find.
With a successful `configure` we can compile it all.
Note, just compiling the `malloc` directory fails for some other target that we do not need.
So we compile everything, and ignore some unrelated errors.

```
$ make
```

### Run memusagestat

```
$ cd build/x86_64
$ malloc/memusagestat path/to/bazel.mem
fish: Job 1, 'malloc/memusagestat ~/gits/llvm…' terminated by signal SIGSEGV (Address boundary error)
```

This is an interesting crash,
and `gdb` does not help us either.

```
$ malloc/memusagestat path/to/bazel.mem
(gdb) start
Temporary breakpoint 2 at 0x2440: file memusagestat.c, line 120.
Starting program: /home/nils/bin/gits/glibc/build/x86_64/malloc/memusagestat --help
During startup program terminated with signal SIGSEGV, Segmentation fault.
```

The program crashes before calling `main`
and even before we reach `_start`, [the code before main].

```
(gdb) starti
Starting program: /home/nils/bin/gits/glibc/build/x86_64/malloc/memusagestat --help
During startup program terminated with signal SIGSEGV, Segmentation fault.
```

So the problem lies in the `execve` path:

```
$ strace malloc/memusagestat --help
execve("malloc/memusagestat", ["malloc/memusagestat", "--help"], 0x7fffc627fc38 /* 43 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
```

Some file is missing.
This is a little confusing but technically correct,
Similar errors are often printed by the shell
if the program itself is missing ("malloc/memusagestat: command not found"),
or if the program is asked to read a file ("./x: No such file or directory").

```
$ file ./malloc/memusagestat
./malloc/memusagestat: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /.../build/x86_64/../prefix/lib/ld-linux-x86-64.so.2,
BuildID[sha1]=74f9d6bd198b4d8fa5d92ba0629dca2f4994f2cf,
for GNU/Linux 3.2.0, with debug_info, not stripped
```

This is more uncommon,
our program is an interpreted file that requires the `ld` interpreter to start.
This is the same execution pathway
as shell, python or other interpreted scripts that start with a shebang ("#!/usr/bin/env python3").
The program wants `ld` from our `glibc` build directory,
because compiled with a safety prefix it does not use the system `ld`.
We can restore that:

```
$ patchelf --set-interpreter /usr/lib64/ld-linux-x86-64.so.2 malloc/memusagestat
```

This does it! We can now run and instrument `memusagestat`.


[the code before main]: https://embeddedartistry.com/blog/2019/04/08/a-general-overview-of-what-happens-before-main/

### Patch memusagestat

```
diff --git a/malloc/memusagestat.c b/malloc/memusagestat.c
index a1ee5cfacd..fa318622ea 100644
--- a/malloc/memusagestat.c
+++ b/malloc/memusagestat.c
@@ -32,6 +32,7 @@
#include <stdint.h>
#include <sys/param.h>
#include <sys/stat.h>
+#include <inttypes.h>

#include <gd.h>
#include <gdfontl.h>
@@ -223,6 +224,15 @@ main (int argc, char *argv[])
write (fd, headent, 2 * sizeof (struct entry));
}

+ printf(" : %zu\n", headent[0].heap);
+ printf("maxsize_total : %zu\n", headent[0].stack);
+ printf("maxsize time_low : %d\n", headent[0].time_low);
+ printf("maxsize time_high: %d\n", headent[0].time_high);
+ printf("maxsize_heap : %zu\n", headent[1].heap);
+ printf("maxsize_stack : %zu\n", headent[1].stack);
+ printf(" time_low : %d\n", headent[1].time_low);
+ printf(" time_high: %d\n", headent[1].time_high);
+
if (also_total)
{
/* We use one scale and since we also draw the total amount of
@@ -376,6 +386,16 @@ main (int argc, char *argv[])

now = ((uint64_t) entry.time_high) << 32 | entry.time_low;

+ printf(
+ "%" PRIu64 ": (%" PRIu64 ", %" PRIu64 ", %" PRIu32 ", %" PRIu32 " (%" PRIu64 "))\n",
+ cnt - 1,
+ entry.heap,
+ entry.stack,
+ entry.time_low,
+ entry.time_high,
+ now
+ );
+
if ((((previously - start_time) * 100) / total_time) % 10 < 5)
gdImageFilledRectangle (im_out,
40 + ((cnt - 1) * (xsize - 80)) / total,
```

This shows each entry in the binary file,
heap, stack, lower 32 bits of the time and the higher,
as well as the joined time within parentheses.
The heap and stack are sent as unsigned 64 bit values, but the time is split in two.

```
build/x86_64 $ malloc/memusagestat ~/gits/llvm-project/utils/bazel/bazel.mem | head
: 0
maxsize_total : 820500
maxsize time_low : 195332156
maxsize time_high: 2385035
maxsize_heap : 813433
maxsize_stack : 12752
time_low : 823673266
time_high: 2385328
0: (72704, 224, 195398422, 2385035 (10243647520213782))
1: (72768, 0, 195490698, 2385035 (10243647520306058))
...
```

## Parse the binary file in Python

A problem well-stated is half solved.
We now have everything we need:
a good understanding of the problem,
and an oracle implementation to verify our parser directly.
We can implement a parser with the [struct] package:

```
FORMAT = "QQII" # u64, u64, u32, u32
SIZE = struct.calcsize(FORMAT)

data = open(sys.argv[1], 'rb').read()
slice = data[ n*SIZE : n*SIZE + SIZE ]
parts = struct.unpack(FORMAT, slice)
heap, stack, low, high = parts
time = entry.time_high << 32 | entry.time_low
```

Our `memusagestat` parser and plotter is [available on github]
and can draw prettier images:

![memusage stat in python](/img/memusagestat-python-reproduction.png)

We have currently implemented the most important subset of the API,
to be a drop-in replacement,
but there are some missing pieces.

Here is a slight modification
where the stack usage is shown to the same scale as the heap.
To show its insignificance for Bazel.

![single scale stack usage](/img/memusagestat-python-single-scale.png)

[available on github]: https://github.com/meroton/plot-memusage

[struct]: https://docs.python.org/3.8/library/struct.html
Binary file added static/img/memusagestat-bazel-improvement.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/memusagestat-bazel.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/memusagestat-python-reproduction.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/memusagestat-python-single-scale.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.