Use a principled, and consistent, implementation of freelists. #89

@markshannon

We currently have many ad hoc, per-class free lists.

We should consider changing freelists from per-class to per-size.
My informal experiments show that about 50% of allocations are currently served from free lists, whereas with per-size free lists this rises to about 90%.
With tuning, we should expect a hit rate well over 90%.

Using free lists for blocks of memory, instead of objects, means that there are no interactions between the free lists and the cycle GC, simplifying both. There is a small additional cost of re-assigning the class, but the cost of this will be negligible in most cases.

Design principles

  • Must support tracemalloc. That is, if tracemalloc is enabled, then no allocations should use free lists.
  • There should be one free list per size class.
  • The mapping from size to size class should be a small, pure function, ideally a fast one.
  • The mapping from size class to free list should be a small, pure function, ideally a fast one.
  • The functions for determining if a free list is full or empty should be fast.
  • The procedures for pushing to and popping from a free list should be small and fast (they cannot be pure, since their purpose is a side effect).

With the above, allocation becomes:

void *allocate(size_t s)
{
    FreeList *list = size_class_to_free_list(size_to_size_class(s));
    if (!is_empty(list)) {
        /* fast path */
        return free_list_pop(list);
    }
    /* slow path */
    ...
}

The functions mentioned above must be small and pure so that:

  • If the input is a compile-time constant, then the whole expression is a compile-time constant.
  • If the input is not a compile-time constant, then the whole expression can be inlined if the compiler thinks it beneficial to do so.

In practice, we will probably inline most of the above function calls; it just helps to think of the parts separately when designing.
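To make the two mapping functions concrete, here is a minimal sketch. The 16-byte granularity, the 512-byte cut-off, and the one-field `FreeList` are assumptions chosen for illustration; none of them is specified in this proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constants, not taken from the proposal. */
#define SIZE_CLASS_SHIFT 4          /* assumed 16-byte granularity */
#define MAX_FREELIST_SIZE 512       /* assumed cut-off; larger requests skip the free lists */
#define NUM_SIZE_CLASSES (MAX_FREELIST_SIZE >> SIZE_CLASS_SHIFT)

/* Simplified stand-in for the real free list structure. */
typedef struct FreeList {
    void *head;
} FreeList;

static FreeList free_lists[NUM_SIZE_CLASSES];

/* Pure: sizes 1..16 map to class 0, 17..32 to class 1, and so on. */
static inline size_t size_to_size_class(size_t s)
{
    assert(s > 0 && s <= MAX_FREELIST_SIZE);
    return (s - 1) >> SIZE_CLASS_SHIFT;
}

/* Pure: a plain array index, so a constant size yields a constant address. */
static inline FreeList *size_class_to_free_list(size_t size_class)
{
    assert(size_class < NUM_SIZE_CLASSES);
    return &free_lists[size_class];
}
```

Because both functions reduce to a subtraction, a shift, and an array index, a compiler can typically fold the whole lookup to a constant when `s` is known at compile time.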

Implementation of the freelist.

One possible design of the free lists is:

Each free list consists of three elements.

  • A pointer to the head of the list (or NULL if empty)
  • An integer representing the maximum capacity. This should be determined empirically, but is likely to be larger for smaller sizes.
  • An integer representing the remaining space.

For alignment reasons, each integer should be half the size of a pointer. On a 32-bit machine this limits the capacity to 64k entries, which is plenty.

The actual list is implemented as a linked list through the blocks of memory themselves, treating the first word of the memory as a pointer to the next entry.

  • tracemalloc support: By setting the head to NULL and the remaining space to zero, the list appears both empty and full, so no allocations will be served from the free list and no blocks will be added to it. Should tracemalloc be turned off, the remaining space can be set back to the capacity.
  • Functions for determining if the list is full or empty are fast: freelist->remaining == 0 and freelist->head == NULL, respectively.
  • The mapping from size class to free list can be made fast and simple, by placing the freelists in an array.
  • Pushing is simple enough, and with few dependent loads: new_entry->next = freelist->head; freelist->head = new_entry; freelist->remaining--;
  • Likewise popping: result = freelist->head; freelist->head = result->next; freelist->remaining++;

By putting the free lists in an array, the mapping from size class to free list is very simple.
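The structure and operations described above could be sketched as follows. The type and function names are hypothetical. The sketch also assumes that blocks held by a list came from `malloc` and are released with `free` before the list is disabled, a step the description above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Half-pointer-sized counter: 16 bits on 32-bit targets, 32 bits on 64-bit targets. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef uint32_t half_word_t;
#else
typedef uint16_t half_word_t;
#endif

/* The first word of each free block is reused as the link to the next block. */
typedef struct FreeListEntry {
    struct FreeListEntry *next;
} FreeListEntry;

typedef struct FreeList {
    FreeListEntry *head;    /* NULL when the list is empty */
    half_word_t capacity;   /* maximum number of blocks held */
    half_word_t remaining;  /* free slots left; 0 when the list is full */
} FreeList;

/* Two half-word counters pack into one word, so the struct is two words. */
_Static_assert(sizeof(FreeList) == 2 * sizeof(void *), "FreeList should be two words");

static inline int free_list_is_empty(const FreeList *fl) { return fl->head == NULL; }
static inline int free_list_is_full(const FreeList *fl) { return fl->remaining == 0; }

static inline void free_list_push(FreeList *fl, void *block)
{
    assert(!free_list_is_full(fl));
    FreeListEntry *entry = (FreeListEntry *)block;
    entry->next = fl->head;
    fl->head = entry;
    fl->remaining--;
}

static inline void *free_list_pop(FreeList *fl)
{
    assert(!free_list_is_empty(fl));
    FreeListEntry *result = fl->head;
    fl->head = result->next;
    fl->remaining++;
    return result;
}

/* Assumed tracemalloc hook: release held blocks, then look both empty and full. */
static inline void free_list_disable(FreeList *fl)
{
    while (fl->head != NULL) {
        FreeListEntry *entry = fl->head;
        fl->head = entry->next;
        free(entry);
    }
    fl->remaining = 0;
}

/* Re-enable the list by restoring the full capacity. */
static inline void free_list_enable(FreeList *fl)
{
    fl->remaining = fl->capacity;
}
```

While disabled, the list fails both the "not empty" and the "not full" checks, so callers fall through to their slow paths without needing a separate tracemalloc flag.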

Behavior when free list is full or empty; interaction with the malloc implementation.

The simplest option is to call malloc when the list is empty and to call free when the list is full.
Unfortunately this leads to fragmentation and performs poorly when repeatedly allocating or deallocating.

To avoid fragmentation, we should clear the free list when it is full and we need to free another object.
To avoid thrashing on alternating mallocs and frees, we should half-fill the list when it is empty and we need to allocate an object.
Doing bulk free/mallocs reduces the number of free list misses considerably and keeps fragmentation down.

Ideally the malloc implementation would provide functions for freeing and allocating multiple objects of the same size, but most implementations do not. We could add that feature to PyMalloc.
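The half-fill and clear policy could look roughly like the sketch below. It loops over plain `malloc` and `free` as a stand-in for the bulk functions that do not yet exist. The `block_size` field and the names `allocate_block` and `free_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Entry { struct Entry *next; } Entry;

typedef struct FreeList {
    Entry *head;
    unsigned capacity;
    unsigned remaining;
    size_t block_size;   /* hypothetical field: bytes per block in this size class */
} FreeList;

static inline void push(FreeList *fl, void *block)
{
    Entry *e = (Entry *)block;
    e->next = fl->head;
    fl->head = e;
    fl->remaining--;
}

static inline void *pop(FreeList *fl)
{
    Entry *e = fl->head;
    fl->head = e->next;
    fl->remaining++;
    return e;
}

static inline void *allocate_block(FreeList *fl)
{
    assert(fl->block_size >= sizeof(Entry));
    if (fl->head != NULL) {
        return pop(fl);                        /* fast path */
    }
    /* Slow path: half-fill the empty list in one batch. A disabled list
       (remaining == 0) skips the loop and goes straight to malloc. */
    while (fl->remaining > fl->capacity / 2) {
        void *block = malloc(fl->block_size);
        if (block == NULL) {
            break;
        }
        push(fl, block);
    }
    return malloc(fl->block_size);
}

static inline void free_block(FreeList *fl, void *block)
{
    if (fl->remaining != 0) {
        push(fl, block);                       /* fast path */
        return;
    }
    /* Slow path: the list is full, so clear it in one batch.
       For a disabled list the loop does nothing. */
    while (fl->head != NULL) {
        free(pop(fl));
    }
    free(block);
}
```

With this shape, a workload that alternates one allocation and one free at the empty or full boundary takes the slow path once and then stays on the fast path, instead of calling malloc or free every time.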
