
Rewrite Allocator #2516

@dperny


The Allocator code is bad. I'd hazard to say it's irredeemably bad. It's a source of constant bugs and breakages. Things are allocated twice or not allocated at all. The issues (usually) don't seem to originate in the layer below swarmkit (libnetwork). Instead, libnetwork tends to give garbage responses when given garbage input.

Some examples of problems:

  • A bunch of code for handling multiple allocators, even though only one allocator exists and no other allocators are anywhere on the roadmap.
  • Responsibilities are muddled. All of the network allocation methods are implemented directly on the Allocator object, even though the presence of an allocActor type in the Run function suggests they should be separate.
  • Code is under-commented and tangled, which often makes it inscrutable to read. The tangling also makes the code fragile, so piece-wise refactoring is almost impossible.
  • The network allocator keeps a local state from which it makes decisions, which is supposed to be a mirror of the state as committed into raft. However, logic errors can cause this internal state to become irrecoverably inconsistent. If we're lucky, a leadership change fixes this before it becomes a problem. If we're not, bad local state is used to make bad decisions and create inconsistent distributed state. We need to reduce the footprint of the local state as much as feasible.
  • The allocator requires the local state to be initialized from the distributed state before performing any new allocations. However, allocation and initialization follow the same code paths, with only a boolean flag to separate them. Errors in the initialization logic cause allocations and deallocations to occur before the local state is fully initialized, resulting in duplicate IP addresses.

We should rewrite the whole thing from scratch. It's not a small project, and there's a lot of risk in a rewrite versus a refactoring. However, a clean slate would let us escape the most ingrained design flaws.
