Skip to content
This repository was archived by the owner on Oct 30, 2021. It is now read-only.
This repository was archived by the owner on Oct 30, 2021. It is now read-only.

Speed up sorting / reduce overhead of sorting #120

@springmeyer

Description

@springmeyer

About a year ago @KaiBot3000, @aarthykc, and me put carmen under load, profiled with perf, and noticed a single, significant bottleneck in carmen-cache.

The bottleneck was the sorting of Context objects, specifically contextSortByRelev. This makes sense given Context objects are large (copying them is expensive), there may be a lot of them collected in memory, and the sort function is not simple.

After #93 we should see a slight speedup / reduction in the cost of this sorting because Context objects are now moved rather than copied. However I would not be surprised if the top bottleneck in carmen is still this sorting in carmen-cache. So this ticket stands to:

  • remind us that this is worthwhile of more investigation
  • document what we saw in case optimization work around sorting is picked up again
  • reference Ensure Context is noncopyable #93 which includes some ideas for optimizations not yet attempted, including using std::partial_sort

Details:

  • The sort function:
    inline bool contextSortByRelev(Context const& a, Context const& b) noexcept {
    if (b.relev > a.relev)
    return false;
    else if (b.relev < a.relev)
    return true;
    else if (b.coverList[0].scoredist > a.coverList[0].scoredist)
    return false;
    else if (b.coverList[0].scoredist < a.coverList[0].scoredist)
    return true;
    else if (b.coverList[0].idx < a.coverList[0].idx)
    return false;
    else if (b.coverList[0].idx > a.coverList[0].idx)
    return true;
    return (b.coverList[0].id > a.coverList[0].id);
    }
  • The place the sort takes place: (
    std::sort(contexts.begin(), contexts.end(), contextSortByRelev);
    ) was identified as a bottleneck during profiling of carmen with perf. This is no surprise, sorting is expensive
  • The perf output that previously highlighted sorting as the primary bottleneck:

0707a524-f469-11e6-9296-302ac929396b

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions