GeneralPurposeAllocator.searchBucket: check current bucket before searching the list #17389

Merged
andrewrk merged 1 commit into ziglang:master from squeek502:gpa-search-cur-bucket
Oct 4, 2023

squeek502 (Member) commented Oct 4, 2023

Follow-up to #17383. This is a minor optimization that only matters when a small allocation is resized or freed soon after it is allocated, i.e. while it is still in the current bucket.
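To illustrate the idea, here is a minimal toy model of the lookup, not the actual Zig implementation. The `Bucket` class, the `search_bucket` function, the `check_current_first` flag, and the plain Python list standing in for the bucket list are all hypothetical names chosen for this sketch. It only counts how many buckets get examined before the owning bucket is found.

```python
class Bucket:
    """Hypothetical stand-in for a bucket: one contiguous range of slots."""

    def __init__(self, base, size):
        self.base = base  # address of the first byte in the bucket
        self.size = size  # number of bytes covered by the bucket

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


def search_bucket(buckets, current, addr, check_current_first=True):
    """Find the bucket that owns `addr`.

    `buckets` holds every bucket of one size class, oldest first, and
    `current` is the bucket new allocations are served from (assumed here
    to be the last element of `buckets`).
    Returns (bucket_or_None, number_of_buckets_examined).
    """
    examined = 0
    if check_current_first and current is not None:
        # Fast path: a recently made allocation most likely lives here.
        examined += 1
        if current.contains(addr):
            return current, examined
    for bucket in buckets:
        if check_current_first and bucket is current:
            continue  # already checked on the fast path
        examined += 1
        if bucket.contains(addr):
            return bucket, examined
    return None, examined


# Eight 4 KiB buckets; the newest one is the current bucket.
buckets = [Bucket(i * 4096, 4096) for i in range(8)]
current = buckets[-1]
fresh_addr = current.base + 16  # an allocation that was just made

_, slow_steps = search_bucket(buckets, current, fresh_addr, check_current_first=False)
_, fast_steps = search_bucket(buckets, current, fresh_addr, check_current_first=True)
print(slow_steps, fast_steps)  # prints "8 1"
```

In this model the fast path costs one extra check when the address belongs to an older bucket, which is consistent with the change only paying off for allocations that are resized or freed shortly after being made.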

The only real difference I was able to observe came from a synthetic benchmark that allocates a full bucket and then frees all but one of the slots, over and over in a loop (see "Benchmark code" in #17383):

Debug build:

Benchmark 1 (9 runs): gpa-degen-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           575ms ± 5.19ms     569ms …  583ms          0 ( 0%)        0%
  peak_rss           43.8MB ± 1.37KB    43.8MB … 43.8MB          1 (11%)        0%
Benchmark 2 (10 runs): gpa-degen-search-cur.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           532ms ± 5.55ms     520ms …  539ms          0 ( 0%)        ⚡-  7.5% ±  0.9%
  peak_rss           43.8MB ± 65.2KB    43.8MB … 44.0MB          1 (10%)          +  0.0% ±  0.1%

ReleaseFast build:

Benchmark 1 (129 runs): gpa-degen-master-release.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          38.9ms ± 1.12ms    36.7ms … 42.4ms          8 ( 6%)        0%
  peak_rss           23.2MB ± 2.39KB    23.2MB … 23.2MB          0 ( 0%)        0%
Benchmark 2 (151 runs): gpa-degen-search-cur-release.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.2ms ±  999us    31.9ms … 36.3ms         20 (13%)        ⚡- 14.7% ±  0.6%
  peak_rss           23.2MB ± 2.26KB    23.2MB … 23.2MB          0 ( 0%)          +  0.0% ±  0.0%
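As a quick sanity check on the `delta` column for `wall_time`, the short sketch below recomputes the relative change from the reported means. It assumes the delta is the change in mean relative to Benchmark 1; the function name `relative_delta_percent` is made up for this example, and the ± part of the delta is not reproduced because it cannot be derived from the means alone.

```python
def relative_delta_percent(baseline_mean, new_mean):
    """Relative change of new_mean versus baseline_mean, in percent."""
    return (new_mean - baseline_mean) / baseline_mean * 100.0


# Mean wall_time values copied from the tables above.
debug_delta = relative_delta_percent(575.0, 532.0)   # Debug build, in ms
release_delta = relative_delta_percent(38.9, 33.2)   # ReleaseFast build, in ms

print(f"Debug: {debug_delta:.1f}%")          # prints "Debug: -7.5%"
print(f"ReleaseFast: {release_delta:.1f}%")  # prints "ReleaseFast: -14.7%"
```

Both values match the deltas shown in the tables.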

No difference in Debug mode standard library tests:

Benchmark 1 (10 runs): std-tests-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.1s  ±  224ms    16.0s  … 16.7s           1 (10%)        0%
  peak_rss           42.3MB ± 4.65KB    42.3MB … 42.3MB          1 (10%)        0%
Benchmark 2 (10 runs): std-tests-search-cur.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.0s  ± 43.8ms    16.0s  … 16.1s           0 ( 0%)          -  0.4% ±  0.9%
  peak_rss           42.3MB ±  181KB    42.3MB … 42.8MB          1 (10%)          +  0.1% ±  0.3%

No noticeable difference in the degraded arocc case from #17383 (this is on Windows with stack trace collection turned off):

Benchmark 1: testaro-suffix.bat master
  Time (mean ± σ):      6.658 s ±  0.054 s    [User: 5.243 s, System: 1.407 s]
  Range (min … max):    6.606 s …  6.746 s    5 runs

Benchmark 2: testaro.bat
  Time (mean ± σ):      6.609 s ±  0.055 s    [User: 5.124 s, System: 1.451 s]
  Range (min … max):    6.540 s …  6.684 s    5 runs

Summary
  'testaro.bat' ran
    1.01 ± 0.01 times faster than 'testaro-suffix.bat master'

andrewrk enabled auto-merge (rebase) October 4, 2023 02:59
andrewrk merged commit ec0f76c into ziglang:master Oct 4, 2023