Commit d3c21bb

Address review comments and clarify code a bit
1 parent 0f182e2 commit d3c21bb

2 files changed

+16
-24
lines changed

InternalDocs/garbage_collector.md

Lines changed: 5 additions & 5 deletions
@@ -523,15 +523,15 @@ but doing too much work slows down execution.
 
 To work out how much work we need to do, consider a heap with `L` live objects
 and `G0` garbage objects at the start of a full scavenge and `G1` garbage objects
-at the end of the scavenge. We don't want amount of garbage to grow, `G1 ≤ G0`, and
+at the end of the scavenge. We don't want the amount of garbage to grow, `G1 ≤ G0`, and
 we don't want too much garbage (say 1/3 of the heap maximum), `G0 ≤ L/2`.
 For each full scavenge we must visit all objects, `T == L + G0 + G1`, during which
 `G1` garbage objects are created.
 
 The number of new objects created `N` must be at least the new garbage created, `N ≥ G1`,
 assuming that the number of live objects remains roughly constant.
 If we set `T == 4*N` we get `T > 4*G1` and `T = L + G0 + G1` => `L + G0 > 3G1`
-For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
+For a steady state heap (`G0 == G1`) we get `L > 2G0` and the desired garbage ratio.
 
 In other words, to keep the garbage fraction to 1/3 or less we need to visit
 4 times as many objects as are newly created.
@@ -544,11 +544,11 @@ Everything in `M` is live, so `I ≥ G0` and in practice `I` is closer to `G0 +
 
 If we choose the amount of work done such that `2*M + I == 6N` then we can do
 less work in most cases, but are still guaranteed to keep up.
-Since `I G0 + G1` (not strictly true, but close enough)
-`T == M + I == (6N + I)/2` and `(6N + I)/2 4G`, so we can keep up.
+Since `I ≳ G0 + G1` (not strictly true, but close enough)
+`T == M + I == (6N + I)/2` and `(6N + I)/2 ≳ 4G`, so we can keep up.
 
 The reason that this improves performance is that `M` is usually much larger
-than `I` Suppose `M == 10I`, then `T ≅ 3N`.
+than `I`. If `M == 10I`, then `T ≅ 3N`.
 
 Finally, instead of using a fixed multiple of 8, we gradually increase it as the
 heap grows. This avoids wasting work for small heaps and during startup.
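
The arithmetic in the documentation change above can be checked with a small sketch. This is illustrative only and is not code from the commit: the helper names `garbage_fraction` and `total_work`, and the `m_over_i` parameter (the ratio `M/I`), are assumptions introduced here. The second helper solves `2*M + I == 6N` for `I` and returns `T == M + I`.

```c
/* Fraction of the heap that is garbage, given `live` live objects
 * and `garbage` garbage objects. */
static double
garbage_fraction(double live, double garbage)
{
    return garbage / (live + garbage);
}

/* Total objects visited, T == M + I, when the work budget is chosen so
 * that 2*M + I == 6*N. `m_over_i` is the ratio M/I; it is a hypothetical
 * parameter used only for this illustration. */
static double
total_work(double n, double m_over_i)
{
    /* Substituting M == m_over_i * I into 2*M + I == 6*N gives I. */
    double i = 6.0 * n / (2.0 * m_over_i + 1.0);
    double m = m_over_i * i;
    return m + i;
}
```

With `M == 10I`, `total_work(N, 10.0)` evaluates to `22N/7`, roughly `3.14N`, in line with the `T ≅ 3N` estimate. With nothing marked (`M == 0`) it is `6N`. At the steady-state boundary `L == 2G`, `garbage_fraction(2.0, 1.0)` is 1/3.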

Python/gc.c

Lines changed: 11 additions & 19 deletions
@@ -1544,40 +1544,32 @@ mark_at_start(PyThreadState *tstate)
     return objects_marked;
 }
 
+
+/* See InternalDocs/garbage_collector.md for more details. */
+#define MAX_HEAP_PORTION_MULTIPLIER 5
+#define MARKING_PROGRESS_MULTIPLIER 2
+
 static intptr_t
 assess_work_to_do(GCState *gcstate)
 {
     /* The amount of work we want to do depends on two things.
      * 1. The number of new objects created
-     * 2. The heap size (up to twice the number of new objects, to avoid quadratic effects)
-     * 3. The amount of garbage.
-     *
-     * We cannot know how much of the heap is garbage, but we know that no reachable object
-     * is garbage. We make a (fairly pessismistic) assumption that half the heap not
-     * reachable from the roots is garbage, and count collections of increments as half as efficient
-     * as processing the heap as the marking phase.
-     *
-     * For a large, steady state heap, the amount of work to do is at least three times the
-     * number of new objects added to the heap. This ensures that we stay ahead in the
-     * worst case of all new objects being garbage.
+     * 2. The heap size (up to a multiple of the number of new objects, to avoid quadratic effects)
      */
     intptr_t scale_factor = gcstate->old[0].threshold;
     if (scale_factor < 2) {
         scale_factor = 2;
     }
     intptr_t new_objects = gcstate->young.count;
-    intptr_t max_heap_fraction = new_objects*5;
-    intptr_t heap_fraction = gcstate->heap_size / SCAN_RATE_DIVISOR / scale_factor;
-    if (heap_fraction > max_heap_fraction) {
-        heap_fraction = max_heap_fraction;
+    intptr_t max_heap_portion = new_objects * MAX_HEAP_PORTION_MULTIPLIER;
+    intptr_t heap_portion = gcstate->heap_size / SCAN_RATE_DIVISOR / scale_factor;
+    if (heap_portion > max_heap_portion) {
+        heap_portion = max_heap_portion;
     }
     gcstate->young.count = 0;
-    return new_objects + heap_fraction;
+    return new_objects + heap_portion;
 }
 
-/* See Internal GC docs for explanation */
-#define MARKING_PROGRESS_MULTIPLIER 2
-
 static void
 gc_collect_increment(PyThreadState *tstate, struct gc_collection_stats *stats)
 {
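
To see how the cap in the revised `assess_work_to_do` behaves, here is a standalone sketch of the same computation. It is not the CPython code. The struct `gc_state_sketch` is a hypothetical stand-in for the `GCState` fields used above. The value given to `SKETCH_SCAN_RATE_DIVISOR` is a placeholder assumed for illustration, since the diff does not show how `SCAN_RATE_DIVISOR` is defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GCState fields read by the function. */
typedef struct {
    intptr_t young_count;     /* stands in for gcstate->young.count */
    intptr_t heap_size;       /* stands in for gcstate->heap_size */
    intptr_t old0_threshold;  /* stands in for gcstate->old[0].threshold */
} gc_state_sketch;

#define MAX_HEAP_PORTION_MULTIPLIER 5
/* Placeholder value, assumed for this sketch only. */
#define SKETCH_SCAN_RATE_DIVISOR 5

static intptr_t
assess_work_sketch(gc_state_sketch *st)
{
    intptr_t scale_factor = st->old0_threshold;
    if (scale_factor < 2) {
        scale_factor = 2;
    }
    intptr_t new_objects = st->young_count;
    /* Cap the heap-dependent part at a multiple of the new objects,
     * so the work per increment does not grow with heap size alone. */
    intptr_t max_heap_portion = new_objects * MAX_HEAP_PORTION_MULTIPLIER;
    intptr_t heap_portion =
        st->heap_size / SKETCH_SCAN_RATE_DIVISOR / scale_factor;
    if (heap_portion > max_heap_portion) {
        heap_portion = max_heap_portion;
    }
    st->young_count = 0;  /* the count is consumed by this assessment */
    return new_objects + heap_portion;
}
```

Under these assumed values, 100 new objects on a heap of 1,000,000 with a threshold of 10 hit the cap: the uncapped portion would be 20,000, but the result is `100 + 500 == 600`. On a heap of 1,000 the portion is only 20, so the result is 120.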
