Use an IdentitySet when testing for elements added/removed from large persisted collections by daniels220 · Pull Request #8 · rko281/ReStore

daniels220 · 2026-02-05T21:04:35Z

At one point I noticed that dirty-checking large collections was taking a substantial amount of time during commit due to O(N^2) behavior, especially when the collection is unmodified. For large collections we can use an IdentitySet to speed this up. For small ones linear search is actually faster—I chose 10 as an arbitrary but empirically-reasonable cutover point.

… persisted collections.

rko281 · 2026-02-06T11:21:15Z

Thanks for the observation however I'm not seeing an improvement using the following test based on an unchanged collection:

size := 10.
iterations := 10000.
objects := (1 to: size) collect: [ :each | Object new].

(Time millisecondsToRun: [iterations timesRepeat: [objects reject: [ :each | objects identityIncludes: each ]]])
-> (Time millisecondsToRun: [iterations timesRepeat: [| set | set := objects asIdentitySet. objects reject: [ :each | set identityIncludes: each ]]]).

For a collection size of 10 I get runtimes of ~15ms for the linear search versus ~40ms for the identity set. 50 gives ~70ms/170ms and 100 gives ~160ms/340ms. Larger sizes also show the linear search to be faster.

Note this is in Dolphin. In Pharo the identity set approach becomes quicker for me at a collection size of around 40 and shows continued improvements at larger sizes.

Let me know if I'm missing something in this test or if your observations are different. It's probably worth mentioning that I'm running Dolphin on Windows ARM in VMWare so possibly the conversion from x86 is causing a difference.

daniels220 · 2026-02-06T17:42:39Z

What you're missing, which I didn't even realize until you accidentally pointed it out, is that large persisted collections won't generally be Arrays. Dolphin has an extremely fast Array>>identityIncludes: primitive, which is skewing the results. Make it (1 to: size) asOrderedCollection collect: [ :each | Object new] and you should see a crossover much sooner—for me it was around size 17 in this particular benchmark, so it could be that 15 or 20 would be a better choice of crossover. My Pharo results are noisier, possibly due to presence or absence of a scavenge, but consistent with a crossover around the same point (15-20)—that might well be accounted for by the architecture difference (I assume your Pharo is running natively on the host OS but still ARM).

rko281 · 2026-02-09T10:08:12Z

Good spot and an interesting observation.

The primitive used by Array>>#identityIncludes: is invoked in Object>>#basicIdentityIndexOf:from:to: and looks like it should also work with OrderedCollection. In fact checking Dolphin 8 the same optimisation as Array has been added to OrderedCollection. As such I've taken the approach of backporting the D8 optimisation to D7 as part of the Dolphin-specific package and applying yout commit (with the threshold increased to 20) to the Pharo-specific package.

As a further observation, rather counter-intuitively its usually quicker in Dolphin to convert a large collection to an Array then use identityIncludes: than to convert to an IdentitySet and use includes:. This is rather moot given OrderedCollections can now be as quick as Arrays but it's certainly an assumption I've previously based optimisations on.

Optimise collection change detection. See rko281/ReStore#8.

daniels220 · 2026-02-09T21:32:49Z

Good thinking—I didn't remember that the primitive included the necessary start/end arguments to work for OrderedCollection.

Of note, in my testing, the IdentitySet does still become faster than the Dolphin primitive at around size 1000, but this is much less relevant in practice, and quite a dramatic difference from without the primitive!

Use an IdentitySet when testing for elements added/removed from large…

db373ea

… persisted collections.

rko281 added a commit to rko281/ReStoreForPharo that referenced this pull request Feb 9, 2026

Optimise collection change detection. See rko281/ReStore#8.

2481b36

rko281 added a commit that referenced this pull request Feb 9, 2026

Optimise collection change detection. Contributes to #8.

0a0d779

rko281 closed this Feb 9, 2026

rko281 added a commit to rko281/ReStoreForPharo that referenced this pull request Feb 9, 2026

Merge pull request #33 from rko281/main

46bd7e2

Optimise collection change detection. See rko281/ReStore#8.

daniels220 deleted the large-collection-comparison branch February 9, 2026 19:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use an IdentitySet when testing for elements added/removed from large persisted collections#8

Use an IdentitySet when testing for elements added/removed from large persisted collections#8
daniels220 wants to merge 1 commit into
rko281:mainfrom
shoshanatech:large-collection-comparison

daniels220 commented Feb 5, 2026

Uh oh!

rko281 commented Feb 6, 2026

Uh oh!

daniels220 commented Feb 6, 2026

Uh oh!

rko281 commented Feb 9, 2026

Uh oh!

daniels220 commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

daniels220 commented Feb 5, 2026

Uh oh!

rko281 commented Feb 6, 2026

Uh oh!

daniels220 commented Feb 6, 2026

Uh oh!

rko281 commented Feb 9, 2026

Uh oh!

daniels220 commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants