Skip to content

Use an IdentitySet when testing for elements added/removed from large persisted collections#8

Closed
daniels220 wants to merge 1 commit into
rko281:mainfrom
shoshanatech:large-collection-comparison
Closed

Use an IdentitySet when testing for elements added/removed from large persisted collections#8
daniels220 wants to merge 1 commit into
rko281:mainfrom
shoshanatech:large-collection-comparison

Conversation

@daniels220
Copy link
Copy Markdown
Contributor

At one point I noticed that dirty-checking large collections was taking a substantial amount of time during commit due to O(N^2) behavior, especially when the collection is unmodified. For large collections we can use an IdentitySet to speed this up. For small ones linear search is actually faster—I chose 10 as an arbitrary but empirically-reasonable cutover point.

@rko281
Copy link
Copy Markdown
Owner

rko281 commented Feb 6, 2026

Thanks for the observation however I'm not seeing an improvement using the following test based on an unchanged collection:

size := 10.
iterations := 10000.
objects := (1 to: size) collect: [ :each | Object new].

(Time millisecondsToRun: [iterations timesRepeat: [objects reject: [ :each | objects identityIncludes: each ]]])
-> (Time millisecondsToRun: [iterations timesRepeat: [| set | set := objects asIdentitySet. objects reject: [ :each | set identityIncludes: each ]]]).

For a collection size of 10 I get runtimes of ~15ms for the linear search versus ~40ms for the identity set. 50 gives ~70ms/170ms and 100 gives ~160ms/340ms. Larger sizes also show the linear search to be faster.

Note this is in Dolphin. In Pharo the identity set approach becomes quicker for me at a collection size of around 40 and shows continued improvements at larger sizes.

Let me know if I'm missing something in this test or if your observations are different. It's probably worth mentioning that I'm running Dolphin on Windows ARM in VMWare so possibly the conversion from x86 is causing a difference.

@daniels220
Copy link
Copy Markdown
Contributor Author

What you're missing, which I didn't even realize until you accidentally pointed it out, is that large persisted collections won't generally be Arrays. Dolphin has an extremely fast Array>>identityIncludes: primitive, which is skewing the results. Make it (1 to: size) asOrderedCollection collect: [ :each | Object new] and you should see a crossover much sooner—for me it was around size 17 in this particular benchmark, so it could be that 15 or 20 would be a better choice of crossover. My Pharo results are noisier, possibly due to presence or absence of a scavenge, but consistent with a crossover around the same point (15-20)—that might well be accounted for by the architecture difference (I assume your Pharo is running natively on the host OS but still ARM).

rko281 added a commit to rko281/ReStoreForPharo that referenced this pull request Feb 9, 2026
@rko281
Copy link
Copy Markdown
Owner

rko281 commented Feb 9, 2026

Good spot and an interesting observation.

The primitive used by Array>>#identityIncludes: is invoked in Object>>#basicIdentityIndexOf:from:to: and looks like it should also work with OrderedCollection. In fact checking Dolphin 8 the same optimisation as Array has been added to OrderedCollection. As such I've taken the approach of backporting the D8 optimisation to D7 as part of the Dolphin-specific package and applying yout commit (with the threshold increased to 20) to the Pharo-specific package.

As a further observation, rather counter-intuitively its usually quicker in Dolphin to convert a large collection to an Array then use identityIncludes: than to convert to an IdentitySet and use includes:. This is rather moot given OrderedCollections can now be as quick as Arrays but it's certainly an assumption I've previously based optimisations on.

@rko281 rko281 closed this Feb 9, 2026
rko281 added a commit to rko281/ReStoreForPharo that referenced this pull request Feb 9, 2026
Optimise collection change detection. See rko281/ReStore#8.
@daniels220 daniels220 deleted the large-collection-comparison branch February 9, 2026 19:52
@daniels220
Copy link
Copy Markdown
Contributor Author

Good thinking—I didn't remember that the primitive included the necessary start/end arguments to work for OrderedCollection.

Of note, in my testing, the IdentitySet does still become faster than the Dolphin primitive at around size 1000, but this is much less relevant in practice, and quite a dramatic difference from without the primitive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants