Use an IdentitySet when testing for elements added/removed from large persisted collections #8
Conversation
Thanks for the observation, however I'm not seeing an improvement using the following test based on an unchanged collection:

```smalltalk
size := 10.
iterations := 10000.
objects := (1 to: size) collect: [ :each | Object new].
(Time millisecondsToRun: [iterations timesRepeat: [objects reject: [ :each | objects identityIncludes: each ]]])
	-> (Time millisecondsToRun: [iterations timesRepeat: [| set | set := objects asIdentitySet. objects reject: [ :each | set identityIncludes: each ]]]).
```

For a collection size of 10 I get runtimes of ~15ms for the linear search versus ~40ms for the identity set; 50 gives ~70ms/170ms and 100 gives ~160ms/340ms. Larger sizes also show the linear search to be faster. Note this is in Dolphin. In Pharo the identity set approach becomes quicker for me at a collection size of around 40 and shows continued improvements at larger sizes. Let me know if I'm missing something in this test or if your observations are different. It's probably worth mentioning that I'm running Dolphin on Windows ARM in VMware, so possibly the conversion from x86 is causing a difference.
What you're missing, which I didn't even realize until you accidentally pointed it out, is that large persisted collections won't generally be …
Good spot and an interesting observation. The primitive used by … As a further observation, rather counter-intuitively it's usually quicker in Dolphin to convert a large collection to an Array then use …
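Assuming the truncated remark above refers to Array's primitive-backed `identityIncludes:`, a minimal sketch of the conversion approach (illustrative only, not ReStore's actual code) might look like:

```smalltalk
"Hypothetical sketch: for a large OrderedCollection, copying the elements to an
Array first means each membership test can use the Array's fast primitive
implementation of identityIncludes:, rather than a generic enumeration."
| collection array removed |
collection := OrderedCollection withAll: ((1 to: 10000) collect: [ :each | Object new]).
array := collection asArray.
removed := collection reject: [ :each | array identityIncludes: each].
```

The one-off cost of `asArray` is repaid because the primitive scan per element is much cheaper than a block-based enumeration over the OrderedCollection.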
Optimise collection change detection. See rko281/ReStore#8.
Good thinking; I didn't remember that the primitive included the necessary start/end arguments to work for OrderedCollection. Of note, in my testing the IdentitySet does still become faster than the Dolphin primitive at around size 1000, but this is much less relevant in practice, and quite a dramatic difference from the behaviour without the primitive!
At one point I noticed that dirty-checking large collections was taking a substantial amount of time during commit due to O(N^2) behaviour, especially when the collection is unmodified. For large collections we can use an IdentitySet to speed this up; for small ones a linear search is actually faster, so I chose 10 as an arbitrary but empirically reasonable cutover point.
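The size-based cutover described above can be sketched as follows (method and selector names are hypothetical, not ReStore's actual implementation):

```smalltalk
elementsRemovedFrom: previousContents
	"Answer the elements of previousContents no longer in the receiver.
	Below the cutover a linear identityIncludes: scan is cheaper;
	above it, building an IdentitySet gives near-constant-time
	identity membership tests, avoiding O(N^2) behaviour"
	| lookup |
	lookup := self size <= 10
		ifTrue: [self]
		ifFalse: [self asIdentitySet].
	^previousContents reject: [ :each | lookup identityIncludes: each]
```

Because an unchanged collection is the common case at commit time, every element of `previousContents` is still present, so the lookup structure is probed N times; this is exactly where the IdentitySet pays off for large N.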