Use a weak equality map instead of _id2ref by headius · Pull Request #7 · ruby/drb

headius · 2021-11-18T12:08:52Z

This PR moves away from using _id2ref for object references, instead using a weak-valued Hash with appropriate locking and cleaning logic.

From the primary commit:

The ObjectSpace._id2ref method is considered an internal API of
CRuby and is difficult to implement efficiently on implementations
that do not have direct control over the garbage-collected heap.
On JRuby, for example, arbitrary _id2ref object lookup is normally
disabled, as the implementation requires maintaining a parallel
mapping of object IDs to weak object references, adding a very
large amount of overhead to all object ID uses. As a result, Ruby
standard libraries should try to avoid using _id2ref.

This patch introduces a weak map into DRb for managing object
references without _id2ref. The map is weak-valued, with a clean
subroutine to scrub out defunct entries. Puts initiate a clean,
hopefully ensuring that very few stale entries accumulate.

This patch does not use WeakMap do to limitations of its API.
WeakMap specifies that keys are identity-based, and as of Ruby 3.0
allows immediate values like Symbols and Integers. In order to
support this behavior, those values must be idempotent, which on
JRuby is impossible to support for fixnum-ranged Integers and
flonum-ranged Floats, rendering it useless as an object ID-keyed
cache.

This implementation currently hangs partway through the tests, and I am unsure why. I wanted to get this implementation out there to get help refining it and getting tests passing, since _id2ref is very problematic on JRuby and JRuby would like to work with the released versions of DRb when we release our Ruby 3.0-compatible release later this year.

headius · 2021-11-18T12:11:06Z

cc @hsbt @nobu for help... the logic here has been working for many years on JRuby, albeit without the locking and interrupt handling. I am not sure yet why the specs hang partway through (apparently in test_02_basic_object.

cc @eregon because I assume _id2ref is undesirable on TruffleRuby as well.

headius · 2021-11-18T12:16:18Z

@jeremyevans Review request was suggested by GH... feel free to ignore if you have no interest in this library.

headius · 2021-11-18T13:21:14Z

Those runs were clearly not completing so I will try to figure out why it hangs with my version.

The ObjectSpace._id2ref method is considered an internal API of CRuby and is difficult to implement efficiently on implementations that do not have direct control over the garbage-collected heap. On JRuby, for example, arbitrary _id2ref object lookup is normally disabled, as the implementation requires maintaining a parallel mapping of object IDs to weak object references, adding a very large amount of overhead to all object ID uses. As a result, Ruby standard libraries should try to avoid using _id2ref. This patch introduces a weak map into DRb for managing object references without _id2ref. The map is weak-valued, with a clean subroutine to scrub out defunct entries. Puts initiate a clean, hopefully ensuring that very few stale entries accumulate. This patch does not use WeakMap do to limitations of its API. WeakMap specifies that keys are identity-based, and as of Ruby 3.0 allows immediate values like Symbols and Integers. In order to support this behavior, those values must be idempotent, which on JRuby is impossible to support for fixnum-ranged Integers and flonum-ranged Floats, rendering it useless as an object ID-keyed cache.

headius · 2021-11-18T13:46:00Z

I see now there is a WeakIdConv that implements something similar using WeakMap. That may be acceptable if users realize that fixnum/flonum keys on JRuby have no guarantee of idempotency.

So I thought I would try to run the tests using that as the default idconv but I don't think I'm hooking it up right. Is more than this needed?

diff --git a/test/drb/test_drbobject.rb b/test/drb/test_drbobject.rb
index 2b0e206..57d9783 100644
--- a/test/drb/test_drbobject.rb
+++ b/test/drb/test_drbobject.rb
@@ -36,7 +36,7 @@ class TestDRbObject < Test::Unit::TestCase
   include DRbObjectTest
 
   def setup
-    DRb.start_service
+    DRb.start_service(nil, nil, {:idconv => DRb::WeakIdConv.new})
   end
 end

Any reason we wouldn't want to just make this the default idconv? Maybe only on JRuby and other impls where _id2ref is problematic?

eregon · 2021-11-18T14:20:42Z

cc @seki who is the official maintainer of drb (https://github.com/ruby/ruby/blob/master/doc/maintainers.rdoc)

jeremyevans

I'm OK with the general approach, though I think the implementation needs a few changes. Please see inline comments. Also, I'm not the drb maintainer, so I'll add seki as a reviewer.

jeremyevans · 2021-11-18T15:58:14Z

lib/drb/drb.rb

-        obj.nil? ? nil : obj.__id__
-      when BasicObject
-        obj.__id__
+      obj.nil? ? nil : _put(obj)


This seems wrong. BasicObject#nil? is not defined. You should probably keep the case obj, and just switch the obj.__id__ to _put(obj) in both when branches. I would guess that is what causes the test failure for basic objects.

jeremyevans · 2021-11-18T16:01:37Z

lib/drb/drb.rb

+        begin
+          yield
+        ensure
+          MUTEX.unlock


The idiomatic way here would be to use MUTEX.synchronize(&block). However, maybe there a reason the Thread.handle_interrupt and explicit lock/unlock was done instead. If so, could you add a comment why?

Mostly to avoid the overhead of calling the block in synchronize, since this will be hit hard by anyone using a remote object. Is that worth a comment?

I would switch to MUTEX.synchronize(&block) unless you have a benchmark showing the performance differences are significant enough to warrant this more complex code. If the performance differences are significant, then keeping the code and adding a comment explaining it seems best. My uneducated guess would be that the overhead from having a remote object is much higher than the overhead of a mutex synchronization, even on localhost.

Fair enough. The block form should be slower on CRuby, but it may be noise compared to the weak map and other remote object overhead.

jeremyevans · 2021-11-18T16:05:32Z

lib/drb/drb.rb

+      _safe_lock do
+        weakref = @id2ref[id]
+        if weakref
+          result = weakref.__getobj__ rescue nil


I'm not sure if this handles false values correctly, returning them as nil. I would think you would only want to delete from the hash if __getobj__ raises, and otherwise return the result. Explicit nil values are probably OK, since I'm guessing they result in the same behavior.

eregon · 2021-12-07T19:58:03Z

lib/drb/drb.rb

+    MUTEX = Mutex.new
+
+    def initialize
+      @id2ref = {}


Would ObjectSpace::WeakMap make sense here? That would keep the entries as long as the value is alive, which seems the desired semantic here.

That works only for Ruby 2.7 and higher, since it would require immediates (integer IDs) as keys. That restriction might be ok, since 2.6 did not ship drb as a gem. It would mean this gem is not usable at all on 2.6.

$ rvm ruby-2.6.7 do ruby -e 'ObjectSpace::WeakMap.new[self.__id__] = self' -e:1:in `[]=': cannot define finalizer for Integer (ArgumentError) from -e:1:in `<main>'

This gem specifies >= 2.7 so that seems fine:

drb/drb.gemspec

Line 16 in 82643ca

spec.required_ruby_version = Gem::Requirement.new(">= 2.7.0")

Also, WeakIdConv already uses ObjectSpace::WeakMap.
I think making WeakIdConv the default is probably the safest way.

I would suggest that we should also remove the id2ref version in order to start moving away from having any official code that uses it. The WeakMap version will be more reliable and should work on 2.7+ compatible implementations.

eregon · 2021-12-07T20:14:35Z

So I thought I would try to run the tests using that as the default idconv but I don't think I'm hooking it up right. Is more than this needed?

A quick git grep leads to:

lib/drb/drb.rb:327:# default implementation is provided by DRb::DRbIdConv.  It uses an
lib/drb/drb.rb:359:  class DRbIdConv
lib/drb/drb.rb:1351:    @@idconv = DRbIdConv.new
lib/drb/drb.rb:1380:    # See #new().  The initial default value is a DRbIdConv instance.
lib/drb/drb.rb:1426:    #            to an instance of the class DRb::DRbIdConv.
lib/drb/drb.rb:1870:  # This is expected to be an instance such as DRb::DRbIdConv that responds to
lib/drb/drb.rb:1937:DRbIdConv = DRb::DRbIdConv
lib/drb/gw.rb:34:  class GWIdConv < DRbIdConv
lib/drb/timeridconv.rb:15:  class TimerIdConv < DRbIdConv
lib/drb/weakidconv.rb:11:  class WeakIdConv < DRbIdConv

Line 1351 seems relevant.

headius · 2021-12-07T20:54:09Z

@seki The main outstanding question is whether we should just replace the id2ref implementation with the WeakMap implementation, since that version should be more reliable and id2ref is problematic and effectively deprecated. If this is acceptable, I can modify this PR to use the WeakIdConv and remove the current default id2ref logic.

kou · 2025-04-28T09:33:18Z

I close this in favor of #35.

headius requested a review from jeremyevans November 18, 2021 12:13

headius force-pushed the weakref_not_id2ref branch from 76d8c83 to 2b393cd Compare November 18, 2021 13:47

headius mentioned this pull request Nov 18, 2021

Ruby 3.0 Support jruby/jruby#6878

Closed

jeremyevans suggested changes Nov 18, 2021

View reviewed changes

jeremyevans requested a review from seki November 18, 2021 16:08

eregon reviewed Dec 7, 2021

View reviewed changes

kou closed this Apr 28, 2025

Conversation

headius commented Nov 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

headius commented Nov 18, 2021

Uh oh!

headius commented Nov 18, 2021

Uh oh!

headius commented Nov 18, 2021

Uh oh!

headius commented Nov 18, 2021

Uh oh!

eregon commented Nov 18, 2021

Uh oh!

jeremyevans left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eregon commented Dec 7, 2021

Uh oh!

headius commented Dec 7, 2021

Uh oh!

kou commented Apr 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

headius commented Nov 18, 2021 •

edited

Loading