Skip to content

YJIT/ZJIT.enable is not thread/ractor-safe #883

@chancancode

Description

@chancancode

When implementing #854 / ruby#15219, I carried over a pattern that seemed concerning.

The implementation is split across Ruby and native code, Rust in this case. Here is a simplified version:

class RubyVM::YJIT
  def self.enable(stats: ...)
    return false if enabled?

    if Primitive.cexpr! 'RBOOL(rb_zjit_enabled_p)'
      # as an aside, should this actually raise? something really strange is going on in your code if this actually happens
      warn("Only one JIT can be enabled at the same time.")
      return false
    end

    # check args and raise ArgumentError if needed

    at_exit { print_and_dump_stats } if stats

    Primitive.rb_yjit_enable(stats, ...)
  end
end
#[no_mangle]
pub extern "C" fn rb_yjit_enable(...) -> VALUE {
    with_vm_lock(src_loc!(), || {
        // without rechecking the enabled state, we proceed to mutate internal VM state
        // ...

        yjit_init();

        // Add "+YJIT" to RUBY_DESCRIPTION
        extern "C" {
            fn ruby_set_yjit_description();
        }
        unsafe { ruby_set_yjit_description(); }

        Qtrue
    })
}

To my eyes, this seems problematic. By checking for the enabled state (both enabled? itself, and also checking for if the other JIT is active) in the Ruby code, it looks like we have a time-of-check to time-of-use race here, and it is possible to context switch in between, and more than one threads/ractors can end up running the JIT-enable block multiple times resulting in a crash, or enabling YJIT/ZJIT at the same time, when that becomes a possibility.

I believe the only correct way to do the check is to do it in native code after acquiring the lock, but as I looked into this more, I found another potential problem. According to this piece of documentation, we can't call back into ruby while holding the VM lock, but one of the first thing we do in yjit_init() is calling the JIT hooks with rb_funcall().

I'm also not sure if this is the right lock either, as the documentation says it doesn't preclude other ractors from running. We are mutating a bunch of global variables (static mut) in the block, if it doesn't block out other readers, that seems problematic also (e.g. another ractor can try to read yjit_enabled_p while it's being written).

(Aside: right now, you won't succeed in enabling the JIT from the non-main ractor, because the JIT hooks are stored in a non-frozen class instance variable, so that will eventually raise an error. I think it's totally reasonable to restrict enabling the JIT to the main factor, but this seems like relying on a a fragile implementation detail to get that result.)

Happy to work on a fix – and I think I need to resolve this before I can add support for calling the JIT hooks to ZJIT – but I think I'll need some help brainstorming the right approach here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions