Skip to content

Fix EnableOptionsGuard usage in test_multidevice_lower_communication. #4641

Merged
wujingyue merged 2 commits intomainfrom
wjy/option
Jun 16, 2025
Merged

Fix EnableOptionsGuard usage in test_multidevice_lower_communication. #4641
wujingyue merged 2 commits intomainfrom
wjy/option

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Jun 13, 2025

No description provided.

NVFuserTest has it already
@wujingyue
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Jun 13, 2025

Review updated until commit 1f7ef88

Description

  • Removed unnecessary guard in getCommunicationInfo

  • Fixed test by removing EnableOptionsGuard

  • Improved error messages and formatting

  • Updated test instantiation to skip enable_host_ir_lowering


Changes walkthrough 📝

Relevant files
Enhancement
lower_to_communication.cpp
Rename parameter and improve error messages                           

csrc/host_ir/lower_to_communication.cpp

  • Renamed parameter expr to e
  • Updated error messages to use e
  • Replaced NVF_ERROR with NVF_ERROR_EQ for better readability
  • +13/-13 
    fusion_kernel_runtime.cpp
    Rename parameter and improve error messages                           

    csrc/runtime/fusion_kernel_runtime.cpp

  • Renamed parameter expr to e
  • Updated error messages to use e
  • Replaced NVF_ERROR with NVF_ERROR_EQ for better readability
  • +11/-9   
    Bug fix
    allocations.cpp
    Fix spacing in error message                                                         

    csrc/runtime/allocations.cpp

    • Fixed spacing in error message
    +1/-1     
    test_multidevice_lower_communication.cpp
    Remove guard and update test instantiation                             

    tests/cpp/test_multidevice_lower_communication.cpp

  • Removed EnableOptionsGuard from tests
  • Updated test instantiation to skip enable_host_ir_lowering
  • +2/-5     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The PR renames expr to e in multiple places, which might lead to confusion or bugs if there are other variables named e in the scope. Ensure that renaming does not introduce any unintended side effects.

    CommunicationInfo getCommunicationInfo(Expr* e) {
      NVF_ERROR(
          isResharding(e),
          "getCommunicationInfo should only be called when `e` is known to be a "
          "communication. So `e` should be resharding. Given: ",
          e);
    
      NVF_ERROR(
          e->isA<LoadStoreOp>() || e->isA<ReductionOp>(),
          "getCommunicationInfo should only be called when `e` is known to be a "
          "communication. So `e` should be either a LoadStoreOp or a "
          "ReductionOp. Given: ",
          e);
    
      auto* producer = e->inputs().at(0)->as<TensorView>();
    Possible Issue

    The PR renames expr to e in multiple places, which might lead to confusion or bugs if there are other variables named e in the scope. Ensure that renaming does not introduce any unintended side effects.

    auto deviceid = Communicator::getInstance().deviceId();
    NVF_ERROR_EQ(
        group_to_run->exprs().size(),
        1,
        "Communication segments must contain only one Expr.");
    for (auto* e : convertSingleOpToCommunication(
             ir_cloner.clone(group_to_run->exprs().at(0)), deviceid)) {
      NVF_ERROR(
          e->isA<Communication>(),
          "Exprs in a Communication group should be Communication: ",
          e);
      // Allocate the recv buffers of communications
      auto* communication = e->as<Communication>();
      TensorView* tv = communication->out();
      if (tv->getDeviceMesh().has(deviceid)) {
        auto* allocate =
            IrBuilder::create<kir::Allocate>(tv, MemoryType::Global);
        hic->pushBackTopLevelExprs(allocate);
      }
      hic->pushBackTopLevelExprs(communication);
      auto wait = IrBuilder::create<hir::Wait>(communication);
      hic->pushBackTopLevelExprs(wait);
    Missing Test

    The PR removes EnableOptionsGuard in multiple test cases. Ensure that the tests still function correctly without this guard and consider adding new tests if necessary to cover the changes.

    TEST_P(LowerGatherTest, ) {
      const auto& [meshes, enable_host_ir_lowering] = GetParam();
      const auto& [in_mesh, out_mesh] = meshes;
    
      if (enable_host_ir_lowering) {
        EnableOptionsGuard::getCurOptions().set(EnableOption::HostIrLowering);
      }
    
      SKIP_IF_NOT_ENOUGH_DEVICES(in_mesh, out_mesh);
    
      auto fusion = std::make_unique<Fusion>();
      FusionGuard fg(fusion.get());
    
      TensorView* in = makeContigTensor(2);
      TensorView* out = set(in);
      fusion->addInput(in);
      fusion->addOutput(out);
    
      in->setDeviceMesh(in_mesh);
      out->setDeviceMesh(out_mesh);
      in->axis(0)->parallelize(ParallelType::DIDx);
    
      const auto device_id = communicator_->deviceId();
      at::Tensor unsharded_tensor =
          at::randn({in_mesh.size(), kTensorSize}, tensor_options);
      at::Tensor in_tensor = shardTensor(unsharded_tensor, in);
    
      FusionExecutorCache executor_cache(std::move(fusion));
      at::Tensor out_tensor =
          executor_cache.runFusionWithInputs({in_tensor})[0].as<at::Tensor>();
      assertIsCompiledToHostIrContainer(executor_cache);
    
      if (out_mesh.has(device_id)) {
        EXPECT_TRUE(at::equal(out_tensor, unsharded_tensor));
      }
    }
    
    namespace {
    std::string paramToString(
        const testing::TestParamInfo<std::tuple<InOutMesh, bool>>& info) {
      auto&& [meshes, enable_hir] = info.param;
      auto&& [in_mesh, out_mesh] = meshes;
    
      std::stringstream ss;
      ss << "InMesh";
      for (auto id : in_mesh.vector()) {
        ss << "_" << id;
      }
      ss << "_OutMesh";
      for (auto id : out_mesh.vector()) {
        ss << "_" << id;
      }
      ss << (enable_hir ? "_HostIr" : "_NonHostIr");
    
      return ss.str();
    }
    } // namespace
    
    INSTANTIATE_TEST_SUITE_P(
        ,
        LowerGatherTest,
        // Create product of InOutMesh configurations and HostIrLowering options
        testing::Combine(
            testing::ValuesIn(std::vector<InOutMesh>(
                {{{0, 1}, {0}}, {{0, 1}, {1}}, {{1, 2}, {0, 2}}})),
            testing::Bool()),
        paramToString);
    
    class LowerScatterTest
        : public MultiDeviceTest,
          public testing::WithParamInterface<std::tuple<InOutMesh, bool>> {};
    
    TEST_P(LowerScatterTest, ) {

    @wujingyue wujingyue changed the title Remove an unnecessary guard Fix EnableOptionsGuard usage in test_multidevice_lower_communication. Jun 14, 2025
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Minor renaming

    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Minor renaming

    public testing::WithParamInterface<std::tuple<InOutMesh, bool>> {};

    TEST_P(LowerGatherTest, ) {
    EnableOptionsGuard opt_guard;
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Not necessary because NVFuserTest owns an EnableOptionsGuard already.

    // available. Therefore, we call it after the isBackendAvailable check.
    communicator_->setDefaultBackend(backend_type);

    EnableOptionsGuard enable_options_guard;
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    This (apparently introduced in #4170) is what has been hiding #4230.

    @wujingyue wujingyue requested review from Priya2698 and nsarka June 14, 2025 06:06
    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue wujingyue merged commit 9a3317f into main Jun 16, 2025
    54 checks passed
    @wujingyue wujingyue deleted the wjy/option branch June 16, 2025 16:16
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants