[Bug][graph_dumper] HLO HTML dump segfaults after gemm_fusion_autotuner due to stale fusion state / null operands #35786

@QZero233

Summary

When HLO graphs are dumped to HTML immediately after the gemm_fusion_autotuner pass, HloDotDumper can segfault. The crash happens because IsFused() returns true for instructions whose parent computation is still marked as a fusion computation (kFusion), even though that computation's FusionInstruction() back-pointer has already been cleared. In addition, some kGetTupleElement nodes may have had their operands removed, but the dumper still dereferences operand(0).
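To make the inconsistency concrete, here is a minimal self-contained sketch of the stale state as I understand it. The `Computation` and `Instruction` structs and both helper functions are hypothetical stand-ins written for this illustration; they are not XLA's classes, and the real IsFused() may differ in detail.

```cpp
#include <string>

// Hypothetical stand-ins for the XLA types named in this issue. They model
// only the two pieces of state that disagree after the autotuner pass.
struct Instruction;

struct Computation {
  bool marked_as_fusion = false;              // models the kFusion marking
  Instruction* fusion_instruction = nullptr;  // models the FusionInstruction() back-pointer
};

struct Instruction {
  std::string name;
  Computation* parent = nullptr;
};

// Models the behaviour described above: only the parent's marking is consulted,
// so a cleared back-pointer goes unnoticed.
bool IsFusedByMarkingOnly(const Instruction& instr) {
  return instr.parent != nullptr && instr.parent->marked_as_fusion;
}

// Stricter variant (an assumption, not an existing XLA function): the parent
// must be marked as fusion AND still point back at a fusion instruction.
bool IsFusedWithBackPointer(const Instruction& instr) {
  return instr.parent != nullptr && instr.parent->marked_as_fusion &&
         instr.parent->fusion_instruction != nullptr;
}
```

With a computation that is marked as fusion but whose back-pointer is null, the first check reports "fused" while the second does not. Any caller that trusts the first answer and then follows the back-pointer would dereference null.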

Steps to Reproduce

  1. Download t5_hlo.txt
  2. Run the following command:
XLA_FLAGS="\
--xla_dump_hlo_snapshots \
--xla_dump_hlo_module_re=.* \
--xla_dump_to=./run_dumps \
--xla_dump_hlo_as_text \
--xla_dump_hlo_as_html \
--xla_dump_hlo_pass_re=.*" \
run_hlo_module \
  --input_format=hlo \
  --platform=CUDA \
  ./t5_hlo.txt

Affected Area

  • xla/service/hlo_graph_dumper.cc
  • xla/service/gpu/autotuning/gemm_fusion_autotuner.cc
  • xla/hlo/ir/hlo_instruction.cc (IsFused())

Expected Behavior

Dumping HLO graphs should not crash, even if there are transient or inconsistent fusion states during/after rewrites.
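As an illustration of what tolerating an inconsistent state could look like on the dumper side, the sketch below checks for an operand before reading it and falls back to a placeholder label. `Node` and `FirstOperandLabel` are hypothetical names made up for this example; they are not part of hlo_graph_dumper.cc.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified graph node; not an XLA type.
struct Node {
  std::string name;
  std::vector<const Node*> operands;
};

// Returns the name of the first operand, or a placeholder when the operand
// list is empty or its first entry is null. A renderer built on this helper
// can still emit a label for a get-tuple-element node that has lost its operand.
std::string FirstOperandLabel(const Node& node) {
  if (node.operands.empty() || node.operands.front() == nullptr) {
    return "<missing operand>";
  }
  return node.operands.front()->name;
}
```

A guard like this would only hide the symptom in the dump. Whether the autotuner should leave such nodes behind in the first place is a separate question.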

Actual Behavior

Segmentation fault while rendering HTML (or DOT) after gemm_fusion_autotuner.

Additional Context

The dump is triggered in HloPassPipeline immediately after each pass executes, so it can observe intermediate states.
