Skip to content

Avoid symbolic output extent in newOutputDomain#358

Closed
jacobhinkle wants to merge 12 commits intomainfrom
fix_symbolic_new_domain
Closed

Avoid symbolic output extent in newOutputDomain#358
jacobhinkle wants to merge 12 commits intomainfrom
fix_symbolic_new_domain

Conversation

@jacobhinkle
Copy link
Collaborator

Fixes #357

There are a combination of issues address by this PR. First, as noted in the issue, we should not exact map between Symbolic domains. However, there is still an issue where the output extent, computed at Fusion definition, is sometimes set equal to the symbolic extent even though that extent might concretize to 1 and then become a resolved broadcast. In those cases we should either use any Iteration extents if available, or if we have an unambiguous Symbolic extent, we can use it.

@jacobhinkle
Copy link
Collaborator Author

I've encountered an interesting limitation. For a dynamic resize resulting in a broadcast, we may have a concretized root domain of type Iteration that's resized to a Broadcast rfactor domain. For example, as of now the example in #357 concretizes (properly?) to

Inputs:
  T0_g[ iS0{i0}, iS1{i2} ], float
  T1_g[ iS2{i3}, iS3{i4} ], float
  i5, int64_t
  i6, int64_t
Outputs:
  T3_g[ iS8{i3}, iS9{i4} ], float
  T2_g[ bS10{( i0 + i5 )}rf, iS11{( i2 + i6 )}rf ], float

%kernel_math {
T2_g[ bS10{( i0 + i5 )}rf, iS11{( i2 + i6 )}rf ]
   = pad( T0_g[ iS0{i0}, iS1{i2} ], {0, i5, 0, i6} )
T3_g[ iS8{i3}, iS9{i4} ]
   = T2_g[ bS10{( i0 + i5 )}rf, iS11{( i2 + i6 )}rf ]
   * T1_g[ iS2{i3}, iS3{i4} ];

This currently fails when building ComputeAtMap while building the structure used to define concrete IDs, ConcretizedBroadcastDomains at this check: https://github.com/NVIDIA/Fuser/blob/main/csrc/device_lower/analysis/trivial_broadcast.cpp#L99. I believe this assumes that a broadcast will come from a BroadcastOp, and tries to map every Broadcast root ID to its "original" root ID, meaning the ID in the root of the input to a BroadcastOp. In the case of Resize we don't have a separate TV op representing the broadcast, so to satisfy this assumption we would need to insert an op at concretization to separately broadcast after concretizing all resized IterDomains as Iteration. That's a bit invasive, so I'm checking to see if in this case it might be safe to use the resize itself as the concrete ID.

@jacobhinkle
Copy link
Collaborator Author

Note that reshape does not trigger this error because during concretization/static reshape, broadcasts are detected and actual broadcast ops are inserted. This may be the way to go; I can detect broadcast output domains that come from Resize op while processing a TV and ideally remove them then insert the broadcast on that TV. I'm not exactly sure how best to remove that ID without it being a Broadcast. One option is to Merge that ID with its neighbor assuming it has any. I will continue thinking about this.

@naoyam
Copy link
Collaborator

naoyam commented May 18, 2023

It seems to me that the problem lies in ConcretizedBroadcastDomains. If we extend it to handle broadcast IDs produced by IterDomain expressions, should the problem be fixed?

@jacobhinkle
Copy link
Collaborator Author

If we extend it to handle broadcast IDs produced by IterDomain expressions, should the problem be fixed?

I believe so.

@naoyam
Copy link
Collaborator

naoyam commented May 18, 2023

OK, then I think that would be preferred over adding broadcast ops. Let me know if you need any help.

With this, now failing at [E thread_pool.cpp:109] Exception in thread pool task: it != bcast_map_.end() INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/root_domain_map.cpp":630, please report a bug to PyTorch. Not found: iS4{i0}rf in T2[ iS4{i0}rf iS6{i2}rf ] (Rfactor: [ bS10{( i0 + i5 )}rf iS11{( i2 + i6 )}rf ])
Using this, we can enable mapping of Symbolic IterDomains when doing p2c
propagation in concretization, and still keep it disabled for the
purposes of exact mapping.

This fixes some errors but I am now seeing errors like

> [E thread_pool.cpp:110] Exception in thread pool task: is_overriden || index_map.find(alloc_dom[i]) != index_map.end() INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/index_compute.cpp":1634,
jacobhinkle added a commit that referenced this pull request Jun 2, 2023
@jacobhinkle
Copy link
Collaborator Author

This is being addressed in #420

@jacobhinkle jacobhinkle closed this Jun 2, 2023
@jacobhinkle jacobhinkle deleted the fix_symbolic_new_domain branch July 25, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Symbolic iter domains should not exact map when aligned

2 participants