This is similar to a previous issue (#31311) we had with vision models. cc @merveenoyan @amyeroberts @NielsRogge
Offending lines (`src/transformers/models/hiera/modeling_hiera.py` at `5c1027b`, lines 340 and 344):

```python
pos_embeds = pos_embeds.reshape(1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim)
```

```python
scale_factor=(h0 / math.sqrt(num_positions), w0 / math.sqrt(num_positions)),
```
We should probably abstract `interpolate_pos_encoding`, since it is reused across many different vision architectures.
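As a rough illustration of what a shared helper could look like, here is a minimal sketch of an abstracted `interpolate_pos_encoding`. It sidesteps the float `scale_factor` math above by passing an explicit integer `size` to `F.interpolate`, and it validates the `math.sqrt(num_positions)` assumption instead of silently truncating with `int(...)`. This is a hypothetical standalone function, not the actual transformers implementation; the signature and names are assumptions for illustration.

```python
import math

import torch
import torch.nn.functional as F


def interpolate_pos_encoding(
    pos_embeds: torch.Tensor, new_height: int, new_width: int
) -> torch.Tensor:
    """Resize flattened (1, num_positions, dim) positional embeddings
    to a (new_height, new_width) grid via bicubic interpolation.

    Hypothetical shared helper; a sketch, not the library's API.
    """
    _, num_positions, dim = pos_embeds.shape
    grid_size = int(round(math.sqrt(num_positions)))
    # Fail loudly instead of silently truncating like int(math.sqrt(...)).
    if grid_size * grid_size != num_positions:
        raise ValueError(
            f"num_positions={num_positions} is not a square grid"
        )
    # (1, N, dim) -> (1, dim, grid, grid), the layout F.interpolate expects.
    pos_embeds = pos_embeds.reshape(1, grid_size, grid_size, dim).permute(0, 3, 1, 2)
    # Explicit target size avoids the fragile float scale_factor computation.
    pos_embeds = F.interpolate(
        pos_embeds,
        size=(new_height, new_width),
        mode="bicubic",
        align_corners=False,
    )
    # Back to the flattened (1, new_height * new_width, dim) layout.
    return pos_embeds.permute(0, 2, 3, 1).reshape(1, new_height * new_width, dim)
```

Each model could then call this helper with its own grid geometry instead of re-deriving the reshape and scale-factor logic inline.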