https://github.com/gligen/GLIGEN/blob/f9dccb9c6cf48bad03c3666290a7dec8c5e58f3c/gligen_inference.py#L98

https://github.com/gligen/GLIGEN/blob/f9dccb9c6cf48bad03c3666290a7dec8c5e58f3c/gligen_inference.py#L115

Is `projection_matrix` the same as `clip_model.text_projection.weight.data`? If so, is it enough to transpose the projection matrix once?
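
To make the question concrete, here is a minimal pure-Python sketch of what I mean. It uses a made-up 2x2 matrix rather than the real CLIP weights, and it assumes the stored matrix follows the Linear-layer layout `(out_dim, in_dim)`. The helpers `transpose` and `matmul` and the local `project` are hypothetical and written only for this example; `project` merely imitates the pattern of transposing inside the function.

```python
# Toy stand-in for a Linear layer's weight, stored as (out_dim, in_dim).
# The values are invented for illustration; they are not CLIP's weights.
W = [[1.0, 2.0],
     [3.0, 4.0]]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)  # rows of b_cols are the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def project(x, projection_matrix):
    """Imitates the pattern in question: computes x @ projection_matrix^T."""
    return matmul(x, transpose(projection_matrix))


x = [[1.0, 0.0]]  # a single row vector, shape (1, in_dim)

once = project(x, W)              # x @ W^T, the Linear-layer convention
twice = project(x, transpose(W))  # x @ (W^T)^T, which is just x @ W

print(once)   # [[1.0, 3.0]]
print(twice)  # [[1.0, 2.0]]
```

With a square matrix both calls run without a shape error, but they give different results, so whether one or two transposes is right depends on the layout in which the saved matrix was stored.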