I notice that a projection_matrix is used to project the image embedding into the text embedding space. In the diffusers implementation, however, this projection is performed by "CLIPImageProjection", defined in diffusers.pipelines.stable_diffusion.clip_image_project_model. Are these two approaches equivalent?
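
To make "equivalent" concrete, here is a minimal sketch of the comparison I have in mind. It assumes that the diffusers module boils down to a single linear layer without bias; I have not confirmed that, and the two helper functions and the toy dimensions below are hypothetical stand-ins, not code from either repository.

```python
import random


def matmul_project(embedding, projection_matrix):
    """Project a row vector: y = x @ P, with P of shape (in_dim, out_dim)."""
    in_dim = len(projection_matrix)
    out_dim = len(projection_matrix[0])
    return [
        sum(embedding[i] * projection_matrix[i][j] for i in range(in_dim))
        for j in range(out_dim)
    ]


def linear_no_bias(embedding, weight):
    """Bias-free linear layer: y = W x, with W of shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def transpose(matrix):
    """Swap rows and columns of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


random.seed(0)
in_dim, out_dim = 4, 3  # toy sizes, chosen only for illustration

image_embedding = [random.gauss(0.0, 1.0) for _ in range(in_dim)]
projection_matrix = [
    [random.gauss(0.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)
]

# Path 1: multiply the embedding by an explicit projection matrix.
via_matrix = matmul_project(image_embedding, projection_matrix)

# Path 2: a bias-free linear layer whose weight is the transposed matrix.
via_linear = linear_no_bias(image_embedding, transpose(projection_matrix))

max_abs_diff = max(abs(a - b) for a, b in zip(via_matrix, via_linear))
print(max_abs_diff)
```

If that assumption holds, the two would differ only in how the weight is stored (transposed or not). If the module adds a bias, a normalization step, or a change of dimension, they would not be interchangeable, which is what I would like to confirm.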
