Currently, native module integration tests only verify that models execute without crashing and produce output. However, they don't validate that the output is correct. For example, image generation models return image data, but we don't verify the images are visually correct or match expected results.
The scope of this task is to introduce approval/snapshot testing for model outputs. Tests will compare model outputs against known-good reference files to detect regressions in model exports.
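As a rough illustration of the comparison step, here is a minimal sketch in Python. It is not tied to the project's actual test harness: the names `check_snapshot` and `mean_abs_diff`, the raw-bytes output format, and the mean-absolute-difference tolerance are all assumptions made for the example. A real implementation for images would likely decode them and use a perceptual or per-pixel metric instead.

```python
from pathlib import Path


def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Mean absolute per-byte difference between two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)} bytes")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def check_snapshot(output: bytes, reference: Path,
                   tolerance: float = 0.0, update: bool = False) -> bool:
    """Return True if `output` matches the stored reference within `tolerance`.

    Hypothetical helper: with `update=True` the output is written as the new
    known-good reference (the "approval" step) and the check passes.
    A missing reference without `update` raises FileNotFoundError, so a
    new snapshot is never approved silently.
    """
    if update:
        reference.parent.mkdir(parents=True, exist_ok=True)
        reference.write_bytes(output)
        return True
    expected = reference.read_bytes()
    if len(expected) != len(output):
        # Different sizes are treated as a regression, not an error.
        return False
    return mean_abs_diff(output, expected) <= tolerance
```

A tolerance of `0.0` demands a byte-exact match, which suits deterministic outputs. A small non-zero tolerance may be needed where the same model export produces slightly different results across devices or backends.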