
Align JEPA supervision with leakage-free future targets #8

Open
mehulsuresh wants to merge 1 commit into ginwind:main from mehulsuresh:codex/upstream-jepa-two-pass


mehulsuresh commented Apr 23, 2026

Summary

This PR adds an opt-in JEPA supervision path for lerobot_datasets: the dataloader builds future-shifted target clips, and the model encodes them separately from the context clip.

When datasets.vla_data.video_target_shift_steps is 0, behavior is unchanged.
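A minimal sketch of how such an opt-in flag can gate the two paths, assuming the config is available as a plain nested dict. The helpers get_target_shift_steps and use_two_pass_targets are hypothetical, not functions from this PR. Rejecting any nonzero value other than one tubelet is this sketch's own choice; the PR only describes the one-tubelet setting.

```python
from typing import Any, Mapping

# Hypothetical helpers: only the key path comes from the PR description.
SHIFT_KEY_PATH = ("datasets", "vla_data", "video_target_shift_steps")


def get_target_shift_steps(cfg: Mapping[str, Any]) -> int:
    """Return datasets.vla_data.video_target_shift_steps, or 0 if it is absent."""
    node: Any = cfg
    for key in SHIFT_KEY_PATH:
        if not isinstance(node, Mapping) or key not in node:
            return 0  # Missing key: treat the feature as disabled.
        node = node[key]
    return int(node)


def use_two_pass_targets(cfg: Mapping[str, Any], tubelet_size: int) -> bool:
    """Choose between the default single-pass path and the two-pass JEPA path."""
    shift = get_target_shift_steps(cfg)
    if shift == 0:
        return False  # Opt-in feature is off: behavior unchanged.
    if shift != tubelet_size:
        # Sketch-only guard: the PR describes a shift of exactly one tubelet.
        raise ValueError(
            f"video_target_shift_steps={shift} but tubelet_size={tubelet_size}"
        )
    return True
```

Defaulting a missing key to 0 keeps existing configs on the unchanged single-pass path without edits.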

When video_target_shift_steps is set to one encoder tubelet (the number of frames the video encoder groups into a single temporal token), the dataloader emits:

  • a shorter context clip for the predictor input
  • a future-shifted target clip with the same encoded temporal length for supervision
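The split can be sketched as index arithmetic over one sampled window. This assumes the window is a list of frame indices, the context clip drops the last shift_steps frames, and the target clip is the same window shifted shift_steps frames into the future. The helpers split_context_target and encoded_length are hypothetical, and the PR's dataloader may slice differently.

```python
from typing import List, Optional, Sequence, Tuple


def split_context_target(
    frames: Sequence[int], shift_steps: int, tubelet_size: int
) -> Tuple[List[int], Optional[List[int]]]:
    """Hypothetical helper: split one sampled window into context and target clips."""
    frames = list(frames)
    if shift_steps == 0:
        # Default single-pass path: one window, no separate target clip.
        return frames, None
    if shift_steps % tubelet_size != 0:
        raise ValueError("shift_steps must be a whole number of tubelets")
    if len(frames) <= shift_steps:
        raise ValueError("window is too short for the requested shift")
    context = frames[:-shift_steps]  # shorter clip fed to the predictor
    target = frames[shift_steps:]    # same length, shifted into the future
    return context, target


def encoded_length(clip: Sequence[int], tubelet_size: int) -> int:
    """Number of temporal tokens an encoder with this tubelet size would produce."""
    return len(clip) // tubelet_size
```

With an 8-frame window and a 2-frame tubelet, both clips are 6 frames (3 tubelets) long, and frames 6 and 7 appear only in the target clip, so the predictor input never contains them.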

VLA_JEPA.forward() then encodes those clips separately instead of deriving both context and target states from a single encoded video window.
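To illustrate why separate encoding matters, here is a toy sketch assuming an encoder that mixes information across the whole clip, as full temporal self-attention would. toy_encode, single_pass_states, and two_pass_states are hypothetical stand-ins, not VLA_JEPA code; each toy token is its tubelet's mean plus the mean of the entire clip.

```python
from typing import List, Sequence, Tuple


def toy_encode(clip: Sequence[float], tubelet_size: int) -> List[float]:
    """Toy encoder: every token depends on every frame via the clip mean."""
    n = len(clip) // tubelet_size
    if n == 0:
        raise ValueError("clip is shorter than one tubelet")
    clip_mean = sum(clip[: n * tubelet_size]) / (n * tubelet_size)
    return [
        sum(clip[i * tubelet_size : (i + 1) * tubelet_size]) / tubelet_size + clip_mean
        for i in range(n)
    ]


def single_pass_states(
    window: Sequence[float], shift_steps: int, tubelet_size: int
) -> Tuple[List[float], List[float]]:
    """Encode the full window once, then slice context and target tokens."""
    if shift_steps <= 0 or shift_steps % tubelet_size != 0:
        raise ValueError("shift_steps must be a positive whole number of tubelets")
    tokens = toy_encode(window, tubelet_size)
    k = shift_steps // tubelet_size
    return tokens[:-k], tokens[k:]


def two_pass_states(
    window: Sequence[float], shift_steps: int, tubelet_size: int
) -> Tuple[List[float], List[float]]:
    """Encode the context clip and the future-shifted target clip separately."""
    if shift_steps <= 0 or shift_steps % tubelet_size != 0:
        raise ValueError("shift_steps must be a positive whole number of tubelets")
    context_clip = list(window[:-shift_steps])
    target_clip = list(window[shift_steps:])
    return toy_encode(context_clip, tubelet_size), toy_encode(target_clip, tubelet_size)


if __name__ == "__main__":
    quiet = [0.0] * 8
    future_changed = [0.0] * 6 + [10.0, 10.0]  # only the last two frames differ
    print(single_pass_states(quiet, 2, 2)[0])           # [0.0, 0.0, 0.0]
    print(single_pass_states(future_changed, 2, 2)[0])  # [2.5, 2.5, 2.5]
    print(two_pass_states(future_changed, 2, 2)[0])     # [0.0, 0.0, 0.0]
```

In the single-pass version, changing only the last two frames shifts the context states, because the encoder saw those frames; in the two-pass version the context states stay the same, while both versions still yield 3 context and 3 target tokens.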

Paper Alignment

The paper describes VLA-JEPA as "leakage-free state prediction" and states that future frames should be used only as supervision targets, never as inputs to the learner (see the paper and its HTML version).

This PR moves the implementation toward that setup:

  • the predictor input comes from a context clip
  • the supervision target comes from a separate future-shifted clip
  • the default single-pass path remains unchanged unless video_target_shift_steps is enabled

mehulsuresh changed the title from "Add optional two-pass JEPA target clips" to "Align JEPA supervision with leakage-free future targets" on Apr 23, 2026
mehulsuresh marked this pull request as ready for review on April 23, 2026 at 21:15