Companion artifact for the ACL 2026 paper "EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention" by Yifan Zhang, Chen Huang, Yueke Zhang, Jiahao Zhang, Toby Li, Collin McMillan, Kevin Leach, and Yu Huang.
EyeMulator aligns code language models with human visual attention. Eye-tracking data is distilled into a small set of reusable priors (Beta distributions over semantic token classes, plus n-gram transition counts), pseudo-scan paths are generated from those priors over arbitrary code, and the model is trained with a weighted cross-entropy loss combined with a token-level preference loss. This repository contains the priors themselves, a small demonstration dataset, and a reference PyTorch implementation of the method components.
```
EyeMulator/
├── README.md
├── LICENSE                      MIT (code) + CC-BY-4.0 attribution (data)
├── CITATION.bib
├── priors/
│   ├── combined/                distilled from reading + writing sessions
│   ├── reading/                 reading-only sessions
│   └── writing/                 writing-only sessions
├── dataset_sample/              30 examples per split per task; same schema as a full dataset
│   ├── completion_{train,valid,test}_sample.jsonl
│   ├── summarization_{train,valid,test}_sample.jsonl
│   └── translation_{train,valid,test}_sample.jsonl
├── figures/                     human-side figures from the paper
│   ├── human_study.pdf
│   ├── eyemulator_overview.pdf
│   ├── eyemulator_pseudo_path.pdf
│   ├── combined_beta_distributions.pdf
│   ├── combined_beta_curves.pdf
│   └── category_distribution.pdf
├── docs/
│   ├── data_schema.md               field-by-field format of priors and dataset
│   ├── method_integration.md        how to wire the priors into a training loop
│   └── human_attention_analysis.md  distribution analysis of the priors + figure index
└── example/
    ├── analyze_human_attention.py   summarize Beta params and top n-grams from priors
    ├── compute_token_weights.py     load priors and compute per-token weight w_j
    └── weighted_sft_template.py     reference implementation of the method components
```
All priors in this release are derived from the EyeTrans corpus (Zhang et al., 2024, "EyeTrans: Merging Human and Machine Attention for Neural Code Summarization"), collected in studies conducted at the University of Notre Dame under the appropriate IRB protocols. We thank those authors and Notre Dame for making this work possible.
```bash
git clone https://github.com/CoderDoge1108/EyeMulator.git
cd EyeMulator
python example/compute_token_weights.py \
    --priors priors/combined \
    --jsonl dataset_sample/completion_train_sample.jsonl \
    --limit 2
```

This prints two examples with their per-token human-attention weights w_j, using only the Python standard library.
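To see what the script is computing, recall the per-token weight from the paper, w_j = w_base + 1/log(freq(g_j) + 2) + E[θ_{s_j}], where E[θ_s] = α_s / (α_s + β_s) is the mean of the Beta prior for the token's semantic class. The sketch below just restates that formula in plain Python; the argument names and the w_base default are illustrative, not the script's actual interface.

```python
import math

def token_weight(ngram_freq: int, alpha: float, beta: float, w_base: float = 1.0) -> float:
    """Illustrative per-token weight: w_base + 1/log(freq + 2) + E[theta] for the
    token's semantic class, where E[theta] = alpha / (alpha + beta)."""
    rarity_bonus = 1.0 / math.log(ngram_freq + 2)  # rarer n-grams get a larger bonus
    salience = alpha / (alpha + beta)              # mean of the Beta attention prior
    return w_base + rarity_bonus + salience

# e.g. a token whose n-gram was seen 5 times, under a Beta(2.0, 6.0) class prior:
print(token_weight(5, alpha=2.0, beta=6.0))
```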
To reproduce the distribution analysis from the paper — posterior salience per semantic label, and the most frequent monogram / bigram / trigram fixation transitions — run:
```bash
python example/analyze_human_attention.py --priors priors/combined --top 10
```

The same script accepts `--priors priors/reading` or `--priors priors/writing`, and `--plot beta.pdf` renders the Beta density curves (requires matplotlib). A walkthrough of what each figure shows, together with the paper's Table 1 reproduced inline, is in `docs/human_attention_analysis.md`. The original PDF figures are in `figures/`.
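If you want the same summary inside your own tooling, the computation is just the Beta posterior mean per semantic label plus a frequency sort of the n-gram transition counts. Below is a minimal sketch, assuming the priors have already been parsed into plain Python dicts; the values shown are made-up placeholders, and the real field layout is documented in `docs/data_schema.md`.

```python
from collections import Counter

# Placeholder inputs, for illustration only: Beta parameters per semantic label
# and fixation-transition counts between labels.
beta_params = {"identifier": (3.1, 4.7), "keyword": (1.8, 6.2), "operator": (1.2, 7.5)}
transition_counts = Counter({("identifier", "operator"): 120, ("keyword", "identifier"): 95})

# Posterior mean salience per semantic label: E[theta] = alpha / (alpha + beta).
salience = {label: a / (a + b) for label, (a, b) in beta_params.items()}
for label, mean in sorted(salience.items(), key=lambda kv: -kv[1]):
    print(f"{label:12s} {mean:.3f}")

# Most frequent fixation transitions (bigrams here; the priors also ship monograms and trigrams).
for ngram, count in transition_counts.most_common(10):
    print(" -> ".join(ngram), count)
```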
```bash
pip install torch transformers
```

`docs/method_integration.md` describes how to plug the priors into a training loop. The components in `example/weighted_sft_template.py`, named after Algorithm 1 in the paper, are:
- `sample_attention_density` — sample ρ ~ Beta(α_agg, β_agg).
- `generate_pseudo_scan_path` — build a pseudo-scan path P̃ from the priors and ρ.
- `token_weight` — the per-token weight w_j = w_base + 1/log(freq(g_j) + 2) + E[θ_{s_j}].
- `CausalLMWithWeightedLoss` — weighted causal-LM loss L_SFT.
- `token_level_preference_loss` — token-level preference term against a frozen reference policy.
- `EyeMulatorCompositeObjective` — the composite L_total = L_SFT + γ · L_pref.
- `WeightedCollator`, `build_training_example` — batching and preprocessing helpers.
The file is backbone-agnostic (swap `LlamaForCausalLM` for whichever model you use) and does not hard-code our training schedule, so it composes with an existing `Trainer`, `accelerate`, or custom loop.
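For orientation, here is a minimal sketch of the loss side. It is not the code in `example/weighted_sft_template.py`; it only shows the shape of the computation, with the weighted cross-entropy L_SFT and the composite sum L_total = L_SFT + γ · L_pref, and leaves the preference term as an input since its exact form follows the paper.

```python
import torch
import torch.nn.functional as F

def sample_attention_density(alpha_agg: float, beta_agg: float) -> torch.Tensor:
    # rho ~ Beta(alpha_agg, beta_agg): the sampled attention density used for pseudo-scan paths.
    return torch.distributions.Beta(alpha_agg, beta_agg).sample()

def weighted_causal_lm_loss(logits, labels, token_weights, ignore_index=-100):
    # L_SFT: causal-LM cross-entropy where each target token is scaled by its weight w_j.
    logits = logits[:, :-1, :].contiguous()   # position t predicts token t+1
    labels = labels[:, 1:].contiguous()
    weights = token_weights[:, 1:].contiguous()
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=ignore_index,
    )
    mask = (labels.view(-1) != ignore_index).float()
    ce = ce * weights.view(-1).float() * mask
    return ce.sum() / mask.sum().clamp(min=1.0)

def composite_objective(sft_loss, pref_loss, gamma):
    # L_total = L_SFT + gamma * L_pref
    return sft_loss + gamma * pref_loss
```

In the actual template, `WeightedCollator` and `build_training_example` are the pieces responsible for producing the batched labels and per-token weights that a function like this consumes.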
Natural extensions we have not tried ourselves:

- Larger backbones (7B / 13B / 70B) on the same three tasks.
- Larger training sets, including non-Java code and more CodeXGLUE tasks.
- Parameter-efficient variants (LoRA, QLoRA) on top of the weights.
- Alternative preference objectives (IPO, KTO, SimPO, token-level DPO variants).
If you try any of these, we'd be glad to hear about it — please open an issue.
Please cite both the EyeMulator paper and the EyeTrans dataset. BibTeX is in CITATION.bib.
- Code (`example/`): MIT License. See `LICENSE`.
- Data and documentation (`priors/`, `dataset_sample/`, `figures/`, `docs/`): CC-BY-4.0.
The underlying eye-tracking data originates from Zhang et al., EyeTrans (FSE'24); please credit that source as well.
An archival copy of this artifact is deposited on Zenodo for long-term citability: https://zenodo.org/records/16134801.
For questions or issues, please open a GitHub issue, or contact the corresponding authors at the email addresses on the paper.