Companion artifact for the ACL 2026 paper "EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention" by Yifan Zhang, Chen Huang, Yueke Zhang, Jiahao Zhang, Toby Li, Collin McMillan, Kevin Leach, and Yu Huang.
EyeMulator aligns code language models with human visual attention. Eye-tracking data is distilled into a small set of reusable priors (Beta distributions over semantic token classes, plus n-gram transition counts), pseudo-scan paths are generated from those priors over arbitrary code, and the model is trained with a weighted cross-entropy loss combined with a token-level preference loss. This repository contains the priors themselves, a small demonstration dataset, and a reference PyTorch implementation of the method components.
```
EyeMulator/
├── README.md
├── LICENSE                      MIT (code) + CC-BY-4.0 attribution (data)
├── CITATION.bib
├── priors/
│   ├── combined/                distilled from reading + writing sessions
│   ├── reading/                 reading-only sessions
│   └── writing/                 writing-only sessions
├── dataset_sample/              30 examples per split per task; same schema as a full dataset
│   ├── completion_{train,valid,test}_sample.jsonl
│   ├── summarization_{train,valid,test}_sample.jsonl
│   └── translation_{train,valid,test}_sample.jsonl
├── figures/                     human-side figures from the paper
│   ├── human_study.pdf
│   ├── eyemulator_overview.pdf
│   ├── eyemulator_pseudo_path.pdf
│   ├── combined_beta_distributions.pdf
│   ├── combined_beta_curves.pdf
│   └── category_distribution.pdf
├── docs/
│   ├── data_schema.md               field-by-field format of priors and dataset
│   ├── method_integration.md        how to wire the priors into a training loop
│   └── human_attention_analysis.md  distribution analysis of the priors + figure index
└── example/
    ├── analyze_human_attention.py   summarize Beta params and top n-grams from priors
    ├── compute_token_weights.py     load priors and compute per-token weight w_j
    └── weighted_sft_template.py     reference implementation of the method components
```
All priors in this release are derived from the EyeTrans corpus (Zhang et al., 2024, "EyeTrans: Merging Human and Machine Attention for Neural Code Summarization"), collected in studies conducted at the University of Notre Dame under the appropriate IRB protocols. We thank those authors and Notre Dame for making this work possible.
```bash
git clone https://github.com/CoderDoge1108/EyeMulator.git
cd EyeMulator
python example/compute_token_weights.py \
    --priors priors/combined \
    --jsonl dataset_sample/completion_train_sample.jsonl \
    --limit 2
```

This prints two examples with their per-token human-attention weights w_j, using only the Python standard library.
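To see what the script is computing, recall the per-token weight from the paper, w_j = w_base + 1/log(freq(g_j) + 2) + E[θ_{s_j}], where E[θ_s] = α_s / (α_s + β_s) is the mean of the Beta prior for the token's semantic class. The sketch below just restates that formula in plain Python; the argument names and the w_base default are illustrative, not the script's actual interface.

```python
import math

def token_weight(ngram_freq: int, alpha: float, beta: float, w_base: float = 1.0) -> float:
    """Illustrative per-token weight: w_base + 1/log(freq + 2) + E[theta] for the
    token's semantic class, where E[theta] = alpha / (alpha + beta)."""
    rarity_bonus = 1.0 / math.log(ngram_freq + 2)  # rarer n-grams get a larger bonus
    salience = alpha / (alpha + beta)              # mean of the Beta attention prior
    return w_base + rarity_bonus + salience

# e.g. a token whose n-gram was seen 5 times, under a Beta(2.0, 6.0) class prior:
print(token_weight(5, alpha=2.0, beta=6.0))
```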
To reproduce the distribution analysis from the paper — posterior salience per semantic label, and the most frequent monogram / bigram / trigram fixation transitions — run:
```bash
python example/analyze_human_attention.py --priors priors/combined --top 10
```

The same script accepts `--priors priors/reading` or `--priors priors/writing`, and `--plot beta.pdf` renders the Beta density curves (requires matplotlib). A walkthrough of what each figure shows, together with the paper's Table 1 reproduced inline, is in `docs/human_attention_analysis.md`. The original PDF figures are in `figures/`.
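If you want the same summary inside your own tooling, the computation is just the Beta posterior mean per semantic label plus a frequency sort of the n-gram transition counts. Below is a minimal sketch, assuming the priors have already been parsed into plain Python dicts; the values shown are made-up placeholders, and the real field layout is documented in `docs/data_schema.md`.

```python
from collections import Counter

# Placeholder inputs, for illustration only: Beta parameters per semantic label
# and fixation-transition counts between labels.
beta_params = {"identifier": (3.1, 4.7), "keyword": (1.8, 6.2), "operator": (1.2, 7.5)}
transition_counts = Counter({("identifier", "operator"): 120, ("keyword", "identifier"): 95})

# Posterior mean salience per semantic label: E[theta] = alpha / (alpha + beta).
salience = {label: a / (a + b) for label, (a, b) in beta_params.items()}
for label, mean in sorted(salience.items(), key=lambda kv: -kv[1]):
    print(f"{label:12s} {mean:.3f}")

# Most frequent fixation transitions (bigrams here; the priors also ship monograms and trigrams).
for ngram, count in transition_counts.most_common(10):
    print(" -> ".join(ngram), count)
```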
```bash
pip install torch transformers
```

`docs/method_integration.md` describes how to plug the priors into a training loop. The components in `example/weighted_sft_template.py`, named after Algorithm 1 in the paper, are:
- `sample_attention_density` — sample ρ ~ Beta(α_agg, β_agg).
- `generate_pseudo_scan_path` — build a pseudo-scan path P̃ from the priors and ρ.
- `token_weight` — the per-token weight w_j = w_base + 1/log(freq(g_j) + 2) + E[θ_{s_j}].
- `CausalLMWithWeightedLoss` — weighted causal-LM loss L_SFT.
- `token_level_preference_loss` — token-level preference term against a frozen reference policy.
- `EyeMulatorCompositeObjective` — the composite L_total = L_SFT + γ · L_pref.
- `WeightedCollator`, `build_training_example` — batching and preprocessing helpers.
The file is backbone-agnostic (swap `LlamaForCausalLM` for whichever model you use) and does not hard-code our training schedule, so it composes with an existing `Trainer`, `accelerate`, or custom loop.
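For orientation, here is a minimal sketch of the loss side. It is not the code in `example/weighted_sft_template.py`; it only shows the shape of the computation, with the weighted cross-entropy L_SFT and the composite sum L_total = L_SFT + γ · L_pref, and leaves the preference term as an input since its exact form follows the paper.

```python
import torch
import torch.nn.functional as F

def sample_attention_density(alpha_agg: float, beta_agg: float) -> torch.Tensor:
    # rho ~ Beta(alpha_agg, beta_agg): the sampled attention density used for pseudo-scan paths.
    return torch.distributions.Beta(alpha_agg, beta_agg).sample()

def weighted_causal_lm_loss(logits, labels, token_weights, ignore_index=-100):
    # L_SFT: causal-LM cross-entropy where each target token is scaled by its weight w_j.
    logits = logits[:, :-1, :].contiguous()   # position t predicts token t+1
    labels = labels[:, 1:].contiguous()
    weights = token_weights[:, 1:].contiguous()
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=ignore_index,
    )
    mask = (labels.view(-1) != ignore_index).float()
    ce = ce * weights.view(-1).float() * mask
    return ce.sum() / mask.sum().clamp(min=1.0)

def composite_objective(sft_loss, pref_loss, gamma):
    # L_total = L_SFT + gamma * L_pref
    return sft_loss + gamma * pref_loss
```

In the actual template, `WeightedCollator` and `build_training_example` are the pieces responsible for producing the batched labels and per-token weights that a function like this consumes.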
Natural extensions we have not tried ourselves:

- Larger backbones (7B / 13B / 70B) on the same three tasks.
- Larger training sets, including non-Java code and more CodeXGLUE tasks.
- Parameter-efficient variants (LoRA, QLoRA) on top of the weights.
- Alternative preference objectives (IPO, KTO, SimPO, token-level DPO variants).
If you try any of these, we'd be glad to hear about it — please open an issue.
Please cite both the EyeMulator paper and the EyeTrans dataset. BibTeX is in CITATION.bib.
- Code (`example/`): MIT License. See `LICENSE`.
- Data and documentation (`priors/`, `dataset_sample/`, `figures/`, `docs/`): CC-BY-4.0.
The underlying eye-tracking data originates from Zhang et al., EyeTrans (FSE'24); please credit that source as well.
An archival copy of this artifact is deposited on Zenodo for long-term citability: https://zenodo.org/records/16134801.
For questions or issues, please open a GitHub issue, or contact the corresponding authors at the email addresses on the paper.