
Commit 4fbfaca

wip
1 parent 1d1ffdc commit 4fbfaca

File tree

6 files changed (+72 / -11 lines)


_pages/about.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,7 @@
---
permalink: /about/
title: "About"
+ layout: home
---

Hi! :wave: My name is Luca Gioffré (**[lˈuka d͡ʒoffrˈe]**).
@@ -21,9 +22,8 @@ This is where I share my academic work, personal projects and anything else I fi
## Timeline
- `2023-2026` (expected) -- **PhD Student** at [SapienzaNLP](https://nlp.uniroma1.it/), [Sapienza University](https://www.uniroma1.it/en/pagina-strutturale/home), Rome 🇮🇹
- **Project**: Narrative Understanding and Interpretability of LLMs -- **Supervisor**: Prof. [Roberto Navigli](https://www.diag.uniroma1.it/navigli/)
- - `2025` -- **Teaching**: TA for the Master Course [Multilingual Natural Language Processing 2025](https://naviglinlp.blogspot.com/2025/) held at Sapienza
+ - `2024-2026` -- **Teaching**: TA for the Master Course [Multilingual Natural Language Processing](https://naviglinlp.blogspot.com/2025/) held at Sapienza
- `2024` -- **Summer School**: [LxMLS 2024](https://bgmartins.github.io/lxmls-website-2024/index.html) 🇵🇹 ([post]({% link _posts/2024-07-11-LxMLS.md %}))
- - `2024` -- **Teaching**: TA for the Master Course [Multilingual Natural Language Processing 2024](https://naviglinlp.blogspot.com/2024/) held at Sapienza
- `2019-2023` -- **MSc in Engineering in Computer Science**, [Sapienza University](https://www.uniroma1.it/en/pagina-strutturale/home), Rome 🇮🇹
- **Master Thesis**: "*Structured Information Representation for Long-Document Summarization*", supervised by Prof. [Roberto Navigli](https://www.diag.uniroma1.it/navigli/) and [Fabrizio Silvestri](https://sites.google.com/diag.uniroma1.it/fabriziosilvestri)
- `2020` -- **Erasmus** in [Örebro Universitet](https://www.oru.se/english/), Örebro (**\[œrɛˈbruː\]**) 🇸🇪

_pages/publications.md

Lines changed: 17 additions & 8 deletions
@@ -7,32 +7,41 @@ title: "Publications"
## Preprints
Can't spoiler them yet! :eyes:

+ ## 2026

+ - <u>Luca Gioffré</u>\*, Luca Moroni, Alberte Fernández-Castro, Elena Marafatto, Giacomo Garufi, and Roberto Navigli. 2026. **INDAQA2 - A Large Italian Narrative QA Benchmark: A CALAMITA 2026 Challenge.** In *Proceedings of the 9th evaluation campaign EVALITA 2026*, pages xx-xx, Bari, Italy. CEUR Workshop Proceedings.<br>
+ [![Conference](https://img.shields.io/badge/Workshop-EVALITA 2026-forestgreen)](https://www.evalita.it/campaigns/evalita-2026/)
+ [![anthology](https://img.shields.io/badge/Paper-CEUR--anthology-008080)](https://apa.dipsco.unitn.it/evalita2026/69.pdf)
+ [![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-INDAQA2-FCD21D)](https://huggingface.co/datasets/sapienzanlp/INDAQA_CALAMITA)
+ [![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/Andrew-Wyn/INDAQA_CALAMITA)

## 2025

- - Luca Moroni\*, Tommaso Bonomo, <u>Luca Gioffré</u>, Lu Xu, Domenico Fedele, Leonardo Colosi, Andrei Stefan Bejgu, Alessandro Scirè and Roberto Navigli. 2025. **What we Learned from Continually Training Minerva: a Case Study on Italian.** In *Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2025)*, pages xxx–xxx, Cagliari, Italy. CEUR Workshop Proceedings.<br>
- [![CLiC-it](https://img.shields.io/badge/Conference-CLiC--it 2025-forestgreen)](https://clic2025.unica.it/Vol-XXXX/71_main_long.pdf)
- [![INDAQA HuggingFace Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-INDAQA-FCD21D)](https://huggingface.co/datasets/sapienzanlp/indaqa)
- [![ITALIC-Gen HuggingFace Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ITALIC--Gen-FCD21D)](https://huggingface.co/datasets/sapienzanlp/ITALIC-gen)<details>
+ - Luca Moroni\*, Tommaso Bonomo, <u>Luca Gioffré</u>, Lu Xu, Domenico Fedele, Leonardo Colosi, Andrei Stefan Bejgu, Alessandro Scirè and Roberto Navigli. 2025. **What we Learned from Continually Training Minerva: a Case Study on Italian.** In *Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2025)*, pages 760–774, Cagliari, Italy. CEUR Workshop Proceedings.<br>
+ [![Conference](https://img.shields.io/badge/Conference-CLiC--it 2025-forestgreen)](https://clic2025.unica.it/Vol-XXXX/71_main_long.pdf)
+ [![anthology](https://img.shields.io/badge/Paper-ACL--anthology-008080)](https://aclanthology.org/2025.clicit-1.72/)
+ [![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-INDAQA-FCD21D)](https://huggingface.co/datasets/sapienzanlp/indaqa)
+ [![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ITALIC--Gen-FCD21D)](https://huggingface.co/datasets/sapienzanlp/ITALIC-gen) <details>
*We explore continual pretraining strategies to improve Italian-language performance using Minerva by testing different data mixtures (mathematical, encyclopedic, and narrative) and extended context windows.*
*We introduce INDAQA, a new Italian narrative QA benchmark, and find that both data composition and longer context significantly enhance performance on Italian tasks.*
*We also convert the [ITALIC](https://aclanthology.org/2025.naacl-long.68/) benchmark from MC to OE format to disentangle whether models struggle with format adherence or with recalling cultural knowledge.*</details>

- - Tommaso Bonomo\*, <u>Luca Gioffré</u>\*, and Roberto Navigli. 2025. **<span style="font-variant:small-caps;">LiteraryQA</span>: Towards Effective Evaluation of Long-document Narrative QA** In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages xxx–xxx, Suzhou, China. Association for Computational Linguistics.<br>
+ - Tommaso Bonomo\*, <u>Luca Gioffré</u>\*, and Roberto Navigli. 2025. **<span style="font-variant:small-caps;">LiteraryQA</span>: Towards Effective Evaluation of Long-document Narrative QA** In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 34086–34107, Suzhou, China. Association for Computational Linguistics.<br>
[![Conference](http://img.shields.io/badge/Conference-EMNLP 2025-4b44ce.svg)](https://2025.aclweb.org/)
+ [![anthology](https://img.shields.io/badge/Paper-ACL--anthology-008080)](https://aclanthology.org/2025.emnlp-main.1729/)
[![arXiv](https://img.shields.io/badge/Paper-arXiv-b31b1b.svg)](https://arxiv.org/abs/2510.13494)
[![LiteraryQA HuggingFace Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-LiteraryQA-FCD21D)](https://huggingface.co/datasets/sapienzanlp/LiteraryQA)
[![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/sapienzanlp/LiteraryQA)
- [![post](https://img.shields.io/badge/Blog-Post-green)]({% link _posts/2025-08-22-LQA.md %}) <details>
+ [![post](https://img.shields.io/badge/Blog-Post-green)]({% link _posts/2025-05-22-LQA.md %}) <details>
*We introduce LiteraryQA, a high-quality subset of [NarrativeQA](https://aclanthology.org/Q18-1023/) addressing the benchmark's reliability issues through systematic cleaning of documents and validation of question-answer pairs.*
*Our meta-evaluation reveals that traditional n-gram metrics poorly correlate with human judgment, while LLM-based evaluation, even using smaller open-weight models, achieves strong agreement with human rankings.*
*We provide benchmark results for state-of-the-art long-context LLMs and establish best practices for evaluating narrative question answering systems.*</details>

- Francesco Maria Molfese, Luca Moroni, <u>Luca Gioffré</u>, Alessandro Scirè, Simone Conia, and Roberto Navigli. 2025. **Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering.** In *Findings of the Association for Computational Linguistics: ACL 2025*, pages 18477–18494, Vienna, Austria. Association for Computational Linguistics.<br>
[![Conference](https://img.shields.io/badge/Conference-ACL 2025-red)](https://2025.aclweb.org/)
- [![arXiv](https://img.shields.io/badge/Paper-arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.14996)
[![anthology](https://img.shields.io/badge/Paper-ACL--anthology-008080)](https://aclanthology.org/2025.findings-acl.950/)
+ [![arXiv](https://img.shields.io/badge/Paper-arXiv-b31b1b.svg)](https://arxiv.org/abs/2503.14996)
[![MMLU-Adversarial HuggingFace Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-MMLU--Adversarial-FCD21D)](https://huggingface.co/datasets/sapienzanlp/MMLU-Adversarial)
[![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/Andrew-Wyn/metaQAeval)
[![post](https://img.shields.io/badge/Blog-Post-green)]({% link _posts/2025-03-19-RAWS.md %}) <details>
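One summary above mentions converting the ITALIC benchmark from multiple-choice (MC) to open-ended (OE) format. As a loose illustration of what such a conversion can involve (this is not the paper's actual pipeline, and the `question`, `choices`, and `label` field names are hypothetical), a minimal Python sketch:

```python
# Hypothetical sketch of an MC -> OE conversion: drop the candidate options
# and keep the gold option's text as the reference answer for free-form
# evaluation. Field names are illustrative, not the real ITALIC schema.
def mc_to_oe(item: dict) -> dict:
    gold_answer = item["choices"][item["label"]]
    return {
        "question": item["question"],  # asked with no options shown
        "reference": gold_answer,      # free-form output is judged against this
    }

example = {
    "question": "Quale città è la capitale d'Italia?",
    "choices": ["Milano", "Roma", "Napoli", "Torino"],
    "label": 1,
}
print(mc_to_oe(example))
# -> {'question': "Quale città è la capitale d'Italia?", 'reference': 'Roma'}
```

Scoring then compares the generated answer against the reference with string- or LLM-based matching rather than option-letter accuracy, which is the distinction the entry draws between format adherence and actual knowledge recall.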
Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ tags:

## 22/08/2025 - Paper accepted at EMNLP 2025!

- Our paper, **<span style="font-variant:small-caps;">LiteraryQA</span>: Towards Effective Evaluation of Long-document Narrative QA**, has been accepted to the [EMNLP Main Conference 2025]()!
+ Our paper, **<span style="font-variant:small-caps;">LiteraryQA</span>: Towards Effective Evaluation of Long-document Narrative QA**, has been accepted to the [EMNLP Main Conference 2025](https://2025.aclweb.org/)!

👏 Huge thanks to my co-authors Tommaso Bonomo and Roberto Navigli.
_posts/2025-06-20-INDAQA.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
---
title: "Continually Training Minerva @CLiC-it 2025"
excerpt_separator: "<!--more-->"
categories:
- Publications
tags:
- Pretraining
- Evaluation
- LLMs
- Long-Context
- Narrative
---
📄 Read the full paper on ACL Proceedings or on arXiv!
<!--more-->

[![Conference](https://img.shields.io/badge/Conference-CLiC--it 2025-forestgreen)](https://clic2025.unica.it/Vol-XXXX/71_main_long.pdf)
[![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-INDAQA-FCD21D)](https://huggingface.co/datasets/sapienzanlp/indaqa)
[![Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-ITALIC--Gen-FCD21D)](https://huggingface.co/datasets/sapienzanlp/ITALIC-gen)

## 22/08/2025 - Paper accepted at CLiC-it 2025!

Our paper, **What we Learned from Continually Training Minerva: a Case Study on Italian**, has been accepted to the [CLiC-it Conference 2025](https://clic2025.unica.it/)!

👏 Huge thanks to my co-authors Luca Moroni\*, Tommaso Bonomo, Lu Xu, Domenico Fedele, Leonardo Colosi, Andrei Stefan Bejgu, Alessandro Scirè and Roberto Navigli.

See you in [Cagliari](https://www.openstreetmap.org/relation/39837)! 🇮🇹
<center>
<img src="../../assets/images/logo_ClicIt_2025.png" alt="CLiC-it 2025 Logo"/>
</center>


## What We Learned from Continually Training Minerva: Insights for Italian LLM Development
Training large language models for less-represented languages presents unique challenges.
In this work, we investigated how different data recipes and context length extensions affect Italian LLM performance.

We used [Minerva-7B](https://huggingface.co/sapienzanlp/Minerva-7B-base-v1.0), a fully open-source bilingual model, pretrained on 50% Italian and 50% English content, to test three data recipes during continual pretraining: mathematical, encyclopedic, and copyrighted literary content from both Italian and English. We also explored extending the model's context window to handle longer documents.

To evaluate long-context understanding, we created **INDAQA**, the <u>I</u>talian <u>N</u>arrative <u>Da</u>taset for <u>Q</u>uestion-<u>A</u>nswering, the first narrative long-context benchmark for Italian.

**Our Key Findings**:
1. *Context Extension Beats Brute Force*:
Extending Minerva's context window to handle chapter- or book-length texts achieved state-of-the-art performance on long Italian documents. Our models outperformed both Italian-adapted models fine-tuned from English foundations and models trained on many more trillion tokens.
The takeaway: strategic continual pretraining on well-designed Italian data can compete with—and surpass—the brute-force approach of adapting massive English-centric models.
2. *Multiple-Choice Tests Mislead on Cultural Knowledge*
When testing cultural knowledge using multiple-choice questions, results were misleading—models could score well through pattern matching without genuine understanding.
But with open-ended question answering, where models generate free-form responses, Minerva excelled and surpassed all competitors. For fair evaluation of language-specific capabilities, we need formats that truly test comprehension and generation.

We contribute INDAQA to the community and demonstrate the importance of evaluation format when assessing language-specific models.

---
*[LLM]: Large Language Model
*[OE]: Open-ended, also known as _free-form_
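The INDAQA dataset referenced throughout this post is hosted on the Hugging Face Hub as `sapienzanlp/indaqa` (linked above). A minimal sketch of pulling it with the `datasets` library, making no assumptions about its split or column names beyond what the repository itself reports:

```python
# Minimal sketch: download INDAQA from the Hugging Face Hub and inspect it.
# Requires `pip install datasets`; splits and columns are read from the repo
# rather than assumed.
from datasets import load_dataset

indaqa = load_dataset("sapienzanlp/indaqa")  # DatasetDict of available splits

for split_name, split in indaqa.items():
    print(split_name, len(split), split.column_names)

# Peek at one example from the first available split.
first_split = next(iter(indaqa.values()))
print(first_split[0])
```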

assets/images/logo-EVALITA.png

20.5 KB

assets/images/logo_ClicIt_2025.png

556 KB
