74 changes: 74 additions & 0 deletions docs/ai_assessment/ai_assessment_catalogue.md
---
icon: material/robot-search-outline
title: AI Assessment Catalogue
hide:
- toc
---

<script>
// Hide the secondary sidebar. This script only runs on the catalogue page.
document.addEventListener('DOMContentLoaded', function () {
const sidebar = document.querySelector('.md-sidebar--secondary');
if (document.querySelector('.catalog-header') && sidebar) {
sidebar.style.display = 'none';
sidebar.style.width = '0';
sidebar.style.padding = '0';
sidebar.style.margin = '0';
}
});
</script>

<div class="catalog-header" markdown>
<div markdown>

The AI Assessment Catalogue showcases the evaluation tools, testing frameworks, and assessment solutions available across the Citcom.ai TEF network.
It is regularly updated as new methodologies and tools become available at each TEF site.
If you would like to request an assessment or learn more about a tool, please contact the relevant TEF sites.

<!-- Search input -->
<div class="search-container">
<div class="search-wrapper">
<label class="md-search__icon md-icon" for="searchInput">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M9.5 3A6.5 6.5 0 0 1 16 9.5c0 1.61-.59 3.09-1.56 4.23l.27.27h.79l5 5-1.5 1.5-5-5v-.79l-.27-.27A6.516 6.516 0 0 1 9.5 16 6.5 6.5 0 0 1 3 9.5 6.5 6.5 0 0 1 9.5 3m0 2C7 5 5 7 5 9.5S7 14 9.5 14 14 12 14 9.5 12 5 9.5 5Z"/></svg>
</label>
<input type="text" id="searchInput" placeholder="Search assessment solutions..." />
</div>
<button id="toggleFilters">
<span class="filter-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M14 12v7.88c.04.3-.06.62-.29.83a.996.996 0 0 1-1.41 0l-2.01-2.01a.989.989 0 0 1-.29-.83V12h-.03L4.21 4.62a1 1 0 0 1 .17-1.4c.19-.14.4-.22.62-.22h14c.22 0 .43.08.62.22a1 1 0 0 1 .17 1.4L14.03 12H14Z"/></svg>
</span>
<span class="check-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M9 16.17 4.83 12l-1.42 1.41L9 19 21 7l-1.41-1.41L9 16.17z"/></svg>
</span>
</button>
</div>
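The search input and filter toggle above are wired up by a script shipped with the site, which is not part of this diff. The sketch below shows one way the table filtering could work, assuming client-side matching on row text; the element IDs match the markup above, but the helper name `rowMatches` and the row selector are assumptions, not the actual implementation.

```javascript
// Sketch (assumption): client-side filtering of the catalogue table.
// Element IDs (#searchInput) match the markup; the rest is illustrative.

// Pure helper: case-insensitive substring match of a query against a row's text.
function rowMatches(rowText, query) {
  return rowText.toLowerCase().includes(query.trim().toLowerCase());
}

// DOM wiring: hide non-matching rows as the user types.
// Guarded so the helper above can also be exercised outside a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    const input = document.getElementById('searchInput');
    const rows = document.querySelectorAll('.md-typeset table tbody tr');
    input.addEventListener('input', function () {
      rows.forEach(function (row) {
        row.style.display = rowMatches(row.textContent, input.value) ? '' : 'none';
      });
    });
  });
}
```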

<style>

/* Make ALL columns narrow by default */
.md-typeset table:not(.no-format) th,
.md-typeset table:not(.no-format) td {
width: 60px;
}

/* Give the Resources column (9th) a bit more room */
.md-typeset table:not(.no-format) th:nth-child(9),
.md-typeset table:not(.no-format) td:nth-child(9) {
width: 100px;
}

/* Make the last column (10th: Example of Use Case) wide */
.md-typeset table:not(.no-format) th:nth-child(10),
.md-typeset table:not(.no-format) td:nth-child(10) {
width: 500px;
}

</style>


| Solution Name | Provider | Licensing Type | Project Phase / TRL | Domain of Application | AI Risk Category | Ethical Dimensions | Security & Securitization of Data | Resources | Example of Use Case |
|---------------|----------|----------------|----------------------|------------------------|------------------|--------------------|-----------------------------------|-----------|----------------------|
| **FAIRGAME** | LIST | Open-source | TRL 6–8 | LLM bias testing, AI agent behavioural testing, jailbreak testing | General Purpose AI | Fairness, Robustness | Depends on the use case (whether the chatbot/AI agent has access to sensitive data) | GitHub: <https://github.com/aira-list/FAIRGAME>, Paper: "FAIRGAME: A Framework for AI Agents Bias Recognition Using Game Theory", Frontiers in AI and Applications, Vol. 413: ECAI 2025 | A city aims to test its citizen-facing chatbot before launch. FAIRGAME enables the creation of simulated users with diverse identities, personalities, and requests using LLMs, allowing evaluation in dynamic, real-world-like conversations. |
| **MLA-BiTe** | LIST | To be open-sourced | TRL 6–8 | LLM bias testing | General Purpose AI | Fairness, Robustness | No data privacy requirements | — | A city plans to evaluate fairness in its citizen-facing chatbot. MLA-BiTe allows non-technical staff to create local, scenario-based prompts that uncover discriminatory behaviour across sensitive categories, with support for multiple languages and augmentations. |
| **Legal KG-RAG** | LIST | Proprietary | TRL 5–7 | LLM factual-accuracy testing | General Purpose AI | Transparency, Explainability, Robustness | Depends on whether the RAG is performed on sensitive data | — | A city using a standard RAG pipeline obtains irrelevant results. Legal KG-RAG rebuilds the legal corpus as a Neo4j knowledge graph, enabling direct comparison between traditional and KG-enhanced retrieval. |
| **MLA-Reject** | LIST | To be open-sourced | TRL 6–8 | LLM robustness to jailbreaking | General Purpose AI | Robustness | Depends on whether the system has access to sensitive data | — | A public administration operates a multilingual assistant for internal queries and wants to test its robustness against unsafe or misleading prompts. MLA-Reject generates difficult negative prompts to probe refusal behaviour and safety guardrails, revealing weaknesses and improving configurations. |
52 changes: 52 additions & 0 deletions docs/ai_assessment/index.md
# Citcom Label

The Citcom Label is an initiative currently under development within Citcom.ai. Its goal is to create a trusted, recognisable signal that helps AI providers demonstrate responsible practices and gives buyers—especially public-sector actors such as smart cities—a clearer basis for evaluating and procuring AI solutions.


## What will the Citcom Label be?

The label is envisioned as a **system of digital badges**, each representing a specific dimension of trustworthiness assessed during the evaluation process.
These badges would include a **watermark**, ensuring authenticity and preventing misuse. Each badge would be **verifiable through the Citcom Hub**, allowing external stakeholders to confirm its origin, evaluation status, and associated criteria.

The Citcom badges are **not intended to function as legally binding conformity certificates under the AI Act**. Instead, they serve as **smart-city–oriented quality marks**, helping cities and other public authorities gain confidence in the AI solutions they consider adopting.

For AI innovators, the Citcom badge system provides **independent third-party validation**, helping them promote their solutions and demonstrate that they meet recognised standards of trustworthiness. For cities and public buyers, the badges offer **clear, evidence-based guidance** to support more informed and transparent procurement decisions.

## On what basis will the Citcom badges be awarded?

The detailed criteria are still being developed with Citcom partners, but several guiding principles are emerging:

### Completion of an evaluation
A badge is expected to be awarded only once a solution completes a structured assessment aligned with shared guidelines for the relevant dimension of trustworthiness.

### Common methodology
Work is ongoing to define a coherent framework that determines how systems are qualified, how requirements translate into test cases, and how results are interpreted across different trust dimensions.

### Success thresholds
Initial discussions point toward setting minimum quantitative and qualitative thresholds that vary by product type, maturity level, and the specific dimension being assessed.

### Real-world validation
Evaluations are expected to rely on practical or pilot scenarios using the actual product, ensuring that results reflect real-world behaviour.


## Who will conduct the assessment and with which methodologies?

The assessment behind each Citcom badge will be carried out by the participating TEF sites. Each site brings its own specialised methodologies, tools, and testing infrastructures, reflecting the diversity of technical expertise across the Citcom network.

These assessment solutions cover different dimensions of trustworthiness and can be consulted through the **AI Assessment Catalogue**, available at the following link:

[AI Assessment Catalogue](ai_assessment_catalogue.md)

The catalogue provides an overview of the available evaluation tools, test suites, and methodologies, enabling innovators to understand which capabilities are applied to their systems and helping cities see how specific trust dimensions are assessed.

### Can an AI provider receive assessments across multiple TEF sites?

Yes. If a solution would benefit from complementary expertise available across several TEF sites, an AI provider can undergo assessments in multiple locations. In such cases, the **first-contact TEF site** will coordinate the overall process.

The coordinating TEF site will:
- liaise with the additional TEF sites, which carry out their assessments independently,
- ensure that each participating site manages its own contractual and operational responsibilities,
- consolidate the evaluation results into a unified report,
- and oversee the issuance of the Citcom badges corresponding to the dimensions assessed across all sites.

This ensures a seamless experience for AI innovators while leveraging the full breadth of expertise across the TEF network.
48 changes: 0 additions & 48 deletions docs/ai_assessment_catalog/index.md

This file was deleted.