diff --git a/README.md b/README.md
index 1342353..db1907a 100644
--- a/README.md
+++ b/README.md
@@ -1,56 +1,114 @@
 # CodeVibing
-A visual gallery of AI-generated React components and experiments. Share and explore creative coding with AI assistance.
+CodeVibing is a hybrid workspace that pairs a visual gallery of AI-generated React components with a research-grade Latin bibliography toolkit. The project combines a shareable Next.js playground for creative coding with a Python pipeline for constructing a master catalogue of Latin works (1450–1900).

-## Features
+```mermaid
+flowchart TD
+    subgraph Frontend Gallery
+        A[Next.js App Router]
+        B[Shared UI Components]
+        C[Data Seeds]
+        A --> B
+        A --> C
+    end

-- 🎨 Visual gallery of AI-generated projects
-- 💻 Live React playground
-- 🌟 Easy project sharing
-- 📱 Responsive design
-- 🎥 Auto-generated previews
+    subgraph Latin Corpus Toolkit
+        R[Raw Catalogue CSVs]
+        N[Normalization Utilities]
+        M[Master Bibliography Builder]
+        T[Translation Matcher]
+        P[Priority Scorer]
+        O[latin_master_1450_1900.csv]
+        R --> N --> M --> T --> P --> O
+    end

-## Getting Started
+
+    B -->|Showcase| Gallery[Live Gallery Experience]
+    O -->|Insights| Gallery
+```

-1. Clone the repository:
-   ```bash
-   git clone https://github.com/JDerekLomas/codevibing.git
-   cd codevibing
-   ```
+## Repository Structure
+
+```
+codevibing/
+├── src/              # Next.js application source
+├── public/           # Static assets for the gallery
+├── latin_corpus/     # Python toolkit for the Latin master bibliography
+├── notebooks/        # Prototyping notebooks for dataset exploration
+├── package.json      # Frontend dependencies
+└── requirements.txt  # Python dependencies for the toolkit
+```

-2. Install dependencies:
+## Frontend Quick Start
+
+1. **Install dependencies**
    ```bash
    npm install
    ```

-3. Copy .env.example to .env.local and add your credentials:
+2. **Configure environment variables**
    ```bash
    cp .env.example .env.local
+   # Edit .env.local and add any required API keys
    ```

-4. Start the development server:
+3. **Run the development server**
    ```bash
    npm run dev
    ```

-Visit [http://localhost:3000](http://localhost:3000) to see the app running.
+   Visit [http://localhost:3000](http://localhost:3000) to explore the gallery.

-## Project Structure
+## Latin Corpus Toolkit Overview

+The toolkit in `latin_corpus/` assembles catalogue exports, flags digitization and translation coverage, and scores works for follow-up research.
+
+### Prerequisites
+
+```bash
+cd latin_corpus
+python -m venv .venv
+source .venv/bin/activate  # On Windows: .venv\Scripts\Activate.ps1
+pip install -r requirements.txt
 ```
-codevibing/
-├── src/
-│   ├── app/              # Next.js app directory
-│   ├── components/       # Shared components
-│   ├── lib/              # Utilities and shared code
-│   └── data/             # Initial seed data
-└── public/               # Static assets
-```
+
+### Workflow
+
+1. Place catalogue exports (USTC, VD16/17/18, ESTC, etc.) and translation series CSVs in `latin_corpus/data/raw/`.
+2. Run the end-to-end builder:
+   ```bash
+   python -m latin_corpus.main
+   ```
+3. Inspect the generated master table at `latin_corpus/data/processed/latin_master_1450_1900.csv`.
+
+See [latin_corpus/README.md](latin_corpus/README.md) for detailed customization options, column mappings, and troubleshooting tips.
+
+## Publishing Your Own Copy to GitHub
+
+If you started from a local folder and want to push it to a new GitHub repository, follow these steps:
+
+1. Create an empty repository at [https://github.com/new](https://github.com/new).
+2. Run the following commands from your project directory (replace the URL with your repo):
+   ```bash
+   git init
+   git remote add origin https://github.com/<your-username>/codevibing.git
+   git add .
+   git commit -m "Initial commit"
+   git branch -M main
+   git push -u origin main
+   ```
+3. Verify the remote:
+   ```bash
+   git remote -v
+   ```
+4. Clone elsewhere when needed:
+   ```bash
+   git clone https://github.com/<your-username>/codevibing.git
+   ```

 ## Contributing

-We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
+We welcome improvements! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

 ## License

-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
\ No newline at end of file
+This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
diff --git a/latin_corpus/README.md b/latin_corpus/README.md
new file mode 100644
index 0000000..bb63d91
--- /dev/null
+++ b/latin_corpus/README.md
@@ -0,0 +1,81 @@
+# Latin Corpus Toolkit
+
+This toolkit assembles disparate catalogue exports into a unified master bibliography of Latin works published between roughly 1450 and 1900. It normalizes metadata, flags digitized editions and modern translations, and assigns a configurable research priority score.
+
+## Pipeline at a Glance
+
+```mermaid
+flowchart LR
+    R[Raw Catalogue CSVs\nUSTC / VD16-18 / ESTC / etc.] --> N[normalize.py\nAuthor & title cleanup]
+    N --> M[merge.py\nBuild master bibliography]
+    M --> T[translation_match.py\nMatch modern translations]
+    T --> P[priority.py\nScore & tag works]
+    P --> O[data/processed/latin_master_1450_1900.csv]
+```
+
+Each stage uses pandas DataFrames and can be customized through configuration dictionaries and helper functions.
+
+## Directory Layout
+
+```
+latin_corpus/
+├── data/
+│   ├── raw/          # Drop catalogue & translation CSV/TSV exports here
+│   └── processed/    # Generated outputs (e.g., latin_master_1450_1900.csv)
+├── latin_corpus/     # Python package with the normalization/merge pipeline
+├── notebooks/        # Optional Jupyter notebooks for exploration
+└── requirements.txt  # Toolkit-specific dependencies
+```
+
+## Quick Start
+
+1. **Create a virtual environment and install dependencies**
+   ```bash
+   cd latin_corpus
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\Activate.ps1
+   pip install -r requirements.txt
+   ```
+
+2. **Stage your source data**
+   * Copy catalogue exports (USTC, VD16/VD17/VD18, ESTC, national catalogues, etc.) into `data/raw/`.
+   * Add translation spreadsheets (Loeb, I Tatti, Brill, or custom lists) to the same folder.
+
+3. **Run the end-to-end build**
+   ```bash
+   python -m latin_corpus.main
+   ```
+   The script prints progress summaries and writes `data/processed/latin_master_1450_1900.csv`.
+
+## Configuring Inputs
+
+* **Column mappings:** The loader functions in `io_utils.py` accept optional dictionaries for renaming columns when catalogue exports use different headings.
+* **Language filtering:** `merge.py` keeps only records whose language field standardizes to Latin; add or remove variants in the `LANGUAGE_MAP` dictionary of `normalize.py` as needed (e.g., `"lat"`, `"latin"`).
+* **Translation files:** Adjust the `TRANSLATION_SERIES` tuple near the top of `latin_corpus/main.py` if your filenames differ or you want to add additional translation datasets.
+* **Fuzzy matching:** `translation_match.py` exposes `DEFAULT_MATCH_CONFIG` for enabling/disabling fuzzy title similarity and tuning thresholds; see the sketch below.
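+
+The sketch below shows how these settings can be combined when driving the pipeline from Python rather than through `python -m latin_corpus.main`. The file `ustc_subset.csv` and the `0.85` threshold are placeholder values, not data shipped with the toolkit.
+
+```python
+from latin_corpus.io_utils import load_translation_list
+from latin_corpus.merge import build_master_bibliography
+from latin_corpus.translation_match import add_translation_flags, build_translation_index
+
+# Build the master table from a custom USTC export placed in data/raw/ (hypothetical file name).
+master = build_master_bibliography(overrides={"USTC": {"path": "ustc_subset.csv"}})
+
+# Index a single translation series and relax the fuzzy-matching threshold.
+loeb = load_translation_list(series_name="Loeb", path="loeb_classical_library.csv")
+index = build_translation_index({"Loeb": loeb})
+flagged = add_translation_flags(master, index, config={"enable_fuzzy": True, "fuzzy_threshold": 0.85})
+flagged[["author", "title", "has_modern_translation"]].head()
+```
+
+`add_priority_scores` from `latin_corpus.priority` can then be applied to the result, mirroring the final step of `run_pipeline()`.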
+ +## Inspecting Results + +You can explore the master bibliography interactively using the provided notebook: + +```bash +jupyter notebook notebooks/build_master_example.ipynb +``` + +Within the notebook, import and call: + +```python +from latin_corpus.merge import build_master_bibliography +master_df = build_master_bibliography() +master_df.head() +``` + +## Troubleshooting + +* Install pandas and related dependencies if you see a `MissingDependencyError` from `_compat.py`. +* Verify filenames and encodings for any CSV/TSV that fails to load; the loaders accept both UTF-8 and Latin-1. +* Delete or move old outputs in `data/processed/` if you want to regenerate the master CSV from scratch. + +## Contributing + +Pull requests and issue reports are welcome. Please follow the repository-wide [CONTRIBUTING.md](../CONTRIBUTING.md) guidelines when proposing changes. diff --git a/latin_corpus/data/processed/.gitkeep b/latin_corpus/data/processed/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/latin_corpus/data/raw/.gitkeep b/latin_corpus/data/raw/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/latin_corpus/latin_corpus/__init__.py b/latin_corpus/latin_corpus/__init__.py new file mode 100644 index 0000000..54c53bf --- /dev/null +++ b/latin_corpus/latin_corpus/__init__.py @@ -0,0 +1,32 @@ +"""Utility package for constructing a Latin bibliography master table.""" + +from __future__ import annotations + +from importlib import import_module +from typing import Any + +__all__ = [ + "add_priority_scores", + "add_translation_flags", + "build_master_bibliography", + "build_translation_index", + "run_pipeline", +] + + +_MODULE_MAP = { + "run_pipeline": (".main", "run_pipeline"), + "build_master_bibliography": (".merge", "build_master_bibliography"), + "add_priority_scores": (".priority", "add_priority_scores"), + "add_translation_flags": (".translation_match", "add_translation_flags"), + "build_translation_index": (".translation_match", "build_translation_index"), +} + + +def __getattr__(name: str) -> Any: # pragma: no cover - dynamic import glue + try: + module_name, attr = _MODULE_MAP[name] + except KeyError as exc: + raise AttributeError(f"module 'latin_corpus' has no attribute {name!r}") from exc + module = import_module(module_name, package=__name__) + return getattr(module, attr) diff --git a/latin_corpus/latin_corpus/_compat.py b/latin_corpus/latin_corpus/_compat.py new file mode 100644 index 0000000..ce23449 --- /dev/null +++ b/latin_corpus/latin_corpus/_compat.py @@ -0,0 +1,33 @@ +"""Compatibility helpers for optional runtime dependencies.""" + +from __future__ import annotations + +from importlib import import_module +from types import ModuleType + + +class MissingDependencyError(RuntimeError): + """Raised when a required optional dependency is unavailable.""" + + +def require_pandas() -> ModuleType: + """Return the :mod:`pandas` module or raise a helpful error message. + + The toolkit leans heavily on pandas for all tabular operations. When the + dependency is not installed, importing modules that rely on pandas results + in an opaque ``ModuleNotFoundError``. Centralising the import behind this + helper lets us surface an actionable instruction for users instead. + """ + + try: + return import_module("pandas") + except ModuleNotFoundError as exc: # pragma: no cover - import-time guard + raise MissingDependencyError( + "pandas is required for the latin_corpus toolkit. 
Install the " + "dependencies via 'pip install -r requirements.txt' before running " + "the pipeline." + ) from exc + + +__all__ = ["MissingDependencyError", "require_pandas"] + diff --git a/latin_corpus/latin_corpus/io_utils.py b/latin_corpus/latin_corpus/io_utils.py new file mode 100644 index 0000000..4597fbb --- /dev/null +++ b/latin_corpus/latin_corpus/io_utils.py @@ -0,0 +1,236 @@ +"""Input/output utilities for catalogue and translation data. + +This module centralises reading and writing logic so that the rest of the +pipeline can rely on consistent DataFrame schemas. All file interactions are +confined to the ``latin_corpus/data`` directory tree by default, but custom +paths may be supplied when integrating additional catalogues. +""" + +from __future__ import annotations + +import logging +from pathlib import Path +from typing import Iterable, Mapping, MutableMapping, Optional + +from ._compat import require_pandas + +pd = require_pandas() + +LOGGER = logging.getLogger(__name__) + +PACKAGE_ROOT = Path(__file__).resolve().parents[1] +RAW_DATA_DIR = PACKAGE_ROOT / "data" / "raw" +PROCESSED_DATA_DIR = PACKAGE_ROOT / "data" / "processed" + +# Default file names act as placeholders that users can replace with their own +# exports. The files do not need to exist; missing files yield empty frames so +# that the rest of the pipeline can still run for testing purposes. +DEFAULT_FILENAMES: Mapping[str, str] = { + "USTC": "ustc_export.csv", + "VD16": "vd16_export.csv", + "VD17": "vd17_export.csv", + "VD18": "vd18_export.csv", + "ESTC": "estc_export.csv", +} + +# Columns that are commonly requested downstream. Missing columns are filled +# with ``pd.NA`` so that DataFrame operations remain well defined. +CORE_CATALOG_COLUMNS: tuple[str, ...] = ( + "source_id", + "author", + "title", + "full_title", + "imprint_place", + "imprint_year", + "language", + "subjects", + "digital_facsimile_urls", +) + +TRANSLATION_COLUMNS: tuple[str, ...] = ( + "latin_author", + "latin_title", + "modern_language", + "translation_series", + "year_of_translation", +) + +DEFAULT_READ_KWARGS: Mapping[str, object] = { + "dtype": str, + "keep_default_na": False, + "na_values": ["", "NA", "N/A", "null", "None"], +} + + +def _resolve_path(path: Optional[Path | str], default_name: str) -> Path: + """Return the resolved path to a raw data file. + + Parameters + ---------- + path: + Custom path provided by the caller. May be absolute or relative. + default_name: + File name (without directory) to use when ``path`` is ``None``. + + Returns + ------- + Path + Fully resolved file path. The file does not need to exist yet. + """ + + if path is None: + candidate = RAW_DATA_DIR / default_name + else: + candidate = Path(path) + if not candidate.is_absolute(): + candidate = RAW_DATA_DIR / candidate + return candidate + + +def _ensure_columns(frame: pd.DataFrame, required: Iterable[str]) -> pd.DataFrame: + """Guarantee that ``frame`` has the specified ``required`` columns.""" + + missing = [col for col in required if col not in frame.columns] + if missing: + frame = frame.assign(**{col: pd.NA for col in missing}) + return frame + + +def _load_csv(path: Path, *, column_map: Optional[Mapping[str, str]] = None, **kwargs) -> pd.DataFrame: + """Load a CSV/TSV file into a DataFrame with optional column renaming. + + If the file does not exist, an empty DataFrame with mapped columns is + returned instead of raising an exception. This behaviour allows the + pipeline to run in environments where only a subset of catalogues are + available. 
+ """ + + read_kwargs: MutableMapping[str, object] = dict(DEFAULT_READ_KWARGS) + read_kwargs.update(kwargs) + + if path.suffix.lower() == ".tsv": + read_kwargs.setdefault("sep", "\t") + + if not path.exists(): + LOGGER.warning("File not found: %s", path) + frame = pd.DataFrame() + else: + frame = pd.read_csv(path, **read_kwargs) + LOGGER.info("Loaded %s with %s rows and %s columns", path, len(frame), len(frame.columns)) + + if column_map: + frame = frame.rename(columns=column_map) + + return frame + + +def load_ustc(path: Optional[Path | str] = None, *, column_map: Optional[Mapping[str, str]] = None, **kwargs) -> pd.DataFrame: + """Load a Universal Short Title Catalogue (USTC) export. + + Parameters + ---------- + path: + Optional custom path. Defaults to ``data/raw/ustc_export.csv``. + column_map: + Mapping from source column names to canonical names. Columns listed in + :data:`CORE_CATALOG_COLUMNS` should be covered. + **kwargs: + Additional arguments forwarded to :func:`pandas.read_csv`. + + Returns + ------- + pandas.DataFrame + DataFrame with at least the columns defined in + :data:`CORE_CATALOG_COLUMNS`. Missing fields are populated with ``pd.NA``. + """ + + resolved = _resolve_path(path, DEFAULT_FILENAMES["USTC"]) + frame = _load_csv(resolved, column_map=column_map, **kwargs) + frame = _ensure_columns(frame, CORE_CATALOG_COLUMNS) + return frame + + +def load_vd(path: Optional[Path | str] = None, *, catalog_name: str, column_map: Optional[Mapping[str, str]] = None, **kwargs) -> pd.DataFrame: + """Load a VD catalogue (VD16/VD17/VD18) export. + + Parameters + ---------- + path: + Optional custom file path relative to ``data/raw``. If omitted, a + placeholder derived from ``catalog_name`` is used (e.g. ``vd16_export.csv``). + catalog_name: + Name of the catalogue, used to construct default file names and + populate metadata fields downstream. + column_map: + Optional rename mapping, analogous to :func:`load_ustc`. + **kwargs: + Additional keyword arguments for :func:`pandas.read_csv`. + """ + + default_name = DEFAULT_FILENAMES.get(catalog_name.upper(), f"{catalog_name.lower()}_export.csv") + resolved = _resolve_path(path, default_name) + frame = _load_csv(resolved, column_map=column_map, **kwargs) + frame = _ensure_columns(frame, CORE_CATALOG_COLUMNS) + return frame + + +def load_translation_list(path: Optional[Path | str] = None, *, series_name: str, column_map: Optional[Mapping[str, str]] = None, **kwargs) -> pd.DataFrame: + """Load a CSV containing information about modern translations. + + Parameters + ---------- + path: + Optional custom path relative to ``data/raw``. If omitted, the file name + defaults to ``{series_name}.csv`` in snake case (e.g. ``loeb_classical_library.csv``). + series_name: + Label describing the translation series (e.g. "Loeb"). This value is not + used during loading but is convenient when building indices downstream. + column_map: + Optional rename mapping for column normalisation. + **kwargs: + Additional keyword arguments for :func:`pandas.read_csv`. + """ + + default_name = f"{series_name.lower().replace(' ', '_')}.csv" + resolved = _resolve_path(path, default_name) + frame = _load_csv(resolved, column_map=column_map, **kwargs) + frame = _ensure_columns(frame, TRANSLATION_COLUMNS) + return frame + + +def save_processed(df: pd.DataFrame, filename: str, *, index: bool = False, **kwargs) -> Path: + """Write ``df`` to ``data/processed`` with ``filename``. + + Parameters + ---------- + df: + DataFrame to be persisted. 
+ filename: + File name (with extension) relative to ``data/processed``. + index: + Whether to include the DataFrame index. Defaults to ``False``. + **kwargs: + Additional arguments forwarded to :meth:`pandas.DataFrame.to_csv`. + + Returns + ------- + Path + The path of the saved file, allowing the caller to log or reuse it. + """ + + PROCESSED_DATA_DIR.mkdir(parents=True, exist_ok=True) + path = PROCESSED_DATA_DIR / filename + df.to_csv(path, index=index, **kwargs) + LOGGER.info("Saved processed data to %s", path) + return path + + +__all__ = [ + "DEFAULT_FILENAMES", + "RAW_DATA_DIR", + "PROCESSED_DATA_DIR", + "load_translation_list", + "load_ustc", + "load_vd", + "save_processed", +] diff --git a/latin_corpus/latin_corpus/main.py b/latin_corpus/latin_corpus/main.py new file mode 100644 index 0000000..2210668 --- /dev/null +++ b/latin_corpus/latin_corpus/main.py @@ -0,0 +1,110 @@ +"""Command-line entry point for building the Latin master dataset.""" + +from __future__ import annotations + +import logging +from pathlib import Path +from typing import Iterable, Mapping, MutableMapping, TYPE_CHECKING + +from ._compat import MissingDependencyError + +if TYPE_CHECKING: # pragma: no cover - import for typing only + import pandas as pd + + +LOGGER = logging.getLogger(__name__) + +MASTER_OUTPUT_FILENAME = "latin_master_1450_1900.csv" + +# Default translation series specifications. Adjust or extend this tuple to suit +# the catalogues available in ``data/raw``. +TRANSLATION_SERIES: tuple[Mapping[str, object], ...] = ( + {"label": "Loeb", "path": "loeb_classical_library.csv"}, + {"label": "I Tatti", "path": "i_tatti_renaissance_library.csv"}, + {"label": "Brill", "path": "brill_translations.csv"}, +) + + +def _load_translation_frames(series_specs: Iterable[Mapping[str, object]]) -> Mapping[str, "pd.DataFrame"]: + frames: MutableMapping[str, "pd.DataFrame"] = {} + for spec in series_specs: + label = str(spec.get("label", "")).strip() + if not label: + LOGGER.warning("Skipping translation series with missing label: %s", spec) + continue + path = spec.get("path") + column_map = spec.get("column_map") + read_kwargs = spec.get("read_kwargs", {}) + from .io_utils import load_translation_list + + frames[label] = load_translation_list(path=path, series_name=label, column_map=column_map, **read_kwargs) + return frames + + +def _print_summary(df: "pd.DataFrame", output_path: Path) -> None: + total_rows = len(df) + LOGGER.info("Saved master table to %s", output_path) + LOGGER.info("Total rows: %s", total_rows) + + if total_rows == 0: + LOGGER.warning("No data rows available. 
Check catalogue inputs in data/raw/.") + return + + facsimile_series = df["has_digital_facsimile"].fillna(False).astype(bool) + translation_series = df["has_modern_translation"].fillna(False).astype(bool) + + percent_unscanned = 100.0 * (1.0 - facsimile_series.mean()) + percent_untranslated = 100.0 * (1.0 - translation_series.mean()) + + LOGGER.info("%% without digital facsimile: %.2f", percent_unscanned) + LOGGER.info("%% without modern translation: %.2f", percent_untranslated) + + top_priority = df.sort_values("priority_score", ascending=False).head(20) + if top_priority.empty: + LOGGER.info("No rows with positive priority scores yet.") + return + + display_columns = [ + "work_id", + "author", + "title", + "imprint_year", + "has_digital_facsimile", + "has_modern_translation", + "priority_score", + "priority_tags", + ] + LOGGER.info("Top 20 priority works:\n%s", top_priority[display_columns].to_string(index=False)) + + +def run_pipeline() -> "pd.DataFrame": + """Execute the full pipeline and return the enriched master DataFrame.""" + + from .io_utils import save_processed + from .merge import build_master_bibliography + from .priority import add_priority_scores + from .translation_match import DEFAULT_MATCH_CONFIG, add_translation_flags, build_translation_index + + master = build_master_bibliography() + translation_frames = _load_translation_frames(TRANSLATION_SERIES) + translation_index = build_translation_index(translation_frames) + with_translations = add_translation_flags(master, translation_index, config=DEFAULT_MATCH_CONFIG) + scored = add_priority_scores(with_translations) + output_path = save_processed(scored, MASTER_OUTPUT_FILENAME) + _print_summary(scored, output_path) + return scored + + +def main() -> None: + """Entry point used by ``python -m latin_corpus.main``.""" + + logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s") + try: + run_pipeline() + except MissingDependencyError as exc: + LOGGER.error("%s", exc) + raise SystemExit(1) from exc + + +if __name__ == "__main__": # pragma: no cover - CLI entry point + main() diff --git a/latin_corpus/latin_corpus/merge.py b/latin_corpus/latin_corpus/merge.py new file mode 100644 index 0000000..ceec5c9 --- /dev/null +++ b/latin_corpus/latin_corpus/merge.py @@ -0,0 +1,295 @@ +"""Catalogue normalisation and merging utilities.""" + +from __future__ import annotations + +import hashlib +import logging +from dataclasses import dataclass +from typing import Callable, Dict, Iterable, Mapping, Optional + +from ._compat import require_pandas + +pd = require_pandas() + +from .io_utils import DEFAULT_FILENAMES, load_ustc, load_vd +from .normalize import extract_year, normalize_author, normalize_title, standardize_language_label + +LOGGER = logging.getLogger(__name__) + +CatalogLoader = Callable[..., pd.DataFrame] + + +@dataclass +class CatalogSpec: + """Configuration for a source catalogue.""" + + loader: CatalogLoader + column_map: Mapping[str, str] + default_filename: str + extra_kwargs: Mapping[str, object] | None = None + + +# Default column mappings are deliberately conservative and should be adjusted to +# match the exported CSV headers used in your environment. 
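+# Note: CATALOG_PRIORITY and DEFAULT_FILENAMES also list ESTC, but no ESTC entry is
+# defined below. If an ESTC export is available, a CatalogSpec analogous to the VD
+# entries (using that export's actual column headings) would need to be added here.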
+CATALOG_SPECS: Dict[str, CatalogSpec] = { + "USTC": CatalogSpec( + loader=load_ustc, + column_map={ + "ustc_id": "source_id", + "author": "author", + "short_title": "title", + "full_title": "full_title", + "imprint_place": "imprint_place", + "imprint_year": "imprint_year", + "language": "language", + "subjects": "subjects", + "digital_facsimile_urls": "digital_facsimile_urls", + }, + default_filename=DEFAULT_FILENAMES["USTC"], + ), + "VD16": CatalogSpec( + loader=load_vd, + column_map={ + "vd16": "source_id", + "author": "author", + "title_short": "title", + "title_full": "full_title", + "place": "imprint_place", + "year": "imprint_year", + "language": "language", + "keywords": "subjects", + "digital_urls": "digital_facsimile_urls", + }, + default_filename=DEFAULT_FILENAMES["VD16"], + extra_kwargs={"catalog_name": "VD16"}, + ), + "VD17": CatalogSpec( + loader=load_vd, + column_map={ + "vd17": "source_id", + "author": "author", + "short_title": "title", + "full_title": "full_title", + "place": "imprint_place", + "imprint_year": "imprint_year", + "language": "language", + "subjects": "subjects", + "digital_facsimile": "digital_facsimile_urls", + }, + default_filename=DEFAULT_FILENAMES["VD17"], + extra_kwargs={"catalog_name": "VD17"}, + ), + "VD18": CatalogSpec( + loader=load_vd, + column_map={ + "vd18": "source_id", + "author": "author", + "title": "title", + "title_full": "full_title", + "place": "imprint_place", + "year": "imprint_year", + "language": "language", + "subjects": "subjects", + "digital_facsimile": "digital_facsimile_urls", + }, + default_filename=DEFAULT_FILENAMES["VD18"], + extra_kwargs={"catalog_name": "VD18"}, + ), +} + +CATALOG_PRIORITY: tuple[str, ...] = ("USTC", "VD16", "VD17", "VD18", "ESTC") + +VALUE_COLUMNS: tuple[str, ...] = ( + "author", + "title", + "full_title", + "imprint_place", + "subjects", + "digital_facsimile_urls", +) + + +def _load_catalogue(name: str, overrides: Optional[Mapping[str, object]] = None) -> pd.DataFrame: + """Load and lightly clean a single catalogue.""" + + spec = CATALOG_SPECS.get(name) + if spec is None: + LOGGER.warning("No catalog specification for %s; skipping", name) + return pd.DataFrame() + + kwargs = dict(spec.extra_kwargs or {}) + if overrides: + kwargs.update(overrides) + + frame = spec.loader(path=kwargs.pop("path", None), column_map=spec.column_map, **kwargs) + if frame.empty: + return frame + + frame["source_catalog"] = name + + for col in ("author", "title", "full_title", "imprint_place", "subjects", "digital_facsimile_urls"): + if col in frame.columns: + frame[col] = frame[col].fillna("").astype(str).str.strip() + else: + frame[col] = "" + + frame["author_norm"] = frame["author"].apply(normalize_author) + frame["title_norm"] = frame["title"].apply(normalize_title) + frame["imprint_year"] = frame["imprint_year"].apply(extract_year) + + frame["language_standardized"] = frame["language"].apply(standardize_language_label) + lang_text = frame["language"].fillna("").astype(str) + mask = frame["language_standardized"].eq("Latin") | lang_text.str.contains("latin", case=False) + frame = frame[mask] + frame["language"] = frame["language_standardized"].fillna(frame["language"]) + + frame["has_digital_facsimile"] = frame["digital_facsimile_urls"].apply(lambda value: bool(str(value).strip())) + frame["digital_facsimile_sources"] = frame["has_digital_facsimile"].map({True: name, False: ""}) + + return frame[ + [ + "source_catalog", + "source_id", + "author", + "author_norm", + "title", + "title_norm", + "full_title", + "imprint_place", + 
"imprint_year", + "language", + "subjects", + "digital_facsimile_urls", + "has_digital_facsimile", + "digital_facsimile_sources", + ] + ] + + +def _combine_source_ids(values: Iterable[str]) -> str: + unique = sorted({v for v in values if v}) + return ";".join(unique) + + +def _combine_strings(values: Iterable[str]) -> str: + unique = sorted({v.strip() for v in values if v and v.strip()}) + return ";".join(unique) + + +def _generate_work_id(author_norm: str, title_norm: str, imprint_year: Optional[int]) -> str: + year_token = str(imprint_year) if imprint_year is not None else "na" + digest = hashlib.md5(f"{author_norm}||{title_norm}||{year_token}".encode("utf-8")).hexdigest() + return f"wrk_{digest[:12]}" + + +def _deduplicate(master: pd.DataFrame) -> pd.DataFrame: + if master.empty: + return master + + rank_map = {name: idx for idx, name in enumerate(CATALOG_PRIORITY)} + master = master.copy() + master["catalog_rank"] = master["source_catalog"].map(rank_map).fillna(len(rank_map)).astype(int) + master["data_completeness"] = master[list(VALUE_COLUMNS)].notna().sum(axis=1) + master["imprint_year_group"] = master["imprint_year"].fillna(-1).astype(int) + master["dedupe_key"] = list(zip(master["author_norm"], master["title_norm"], master["imprint_year_group"])) + + master_sorted = master.sort_values(by=["catalog_rank", "data_completeness"], ascending=[True, False]) + best = master_sorted.drop_duplicates(subset="dedupe_key", keep="first") + + source_id_map = master.groupby("dedupe_key")["source_id"].apply(_combine_source_ids) + digital_url_map = master.groupby("dedupe_key")["digital_facsimile_urls"].apply(_combine_strings) + digital_sources_map = master.groupby("dedupe_key")["digital_facsimile_sources"].apply(_combine_strings) + has_digital_map = master.groupby("dedupe_key")["has_digital_facsimile"].any() + + best = best.copy() + best["source_id"] = best["dedupe_key"].map(source_id_map) + best["digital_facsimile_urls"] = best["dedupe_key"].map(digital_url_map).fillna("") + best["digital_facsimile_sources"] = best["dedupe_key"].map(digital_sources_map).fillna("") + best["has_digital_facsimile"] = best["dedupe_key"].map(has_digital_map).fillna(False) + + best["work_id"] = best.apply( + lambda row: _generate_work_id(row["author_norm"], row["title_norm"], row["imprint_year"]), axis=1 + ) + + best = best.drop(columns=["catalog_rank", "data_completeness", "imprint_year_group", "dedupe_key"]) + best["imprint_year"] = best["imprint_year"].astype("Int64") + + columns = [ + "work_id", + "source_catalog", + "source_id", + "author", + "author_norm", + "title", + "title_norm", + "full_title", + "imprint_place", + "imprint_year", + "language", + "subjects", + "digital_facsimile_urls", + "has_digital_facsimile", + "digital_facsimile_sources", + ] + + return best[columns] + + +def build_master_bibliography(overrides: Optional[Mapping[str, Mapping[str, object]]] = None) -> pd.DataFrame: + """Construct a unified DataFrame across all configured catalogues. + + Parameters + ---------- + overrides: + Optional mapping keyed by catalogue name that supplies keyword arguments + for the respective loader (e.g. ``{"USTC": {"path": "ustc_subset.csv"}}``). + + Returns + ------- + pandas.DataFrame + Normalised and de-duplicated catalogue entries limited to Latin-language + records. The result includes generated ``work_id`` values and flags for + known digital facsimiles. 
+ """ + + frames = [] + for name in CATALOG_SPECS: + frame = _load_catalogue(name, overrides=overrides.get(name) if overrides else None) + if frame.empty: + LOGGER.info("Catalogue %s produced no records (missing file or no Latin entries)", name) + continue + frames.append(frame) + + if not frames: + LOGGER.warning("No catalogue data available; returning empty DataFrame") + return pd.DataFrame( + columns=[ + "work_id", + "source_catalog", + "source_id", + "author", + "author_norm", + "title", + "title_norm", + "full_title", + "imprint_place", + "imprint_year", + "language", + "subjects", + "digital_facsimile_urls", + "has_digital_facsimile", + "digital_facsimile_sources", + ] + ) + + combined = pd.concat(frames, ignore_index=True) + master = _deduplicate(combined) + LOGGER.info("Master bibliography contains %s rows", len(master)) + return master + + +__all__ = [ + "CATALOG_PRIORITY", + "CATALOG_SPECS", + "build_master_bibliography", +] diff --git a/latin_corpus/latin_corpus/normalize.py b/latin_corpus/latin_corpus/normalize.py new file mode 100644 index 0000000..c697973 --- /dev/null +++ b/latin_corpus/latin_corpus/normalize.py @@ -0,0 +1,125 @@ +"""Normalisation helpers for bibliographic metadata.""" + +from __future__ import annotations + +import re +import string +from typing import Iterable, Optional + +from unidecode import unidecode + +# Configuration values collected in a dictionary for quick adjustments. +CONFIG = { + "author_honorifics": ( + "dr", + "prof", + "professor", + "rev", + "reverend", + "sir", + "dom", + "fr", + "fra", + ), + "title_leading_stopwords": ( + "de", + "in", + "ad", + "liber", + ), + "punctuation_preserve_title": {":", ","}, +} + +PUNCTUATION_TABLE_AUTHOR = str.maketrans({ch: " " for ch in string.punctuation}) +PUNCTUATION_TABLE_TITLE = str.maketrans( + {ch: " " for ch in string.punctuation if ch not in CONFIG["punctuation_preserve_title"]} +) + +LANGUAGE_MAP = { + "lat": "Latin", + "la": "Latin", + "latin": "Latin", + "latine": "Latin", + "latius": "Latin", +} + +YEAR_PATTERN = re.compile(r"(1[45-9]\d{2})") + + +def _normalise_whitespace(value: str) -> str: + return re.sub(r"\s+", " ", value).strip() + + +def _strip_honorifics(value: str, honorifics: Iterable[str]) -> str: + pattern = r"^(?:(?:" + "|".join(map(re.escape, honorifics)) + r")\.?,?\s+)+" + return re.sub(pattern, "", value) + + +def normalize_author(name: Optional[str]) -> str: + """Return a lowercased, ASCII-fied author string without honorifics.""" + + if not name: + return "" + value = unidecode(str(name)).lower() + value = _strip_honorifics(value, CONFIG["author_honorifics"]) + value = value.translate(PUNCTUATION_TABLE_AUTHOR) + return _normalise_whitespace(value) + + +def normalize_title(title: Optional[str]) -> str: + """Return a normalised title suitable for matching across catalogues.""" + + if not title: + return "" + value = unidecode(str(title)).lower() + value = value.translate(PUNCTUATION_TABLE_TITLE) + value = _normalise_whitespace(value) + + for stopword in CONFIG["title_leading_stopwords"]: + if value.startswith(f"{stopword} "): + value = value[len(stopword) + 1 :] + break + + return value + + +def extract_year(value: Optional[str | int | float]) -> Optional[int]: + """Extract the first plausible Gregorian year (1450–1999) from ``value``.""" + + if value is None or value != value: # NaN check + return None + + if isinstance(value, (int, float)) and not isinstance(value, bool): + int_value = int(value) + if 1450 <= int_value <= 1900: + return int_value + return None + + text = 
unidecode(str(value)) + match = YEAR_PATTERN.search(text) + if not match: + return None + year = int(match.group(1)) + return year if 1450 <= year <= 1900 else None + + +def standardize_language_label(label: Optional[str]) -> Optional[str]: + """Map language codes and descriptors to canonical names.""" + + if not label: + return None + cleaned = unidecode(label).lower().strip() + if cleaned in LANGUAGE_MAP: + return LANGUAGE_MAP[cleaned] + if cleaned.startswith("lat"): + return "Latin" + return label.strip() + + +__all__ = [ + "CONFIG", + "extract_year", + "normalize_author", + "normalize_title", + "standardize_language_label", +] diff --git a/latin_corpus/latin_corpus/priority.py b/latin_corpus/latin_corpus/priority.py new file mode 100644 index 0000000..dcc9fcd --- /dev/null +++ b/latin_corpus/latin_corpus/priority.py @@ -0,0 +1,107 @@ +"""Priority scoring utilities for the Latin master table.""" + +from __future__ import annotations + +from typing import Mapping, MutableMapping, Optional + +from ._compat import require_pandas + +pd = require_pandas() + + +PRIORITY_WEIGHTS: Mapping[str, float] = { + "missing_facsimile": 2.0, + "missing_translation": 2.0, + "scientific": 1.0, + "hermetic": 1.0, + "colonial": 1.0, + "early_modern_peak": 1.0, +} + +KEYWORD_GROUPS: Mapping[str, tuple[str, ...]] = { + "scientific": ("astronom", "physic", "medic", "anatom", "botan", "mathemat"), + "hermetic": ("hermet", "alchem", "cabal", "magia", "occult"), + "colonial": ("india", "china", "mexic", "peru", "brazil", "goa", "iapon", "japan"), +} + +EARLY_MODERN_RANGE: tuple[int, int] = (1500, 1650) + + +def _ensure_columns(frame: pd.DataFrame) -> pd.DataFrame: + defaults = { + "has_digital_facsimile": False, + "has_modern_translation": False, + "subjects": "", + "title": "", + "priority_score": 0.0, + "priority_tags": "", + } + for col, default in defaults.items(): + if col not in frame.columns: + frame[col] = default + return frame + + +def _detect_keyword_tags(text: str) -> set[str]: + if not text: + return set() + lowered = text.lower() + tags = {name for name, keywords in KEYWORD_GROUPS.items() if any(keyword in lowered for keyword in keywords)} + return tags + + +def add_priority_scores(master_df: pd.DataFrame, *, weights: Optional[Mapping[str, float]] = None) -> pd.DataFrame: + """Compute priority scores and tags for ``master_df``.""" + + if master_df.empty: + result = master_df.copy() + result["priority_score"] = pd.Series(dtype=float) + result["priority_tags"] = pd.Series(dtype=str) + return result + + working = _ensure_columns(master_df.copy()) + applied_weights: MutableMapping[str, float] = dict(PRIORITY_WEIGHTS) + if weights: + applied_weights.update(weights) + + scores = [] + tags_list = [] + lower_bound, upper_bound = EARLY_MODERN_RANGE + + for _, row in working.iterrows(): + score = 0.0 + tags: list[str] = [] + + if not bool(row.get("has_digital_facsimile", False)): + score += applied_weights["missing_facsimile"] + tags.append("unscanned") + + if not bool(row.get("has_modern_translation", False)): + score += applied_weights["missing_translation"] + tags.append("untranslated") + + text_blob = f"{row.get('title', '')} {row.get('subjects', '')}".strip() + for keyword_tag in _detect_keyword_tags(text_blob): + score += applied_weights.get(keyword_tag, 0.0) + tags.append(keyword_tag) + + imprint_year = row.get("imprint_year") + if pd.notna(imprint_year): + try: + year_int = int(imprint_year) + except (TypeError, ValueError): + year_int = None + if year_int is not None and lower_bound <= year_int <= 
upper_bound: + score += applied_weights["early_modern_peak"] + tags.append("early_modern_peak") + + scores.append(score) + tags_list.append(";".join(sorted(dict.fromkeys(tags)))) + + working["priority_score"] = scores + working["priority_tags"] = tags_list + + return working + + +__all__ = ["add_priority_scores", "PRIORITY_WEIGHTS", "KEYWORD_GROUPS", "EARLY_MODERN_RANGE"] diff --git a/latin_corpus/latin_corpus/translation_match.py b/latin_corpus/latin_corpus/translation_match.py new file mode 100644 index 0000000..7201178 --- /dev/null +++ b/latin_corpus/latin_corpus/translation_match.py @@ -0,0 +1,255 @@ +"""Utilities for matching Latin works to modern translations.""" + +from __future__ import annotations + +import logging +from typing import Iterable, Mapping, MutableMapping, Optional + +from ._compat import require_pandas + +pd = require_pandas() + +from .normalize import normalize_author, normalize_title + +LOGGER = logging.getLogger(__name__) + + +TRANSLATION_INDEX_COLUMNS: tuple[str, ...] = ( + "series_name", + "latin_author_norm", + "latin_title_norm", + "modern_language", + "translation_year", +) + + +DEFAULT_MATCH_CONFIG: Mapping[str, object] = { + "enable_fuzzy": True, + "fuzzy_threshold": 0.9, +} + + +try: # pragma: no cover - optional dependency handling + from rapidfuzz import fuzz as _rf_fuzz + + def _similarity(a: str, b: str) -> float: + return _rf_fuzz.ratio(a, b) / 100.0 + +except Exception: # pragma: no cover - optional dependency handling + try: + from Levenshtein import ratio as _lev_ratio + + def _similarity(a: str, b: str) -> float: + return _lev_ratio(a, b) + + except Exception: + _similarity = None # type: ignore[assignment] + + +def _normalise_translation_frame(frame: pd.DataFrame, series_name: str) -> pd.DataFrame: + """Return a copy of ``frame`` with normalised author/title columns.""" + + if frame.empty: + return pd.DataFrame(columns=[*TRANSLATION_INDEX_COLUMNS, "modern_languages", "translation_years"]) + + working = frame.copy() + working["series_name"] = series_name + working["latin_author_norm"] = working["latin_author"].fillna("").astype(str).map(normalize_author) + working["latin_title_norm"] = working["latin_title"].fillna("").astype(str).map(normalize_title) + working["modern_language"] = working["modern_language"].fillna("").astype(str).str.strip() + working["translation_year"] = working["year_of_translation"].fillna("").astype(str).str.extract(r"(\d{4})")[0] + + return working[TRANSLATION_INDEX_COLUMNS] + + +def build_translation_index(frames: Mapping[str, pd.DataFrame]) -> pd.DataFrame: + """Construct a translation index DataFrame from raw series frames. + + Parameters + ---------- + frames: + Mapping of human-readable series labels to DataFrames loaded via + :func:`latin_corpus.io_utils.load_translation_list`. Each frame should + contain the columns ``latin_author`` and ``latin_title`` in addition to + ``modern_language`` and ``year_of_translation``. + + Returns + ------- + pandas.DataFrame + Normalised index keyed by ``latin_author_norm`` and ``latin_title_norm``. + Additional columns store the concatenated translation sources, modern + languages, and translation years for quick lookups during matching. 
+ """ + + normalised: list[pd.DataFrame] = [] + for series_name, frame in frames.items(): + normalised.append(_normalise_translation_frame(frame, series_name)) + + if not normalised: + return pd.DataFrame(columns=[ + "latin_author_norm", + "latin_title_norm", + "translation_sources", + "modern_languages", + "translation_years", + ]) + + combined = pd.concat(normalised, ignore_index=True) + if combined.empty: + empty = combined.assign( + translation_sources=pd.Series(dtype=str), + modern_languages=pd.Series(dtype=str), + translation_years=pd.Series(dtype=str), + ) + return empty[ + [ + "latin_author_norm", + "latin_title_norm", + "translation_sources", + "modern_languages", + "translation_years", + ] + ] + grouped = combined.groupby(["latin_author_norm", "latin_title_norm"], dropna=False) + + def _collapse(values: Iterable[str]) -> str: + cleaned = [] + for value in values: + if pd.isna(value): + continue + text = str(value).strip() + if not text or text.lower() in {"na", "nan", "none"}: + continue + cleaned.append(text) + unique = sorted(set(cleaned)) + return ";".join(unique) + + aggregated = grouped.agg( + translation_sources=("series_name", _collapse), + modern_languages=("modern_language", _collapse), + translation_years=("translation_year", _collapse), + ).reset_index() + + return aggregated + + +def _prepare_author_lookup(index: pd.DataFrame) -> Mapping[str, pd.DataFrame]: + lookup: MutableMapping[str, pd.DataFrame] = {} + if index.empty: + return lookup + + for author, frame in index.groupby("latin_author_norm"): + lookup[str(author)] = frame + return lookup + + +def add_translation_flags( + master_df: pd.DataFrame, + translation_index: pd.DataFrame, + *, + config: Optional[Mapping[str, object]] = None, +) -> pd.DataFrame: + """Annotate ``master_df`` with translation availability information. + + Parameters + ---------- + master_df: + Bibliographic DataFrame produced by :func:`build_master_bibliography`. + Must contain ``author_norm`` and ``title_norm`` columns. + translation_index: + Output of :func:`build_translation_index` with normalised keys and + aggregated translation metadata. + config: + Optional configuration mapping overriding values in + :data:`DEFAULT_MATCH_CONFIG`. Supported keys are ``enable_fuzzy`` (bool) + and ``fuzzy_threshold`` (float). + + Returns + ------- + pandas.DataFrame + Copy of ``master_df`` enriched with the boolean field + ``has_modern_translation`` and supporting metadata columns + (``translation_sources``, ``translation_languages``, + ``translation_years``). 
+ """ + + if master_df.empty: + result = master_df.copy() + result["has_modern_translation"] = False + result["translation_sources"] = "" + result["translation_languages"] = "" + result["translation_years"] = "" + return result + + working = master_df.copy() + + merged = working.merge( + translation_index, + how="left", + left_on=["author_norm", "title_norm"], + right_on=["latin_author_norm", "latin_title_norm"], + ) + + merged["has_modern_translation"] = merged["translation_sources"].fillna("").astype(str).str.len() > 0 + merged["translation_sources"] = merged["translation_sources"].fillna("").astype(str) + merged["translation_languages"] = merged["modern_languages"].fillna("").astype(str) + merged["translation_years"] = merged["translation_years"].fillna("").astype(str) + + needs_fuzzy = ~merged["has_modern_translation"] + if needs_fuzzy.any(): + merged = _apply_fuzzy_matches(merged, translation_index, needs_fuzzy, config) + + merged = merged.drop(columns=["latin_author_norm", "latin_title_norm", "modern_languages"], errors="ignore") + + return merged + + +def _apply_fuzzy_matches( + merged: pd.DataFrame, + translation_index: pd.DataFrame, + mask: pd.Series, + config: Optional[Mapping[str, object]], +) -> pd.DataFrame: + options: MutableMapping[str, object] = dict(DEFAULT_MATCH_CONFIG) + if config: + options.update(config) + + if not options.get("enable_fuzzy", True): + return merged + + if _similarity is None: + LOGGER.warning("Fuzzy matching requested but no similarity backend is available.") + return merged + + threshold = float(options.get("fuzzy_threshold", 0.9)) + + lookup = _prepare_author_lookup(translation_index) + for idx in merged.index[mask]: + author = str(merged.at[idx, "author_norm"]) + title = str(merged.at[idx, "title_norm"]) + candidates = lookup.get(author) + if candidates is None or candidates.empty: + continue + + best_score = 0.0 + best_row: Optional[pd.Series] = None + for _, candidate in candidates.iterrows(): + score = _similarity(title, str(candidate["latin_title_norm"])) + if score > best_score: + best_score = score + best_row = candidate + + if best_row is not None and best_score >= threshold: + merged.at[idx, "has_modern_translation"] = True + merged.at[idx, "translation_sources"] = best_row.get("translation_sources", "") + merged.at[idx, "translation_languages"] = best_row.get("modern_languages", "") + merged.at[idx, "translation_years"] = best_row.get("translation_years", "") + + return merged + + +__all__ = [ + "DEFAULT_MATCH_CONFIG", + "build_translation_index", + "add_translation_flags", +] diff --git a/latin_corpus/notebooks/build_master_example.ipynb b/latin_corpus/notebooks/build_master_example.ipynb new file mode 100644 index 0000000..0bb9b38 --- /dev/null +++ b/latin_corpus/notebooks/build_master_example.ipynb @@ -0,0 +1,38 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Master bibliography quickstart\n", + "This notebook cell demonstrates how to call `build_master_bibliography()` and inspect the resulting DataFrame." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from latin_corpus.merge import build_master_bibliography\n",
+    "\n",
+    "master_df = build_master_bibliography()\n",
+    "master_df.head()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/latin_corpus/requirements.txt b/latin_corpus/requirements.txt
new file mode 100644
index 0000000..aecd483
--- /dev/null
+++ b/latin_corpus/requirements.txt
@@ -0,0 +1,4 @@
+pandas
+python-Levenshtein
+unidecode
+rapidfuzz