From 313c11ff46604806e1359268a994bc2d94f0183e Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Thu, 4 Dec 2025 11:24:37 -0500 Subject: [PATCH 1/7] planning --- docs/md/dev/models.md | 651 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 651 insertions(+) create mode 100644 docs/md/dev/models.md diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md new file mode 100644 index 00000000..89766fd7 --- /dev/null +++ b/docs/md/dev/models.md @@ -0,0 +1,651 @@ +# Dynamic Registry Design Document + +## Overview + +Transition from a baked-in static registry to a dynamic, branch-aware registry system where model repositories maintain their own registries and `modflow-devtools` syncs to them on-demand. + +## Objectives + +1. Reduce developer maintenance burden (no manual registry regeneration) +2. Keep package size minimal +3. Allow users to access latest models without package updates +4. Support multiple refs (release tags, branches) +5. Maintain backward compatibility through 1.x series + +## Architecture + +### Current State (v1.x) +- 1.7MB+ registry TOML files shipped with package +- Fixed to specific snapshot of model repositories +- Manual developer task to regenerate + +### Target State (v2.x) +- Bootstrap metadata (~1-2KB) shipped with package +- Registries cached locally, synced from upstream repos +- Automatic registry generation in model repo CI + +### Transition (v1.x → v2.x) +- v1.x: Add sync mechanism as optional feature, keep shipping full registry with deprecation warning +- v2.x: Switch to bootstrap-only, require sync for registry access + +## Components + +### 1. 
Bootstrap Metadata + +**Location**: `modflow_devtools/registry/bootstrap.toml` + +**Format**: +```toml +[sources.modflow6-examples] +repo = "MODFLOW-ORG/modflow6-examples" +path = ".registry/registry.toml" + +[sources.modflow6-testmodels] +repo = "MODFLOW-ORG/modflow6-testmodels" +path = ".registry/registry.toml" + +[sources.modflow6-largetestmodels] +repo = "MODFLOW-ORG/modflow6-largetestmodels" +path = ".registry/registry.toml" +``` + +**Notes**: +- Simple, fixed path to registry in each repo +- No ref name in path (ref selection handled by API/Git ref) + +### 2. Registry Schema + +**Files per source**: +- `registry.toml` - file hashes and URLs (Pooch format) +- `models.toml` - model name → file list mapping +- `examples.toml` - example name → model list mapping + +**Metadata section** (add to each registry file): +```toml +[_meta] +schema_version = "1.0" +source_repo = "MODFLOW-ORG/modflow6-examples" +source_ref = "master" # branch name or tag +generated_at = "2025-12-04T14:30:00Z" +devtools_version = "1.9.0" +``` + +**Validation**: Use `pydantic` for schema validation and versioning + +### 3. Cache Structure + +**Location**: `~/.cache/modflow-devtools/registries/` (or platform equivalent via Pooch) + +**Directory layout**: +``` +~/.cache/modflow-devtools/ +├── registries/ +│ ├── modflow6-examples/ +│ │ ├── 1.2.3/ # release tag (if repo publishes releases) +│ │ │ ├── registry.toml +│ │ │ ├── models.toml +│ │ │ └── examples.toml +│ │ ├── master/ # branch +│ │ │ ├── registry.toml +│ │ │ ├── models.toml +│ │ │ └── examples.toml +│ │ └── develop/ # branch +│ │ ├── registry.toml +│ │ ├── models.toml +│ │ └── examples.toml +│ ├── modflow6-testmodels/ +│ │ ├── master/ +│ │ │ └── ... +│ │ └── develop/ +│ │ └── ... +│ └── modflow6-largetestmodels/ +│ └── ... +└── models/ # Actual model files, managed by Pooch + └── ... 
+``` + +**Notes**: +- Keep registries for multiple refs cached simultaneously (tags and branches) +- Cache directory named by ref (tag or branch name) +- Enables fast switching between refs +- Model files themselves cached separately by Pooch + +### 4. Ref Selection Priority + +**Default behavior** (when user doesn't specify a ref): +1. **Latest release tag** (if repo publishes releases - e.g., `1.2.3`) +2. **master branch** (fallback for repos without releases) +3. **develop branch** (fallback for repos without master) + +**Rationale**: Prefer stable/official tagged releases, gracefully degrade to branches + +**Implementation**: +- Check GitHub API for latest release tag +- If no releases found, fall back to `master` branch +- If `master` doesn't exist, fall back to `develop` branch + +**Git Ref Support**: +- **Supported**: Release tags (e.g., `v1.2.3`, `1.2.3`), branch names (e.g., `master`, `develop`, `feature/xyz`) +- **Not supported**: Commit SHAs (registries only generated on branch pushes/releases, not per-commit) +- **Error handling**: If user specifies a commit SHA, emit clear error message explaining limitation + +### 5. Sync Mechanism + +#### Install-Time Behavior +- **Best-effort sync** on package install (via `setup.py` or similar) +- **Warn if unsuccessful** but allow install to succeed +- **Retry on first import** if sync failed during install +- **Clear user messaging**: "Registry sync failed, remote models unavailable. Run `python -m modflow_devtools.models sync` to retry." 
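The install-time behavior above can be sketched as a small wrapper that attempts a sync and downgrades any failure to a warning. This is a minimal sketch under stated assumptions, not the package's implementation: `best_effort_sync` and `failing_sync` are hypothetical names, and the sync step is passed in as a plain callable rather than calling `modflow_devtools` directly.

```python
import warnings
from typing import Callable

# Message text taken from the "Clear user messaging" bullet above.
SYNC_FAILED_MESSAGE = (
    "Registry sync failed, remote models unavailable. "
    "Run `python -m modflow_devtools.models sync` to retry."
)


def best_effort_sync(sync: Callable[[], None]) -> bool:
    """Run `sync`; return True on success, warn and return False on failure."""
    try:
        sync()
    except Exception as err:  # deliberately broad: install/import must not fail
        warnings.warn(f"{SYNC_FAILED_MESSAGE} ({err})", RuntimeWarning, stacklevel=2)
        return False
    return True


def failing_sync() -> None:
    # Hypothetical stand-in for a sync attempt without network access.
    raise ConnectionError("network unreachable")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    synced = best_effort_sync(failing_sync)

print(synced, len(caught))
```

A caller could record the returned flag and retry once on first import, which matches the "retry on first import" bullet without making either step fatal.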
+ +#### Manual Sync Command + +**CLI**: `python -m modflow_devtools.models` + +**Subcommands**: +```bash +# Sync all sources to default refs (latest release tag → master → develop) +python -m modflow_devtools.models sync + +# Sync all sources to specific ref (branch or tag) +python -m modflow_devtools.models sync --ref develop +python -m modflow_devtools.models sync --ref v1.2.3 + +# Sync specific source +python -m modflow_devtools.models sync --source modflow6-examples + +# Sync specific source to specific ref +python -m modflow_devtools.models sync --source modflow6-examples --ref develop +python -m modflow_devtools.models sync --source modflow6-examples --ref v1.2.3 + +# Force re-download even if cached +python -m modflow_devtools.models sync --force + +# List available registries and their status +python -m modflow_devtools.models list + +# Show sync status +python -m modflow_devtools.models status +``` + +**Error handling for unsupported refs**: +```bash +# Commit SHA not supported - clear error message +python -m modflow_devtools.models sync --ref abc123def +# Error: Commit SHAs are not supported. Registries are only generated for branches and release tags. +# Please use a branch name (e.g., 'master', 'develop') or release tag (e.g., 'v1.2.3'). 
+``` + +**Programmatic API**: +```python +from modflow_devtools.models import sync_registry, get_registry + +# Sync and use default (latest release tag → master → develop) +sync_registry() +registry = get_registry() + +# Sync to specific ref (branch or tag) +sync_registry(ref="develop") +sync_registry(ref="v1.2.3") + +# Use specific ref without syncing +registry = get_registry(ref="develop") # uses cached, syncs if missing +registry = get_registry(ref="v1.2.3") # uses cached release tag + +# Use specific source and ref +registry = get_registry(source="modflow6-examples", ref="develop") +registry = get_registry(source="modflow6-examples", ref="v1.2.3") + +# Error on commit SHA +try: + registry = get_registry(ref="abc123def") +except ValueError as e: + print(e) # "Commit SHAs are not supported..." +``` + +#### Sync Implementation +- **For release tags**: Download registry files from GitHub release assets +- **For branches**: Download registry files from GitHub raw URLs (e.g., `https://raw.githubusercontent.com/MODFLOW-ORG/modflow6-examples/{branch}/.registry/registry.toml`) +- Validate schema version and structure +- Cache to local directory (named by ref - tag or branch) +- Merge multiple sources at API level (keep files separate on disk) +- **Ref detection**: Use GitHub API to determine if ref is a tag or branch + +### 6. Upstream Model Repository Changes + +**Required changes in each model repo** (modflow6-examples, modflow6-testmodels, modflow6-largetestmodels): + +#### CI Workflow +**File**: `.github/workflows/registry.yml` + +**Trigger**: Push to master/develop branches, or release tag creation + +**Steps**: +1. Install `modflow-devtools` (provides registry generation machinery) +2. Run registry generation: + ```bash + python -m modflow_devtools.make_registry \ + --path . \ + --output .registry \ + --url + ``` +3. Commit registry files to `.registry/` directory (for branches) +4. 
For release tags: Attach registry files as release assets + +**Notes**: +- Registry generation machinery remains in `modflow-devtools` +- Model repos consume it as a dependency +- Keeps single source of truth for registry format + +#### Directory Structure +``` +modflow6-examples/ +├── .registry/ +│ ├── registry.toml +│ ├── models.toml +│ └── examples.toml +├── examples/ +│ └── ... +└── .github/ + └── workflows/ + └── registry.yml +``` + +### 7. Registry Architecture & API + +#### Core Principle: Separation of Concerns + +- **`PoochRegistry`**: Single source, single ref - knows nothing about other sources +- **`MergedRegistry`**: Pure compositor - just merges existing registries, no construction logic +- **Module-level functions**: Handle sync, construction, and convenience APIs + +#### Model Naming Convention + +**Format**: `{source}@{ref}/{subpath}` + +**Components**: +- `source`: Repository identifier (e.g., `modflow6-examples`, `modflow6-testmodels`) +- `ref`: Git ref (branch or tag, e.g., `v1.2.3`, `master`, `develop`) +- `subpath`: Relative path within repo to model directory + +**Examples**: +- `modflow6-examples@v1.2.3/ex-gwf-twri` +- `modflow6-testmodels@develop/mf6/test001a_Tharmonic` +- `modflow6-largetestmodels@master/prudic2004t2` + +**Benefits**: +- Guarantees no name collisions (unique per source + ref + path) +- Makes model provenance explicit to users +- Allows mixing multiple refs of same source +- Simplifies cache key generation + +#### PoochRegistry (Single Source) + +**Purpose**: Represent a single source repository at a specific ref + +**Constructor**: Takes `source` (repo name) and `ref` (branch/tag) + +```python +class PoochRegistry(ModelRegistry): + def __init__(self, source: str, ref: str | None = None, cache_path: PathLike | None = None): + """Create registry for a single source repository + + Args: + source: Source repository name (e.g., "modflow6-examples") + ref: Git ref - branch name or release tag + (default: latest release tag → 
master → develop) + Commit SHAs not supported. + cache_path: Override default cache location + + Raises: + ValueError: If ref is a commit SHA + FileNotFoundError: If registry not cached and sync fails + """ + self._source = source + self._ref = self._resolve_ref(ref) # Applies default priority + self._cache_path = cache_path or self._default_cache_path() + self._load() # Load from cache, auto-sync if missing + + @property + def source(self) -> str: + """Source repository name""" + return self._source + + @property + def ref(self) -> str: + """Git ref (branch or tag)""" + return self._ref + + def sync(self, force: bool = False) -> None: + """Sync this registry from upstream + + Args: + force: Re-download even if cached + """ + ... + + def is_synced(self) -> bool: + """Check if registry is cached for this source/ref""" + ... + + # Inherited from ModelRegistry abstract class + @property + def files(self) -> dict: + """Map of file names to file info (with source@ref prefix)""" + ... + + @property + def models(self) -> dict: + """Map of model names to file lists (with source@ref prefix)""" + ... + + @property + def examples(self) -> dict: + """Map of example names to model lists (with source@ref prefix)""" + ... +``` + +**Key changes from current**: +- Loads from cache by default (not package resources) +- Auto-syncs if cache missing (best-effort on first access) +- All keys prefixed with `{source}@{ref}/` in returned dicts + +#### MergedRegistry (Compositor) + +**Purpose**: Merge multiple `ModelRegistry` instances into unified API + +**Constructor**: Takes list of pre-constructed registry instances + +```python +class MergedRegistry(ModelRegistry): + def __init__(self, registries: list[ModelRegistry]): + """Merge multiple registries into unified API + + Args: + registries: List of ModelRegistry instances (typically PoochRegistry) + Caller is responsible for constructing these with desired + sources and refs. 
+ + Note: + This class is a pure compositor - it knows nothing about sources, + refs, syncing, or construction. All that logic happens before + MergedRegistry is created. + """ + self._registries = list(registries) + + @property + def registries(self) -> list[ModelRegistry]: + """The underlying registries being merged""" + return list(self._registries) # Return copy + + # Inherited from ModelRegistry - merge results from all registries + @property + def files(self) -> dict: + """Merged files from all registries""" + merged = {} + for registry in self._registries: + merged.update(registry.files) + return merged + + @property + def models(self) -> dict: + """Merged models from all registries""" + merged = {} + for registry in self._registries: + merged.update(registry.models) + return merged + + @property + def examples(self) -> dict: + """Merged examples from all registries""" + merged = {} + for registry in self._registries: + merged.update(registry.examples) + return merged +``` + +**Why no factory methods?** +- Construction is trivial: `MergedRegistry([reg1, reg2])` +- Users can easily create new instances when refs change +- Keeps the class focused and simple +- Avoids coupling MergedRegistry to PoochRegistry + +**Usage examples**: +```python +# Create individual registries +examples_v1 = PoochRegistry("modflow6-examples", "v1.2.3") +testmodels = PoochRegistry("modflow6-testmodels", "develop") + +# Merge them +merged = MergedRegistry([examples_v1, testmodels]) + +# Later: update to new ref +examples_v2 = PoochRegistry("modflow6-examples", "v2.0.0") +merged = MergedRegistry([examples_v2, testmodels]) + +# Mix multiple refs of same source +examples_stable = PoochRegistry("modflow6-examples", "v1.2.3") +examples_dev = PoochRegistry("modflow6-examples", "develop") +merged = MergedRegistry([examples_stable, examples_dev, testmodels]) +``` + +#### Module-Level API (Convenience Layer) + +**Purpose**: Provide convenient access for common use cases + +```python +# 
Module: modflow_devtools.models + +def get_registry( + source: str | None = None, + ref: str | None = None, + sources: dict[str, str] | None = None +) -> ModelRegistry: + """Get a registry (single source or merged) + + Args: + source: Single source name (returns PoochRegistry) + ref: Git ref to use (applies to single source or all sources) + sources: Dict mapping source names to refs for mixed-ref merged registry + e.g., {"modflow6-examples": "v1.2.3", "modflow6-testmodels": "develop"} + + Returns: + PoochRegistry if source specified, otherwise MergedRegistry + + Examples: + # Single source + reg = get_registry(source="modflow6-examples", ref="v1.2.3") + + # All sources, same ref + reg = get_registry(ref="develop") + + # All sources, default refs (latest release → master → develop) + reg = get_registry() + + # All sources, mixed refs + reg = get_registry(sources={ + "modflow6-examples": "v1.2.3", + "modflow6-testmodels": "develop" + }) + """ + if source: + return PoochRegistry(source, ref) + + if sources: + registries = [PoochRegistry(src, r) for src, r in sources.items()] + else: + # Load all from bootstrap, apply same ref to all + bootstrap = load_bootstrap() + registries = [PoochRegistry(src, ref) for src in bootstrap.sources.keys()] + + return MergedRegistry(registries) + + +def sync_registry(source: str | None = None, ref: str | None = None, force: bool = False) -> None: + """Sync registry from upstream + + Args: + source: Specific source to sync (default: all sources from bootstrap) + ref: Git ref to sync (default: latest release → master → develop) + force: Force re-download even if cached + """ + if source: + registry = PoochRegistry(source, ref) + registry.sync(force=force) + else: + bootstrap = load_bootstrap() + for src in bootstrap.sources.keys(): + registry = PoochRegistry(src, ref) + registry.sync(force=force) + + +# DEFAULT_REGISTRY is now a MergedRegistry +DEFAULT_REGISTRY = get_registry() # All sources, default refs +``` + +### 8. 
Backward Compatibility (v1.x) + +**Goals**: +- Don't break existing code +- Gentle migration path for users +- Clear deprecation warnings + +**Approach**: +1. Continue shipping full registry in v1.x +2. Add sync functionality as optional enhancement +3. Emit deprecation warning on import: + ``` + DeprecationWarning: Bundled registry is deprecated and will be removed in v2.0. + Use `python -m modflow_devtools.models sync` to download the latest registry. + ``` +4. Provide migration guide in docs + +**Breaking changes in v2.x**: +- Remove bundled registry files (except bootstrap.toml) +- Require sync for remote registry access (LocalRegistry unaffected) +- Document migration clearly in CHANGELOG + +## Implementation Plan + +### Phase 1: Foundation (v1.x) +1. Add bootstrap metadata file +2. Implement registry schema with Pydantic validation +3. Create cache directory structure utilities +4. Add `sync_registry()` function with download logic +5. Implement ref priority resolution (latest release tag → master → develop) +6. Add CLI subcommands (sync, list, status) + +### Phase 2: PoochRegistry Adaptation (v1.x) +1. Modify `PoochRegistry.__init__()` to check cache first +2. Add fallback to bundled registry +3. Implement best-effort sync on import +4. Add deprecation warnings for bundled registry + +### Phase 3: Upstream CI (concurrent with Phase 1-2) +1. Add `.github/workflows/registry.yml` to each model repo +2. Test registry generation in CI +3. Commit registry files to `.registry/` directories +4. For repos with releases, add registry as release asset + +### Phase 4: Testing & Documentation (v1.x) +1. Add comprehensive tests for sync mechanism +2. Test network failure scenarios +3. Document new workflow in `models.md` +4. Add migration guide for v2.x + +### Phase 5: v2.x Release +1. Remove bundled registry files (keep bootstrap.toml) +2. Make sync required for PoochRegistry +3. Update documentation +4. Release notes with clear migration instructions + +## Key Design Decisions + +1. 
**Install-time sync**: Best-effort, warn on failure, allow install to proceed +2. **Registry location**: `.registry/` directory on each branch in model repos; also as release assets for tagged releases +3. **Bootstrap format**: Simple TOML with repo and path, no ref substitution +4. **Multi-ref caching**: Support simultaneous caching of multiple refs (tags and branches) +5. **Schema versioning**: Use Pydantic, include `_meta` section in registries +6. **Ref priority**: Latest release tag → master branch → develop branch (when user doesn't specify) +7. **Ref support**: Branch names and release tags supported; commit SHAs not supported (with clear error message) +8. **CLI parameter**: Use `--ref` (not `--branch`) to clarify support for both tags and branches +9. **Transition**: Optional in v1.x with deprecation warning, required in v2.x +10. **Registry architecture**: Clear separation of concerns + - `PoochRegistry`: Single source, single ref - no knowledge of other sources + - `MergedRegistry`: Pure compositor - takes pre-built registries, no construction logic + - Module functions: Handle sync, construction, convenience APIs +11. **Model naming**: `{source}@{ref}/{subpath}` format guarantees collision-free names and explicit provenance +12. **Registry merging**: Keep separate on disk and in separate `PoochRegistry` instances, merge via `MergedRegistry` +13. **No factory methods**: `MergedRegistry` construction is trivial, users create new instances directly +14. **Mixed refs**: Supported naturally via naming scheme - can mix multiple refs of same source +15. **LocalRegistry**: Remains independent, serves different purpose (local development) + +## Design Considerations & Risk Mitigation + +### Name Collisions +**Risk**: Models from different sources could have identical names. 
+ +**Mitigation**: Systematic naming scheme `{source}@{ref}/{subpath}` guarantees uniqueness: +- Each source has a distinct identifier +- Refs are included in the name +- Subpaths are unique within a source + +**Example**: `modflow6-examples@v1.2.3/ex-gwf-twri` cannot collide with `modflow6-testmodels@develop/ex-gwf-twri` + +### Partial Sync State +**Risk**: User syncs some sources but not others, leading to an incomplete `MergedRegistry`. + +**Mitigation**: +- `MergedRegistry` is transparent - only merges what it's given +- Module-level `get_registry()` handles ensuring sources are synced +- `PoochRegistry` auto-syncs on first access (best-effort) +- Clear error messages if sync fails + +### Performance +**Risk**: Loading multiple registry files could be slow. + +**Analysis**: Not a concern - registry TOML files are small enough that load time is negligible (even the 1.7MB registry). Model files download lazily via Pooch only when accessed. + +**Decision**: No lazy loading needed for registries themselves. + +### Error Propagation +**Risk**: One source failing to sync could break the entire `MergedRegistry`. + +**Mitigation**: +- `PoochRegistry` constructor fails fast if sync fails +- Caller (module functions) can handle errors before constructing `MergedRegistry` +- `MergedRegistry` itself is simple - no error handling needed (operates on valid registries) + +### Backward Compatibility +**Risk**: Changing `DEFAULT_REGISTRY` from `PoochRegistry` to `MergedRegistry` breaks code checking `isinstance(DEFAULT_REGISTRY, PoochRegistry)`. + +**Mitigation**: +- Both implement the `ModelRegistry` abstract class +- API is identical for common operations +- Breaking change acceptable for v2.x with clear migration guide +- v1.x maintains current behavior with deprecation warnings + +### Cache Invalidation +**Risk**: Registry instance doesn't reflect newly synced data. 
+ +**Mitigation**: +- Document that registries are immutable per ref +- To use new data, create new instance: `get_registry(ref="new-ref")` +- Construction is cheap (just loading TOML), so recreating is fine + +## Open Questions / Future Enhancements + +1. **Registry compression**: Should we gzip registry files for faster downloads? +2. **Partial registry updates**: Could we diff registries and download only changes? +3. **Registry CDN**: Should we consider hosting registries on a CDN for faster access? +4. **Offline mode**: Should we provide an explicit "offline mode" that never tries to sync? +5. **Registry analytics**: Track which models are most frequently accessed? +6. **Naming scheme refinement**: Keep current verbose prefixes (`mf6/example/`, `mf6/test/`) or simplify to `{repo-name}/{subpath}`? + +## Success Criteria + +1. Package size reduced by ~2MB +2. Users can access latest models without package update +3. Zero manual developer registry updates needed +4. Install always succeeds (even with network failures) +5. Existing v1.x code continues to work with deprecation warnings +6. Clear migration path to v2.x From 697839599badde625fb427b051eb19dcde61069c Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Thu, 4 Dec 2025 17:17:48 -0500 Subject: [PATCH 2/7] more planning, wip --- docs/md/dev/models.md | 226 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 196 insertions(+), 30 deletions(-) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index 89766fd7..0a1b731a 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -30,46 +30,164 @@ Transition from a baked-in static registry to a dynamic, branch-aware registry s ## Components -### 1. Bootstrap Metadata +### Registry bootstrap file -**Location**: `modflow_devtools/registry/bootstrap.toml` +In this project, registry bootstrap metadata can live in `modflow_devtools/models/bootstrap.toml`. + +#### File contents + +A `repo` attribute identifies the repository owner and name. 
+ +The name of the section (under `sources.`) will become part of a prefix by which models can be hierarchically addressed. To override the name (and thus the prefix), provide a `name` attribute. + +A `registry_path` attribute points to the directory containing the registry database files. It can default to `.registry/`, making the attribute optional unless a different location is needed. + +The `registry_path` **must** contain at least two (2) files: + +- `registry.toml` +- `models.toml` + +The `registry_path` **may** also contain a file called `examples.toml`. + +#### Sample file -**Format**: ```toml [sources.modflow6-examples] repo = "MODFLOW-ORG/modflow6-examples" -path = ".registry/registry.toml" +name = "mf6/example" +dirs = [""] [sources.modflow6-testmodels] repo = "MODFLOW-ORG/modflow6-testmodels" -path = ".registry/registry.toml" +name = "mf6/test" +dirs = [ + "mf6", + "mf5to6" +] [sources.modflow6-largetestmodels] repo = "MODFLOW-ORG/modflow6-largetestmodels" -path = ".registry/registry.toml" +name = "mf6/large" ``` -**Notes**: -- Simple, fixed path to registry in each repo -- No ref name in path (ref selection handled by API/Git ref) +### Registry Modes -### 2. Registry Schema +Model repositories operate in one of two distinct modes, depending on how model files are stored and distributed. The mode is **self-describing** - it's determined by attributes in the registry metadata, not by hints in the bootstrap file. 
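A minimal sketch of what "self-describing" could mean in practice, assuming the registry's `[_meta]` table has already been parsed into a dict: the presence of any asset attribute selects release mode, and its absence selects in-repo mode. `detect_mode` and the mode labels are hypothetical names used only for illustration; the attribute names are the ones this document uses for registry metadata.

```python
# Asset attributes that mark a registry as release-only (names from this document).
ASSET_KEYS = ("release_asset", "registry_asset", "models_asset")


def detect_mode(meta: dict) -> str:
    """Return "release" if any asset attribute is present, else "in-repo"."""
    return "release" if any(key in meta for key in ASSET_KEYS) else "in-repo"


# Example metadata dicts (illustrative values only).
in_repo_meta = {"schema_version": "1.0", "source_ref": "master"}
release_meta = {
    "schema_version": "1.0",
    "source_ref": "v1.2.3",
    "release_asset": "mf6examples.zip",
}

print(detect_mode(in_repo_meta))
print(detect_mode(release_meta))
```

Because the decision reads only the metadata, the bootstrap file needs no hint about how a source distributes its models.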
-**Files per source**: -- `registry.toml` - file hashes and URLs (Pooch format) -- `models.toml` - model name → file list mapping -- `examples.toml` - example name → model list mapping +#### Mode 1: In-Repo Models + +**Characteristics**: +- Model input files are checked into the repository +- Registry files live in `.registry/` directory on each branch/tag +- Supports both branches and release tags as refs +- Model files fetched individually via GitHub raw content URLs + +**Registry metadata** (no asset attributes): +```toml +[_meta] +schema_version = "1.0" +source_repo = "MODFLOW-ORG/modflow6-testmodels" +source_ref = "master" +generated_at = "2025-12-04T14:30:00Z" +devtools_version = "1.9.0" +# No release_asset/registry_asset/models_asset = in-repo mode +``` + +**Examples**: `modflow6-testmodels`, `modflow6-largetestmodels` + +**Registry discovery**: `https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/registry.toml` + +**Model file URLs**: Individual files via raw content URLs (specified in registry) + +#### Mode 2: Release-Only Models + +**Characteristics**: +- Model input files are built during release (not in repository) +- Registry files attached to release as assets +- Supports release tags only (branches don't have built models) +- Model files packaged in release zip asset + +**Registry metadata** (with asset attributes): + +**Option A: Single zip containing both registry and models** +```toml +[_meta] +schema_version = "1.0" +source_repo = "MODFLOW-ORG/modflow6-examples" +source_ref = "v1.2.3" +release_asset = "mf6examples.zip" # Both registry and models in this zip +generated_at = "2025-12-04T14:30:00Z" +devtools_version = "1.9.0" +``` -**Metadata section** (add to each registry file): +**Option B: Separate registry and model assets** ```toml [_meta] schema_version = "1.0" source_repo = "MODFLOW-ORG/modflow6-examples" -source_ref = "master" # branch name or tag +source_ref = "v1.2.3" +registry_asset = "registry.zip" # Registry files in this 
asset +models_asset = "models.zip" # Model files in this asset generated_at = "2025-12-04T14:30:00Z" devtools_version = "1.9.0" ``` +**Examples**: `modflow6-examples` + +**Registry discovery**: GitHub release assets for the given tag + +**Model file URLs**: All point to the release zip asset + +#### Mode Detection & Discovery + +`PoochRegistry` automatically discovers the mode when syncing: + +1. **If ref is a tag**: Try downloading registry from release assets first +2. **Fallback**: Try downloading registry from `.registry/` directory in repository +3. **After loading registry**: Inspect metadata to determine fetch strategy + - If `release_asset`, `registry_asset`, or `models_asset` present → Release mode + - Otherwise → In-repo mode + +**Error handling**: +```python +# Generic error when registry not found +FileNotFoundError( + f"Registry for '{source}@{ref}' not found. " + f"Tried: release assets (if tag) and repository .registry/ directory." +) + +# When attempting branch ref on release-only source +# (Will fail at discovery step - no registry in .registry/ dir) +FileNotFoundError( + f"Registry for '{source}@{ref}' not found at " + f"https://github.com/{org}/{repo}/blob/{ref}/.registry/registry.toml. " + f"This source may only support release tags." +) +``` + +### 2. 
Registry Schema + +**Files per source**: +- `registry.toml` - file hashes and URLs (Pooch format) +- `models.toml` - model name → file list mapping +- `examples.toml` - example name → model list mapping (optional) + +**Metadata section**: + +All registry files must include a `[_meta]` section with: +- `schema_version`: Registry schema version (currently "1.0") +- `source_repo`: Source repository identifier (e.g., "MODFLOW-ORG/modflow6-examples") +- `source_ref`: Git ref (branch or tag) this registry was built from +- `generated_at`: Timestamp when registry was generated +- `devtools_version`: Version of modflow-devtools used to generate registry + +**Mode-specific attributes** (optional, determine fetch strategy): +- `release_asset`: Name of single zip file containing both registry and models (Mode 2, Option A) +- `registry_asset`: Name of zip file containing registry files (Mode 2, Option B) +- `models_asset`: Name of zip file containing model files (Mode 2, Option B) + +See **Registry Modes** section above for complete examples of metadata for each mode. + **Validation**: Use `pydantic` for schema validation and versioning ### 3. Cache Structure @@ -318,10 +436,54 @@ class PoochRegistry(ModelRegistry): def sync(self, force: bool = False) -> None: """Sync this registry from upstream + Automatically discovers registry location and mode: + 1. If ref is a tag: Try release assets first + 2. Fallback: Try .registry/ directory in repository + 3. After loading: Inspect metadata to determine fetch strategy + Args: force: Re-download even if cached + + Raises: + FileNotFoundError: If registry not found in either location """ - ... 
+ # Try release assets if ref is a tag + if self._is_tag(self._ref): + try: + self._sync_from_release_assets() + self._setup_pooch() # Configure based on metadata + return + except ReleaseNotFound: + pass # Fall through to repository + + # Try .registry/ directory in repository + try: + self._sync_from_repository() + self._setup_pooch() # Configure based on metadata + except FileNotFoundError: + raise FileNotFoundError( + f"Registry for '{self._source}@{self._ref}' not found. " + f"Tried: release assets (if tag) and repository .registry/ directory." + ) + + def _setup_pooch(self) -> None: + """Configure Pooch based on registry metadata (mode detection)""" + meta = self._meta + + if "release_asset" in meta: + # Mode 2, Option A: Single zip with registry + models + self._fetch_mode = "single_zip" + self._asset_name = meta["release_asset"] + + elif "models_asset" in meta: + # Mode 2, Option B: Separate registry and model assets + self._fetch_mode = "models_zip" + self._asset_name = meta["models_asset"] + + else: + # Mode 1: In-repo individual files + self._fetch_mode = "individual_files" + # URLs already in registry from make_registry.py def is_synced(self) -> bool: """Check if registry is cached for this source/ref""" @@ -562,22 +724,26 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs 1. **Install-time sync**: Best-effort, warn on failure, allow install to proceed 2. **Registry location**: `.registry/` directory on each branch in model repos; also as release assets for tagged releases -3. **Bootstrap format**: Simple TOML with repo and path, no ref substitution -4. **Multi-ref caching**: Support simultaneous caching of multiple refs (tags and branches) -5. **Schema versioning**: Use Pydantic, include `_meta` section in registries -6. **Ref priority**: Latest release tag → master branch → develop branch (when user doesn't specify) -7. **Ref support**: Branch names and release tags supported; commit SHAs not supported (with clear error message) -8. 
**CLI parameter**: Use `--ref` (not `--branch`) to clarify support for both tags and branches -9. **Transition**: Optional in v1.x with deprecation warning, required in v2.x -10. **Registry architecture**: Clear separation of concerns +3. **Bootstrap format**: Minimal TOML with just repo identifiers - no hints about location or fetch strategy +4. **Registry modes**: Self-describing via metadata attributes + - Mode 1 (in-repo): No asset attributes → individual file fetching + - Mode 2 (release-only): `release_asset`, `registry_asset`, or `models_asset` → zip fetching + - Mode discovered automatically during sync +5. **Multi-ref caching**: Support simultaneous caching of multiple refs (tags and branches) +6. **Schema versioning**: Use Pydantic, include `_meta` section in registries +7. **Ref priority**: Latest release tag → master branch → develop branch (when user doesn't specify) +8. **Ref support**: Branch names and release tags supported; commit SHAs not supported (with clear error message) +9. **CLI parameter**: Use `--ref` (not `--branch`) to clarify support for both tags and branches +10. **Transition**: Optional in v1.x with deprecation warning, required in v2.x +11. **Registry architecture**: Clear separation of concerns - `PoochRegistry`: Single source, single ref - no knowledge of other sources - `MergedRegistry`: Pure compositor - takes pre-built registries, no construction logic - Module functions: Handle sync, construction, convenience APIs -11. **Model naming**: `{source}@{ref}/{subpath}` format guarantees collision-free names and explicit provenance -12. **Registry merging**: Keep separate on disk and in separate `PoochRegistry` instances, merge via `MergedRegistry` -13. **No factory methods**: `MergedRegistry` construction is trivial, users create new instances directly -14. **Mixed refs**: Supported naturally via naming scheme - can mix multiple refs of same source -15. 
**LocalRegistry**: Remains independent, serves different purpose (local development) +12. **Model naming**: `{source}@{ref}/{subpath}` format guarantees collision-free names and explicit provenance +13. **Registry merging**: Keep separate on disk and in separate `PoochRegistry` instances, merge via `MergedRegistry` +14. **No factory methods**: `MergedRegistry` construction is trivial, users create new instances directly +15. **Mixed refs**: Supported naturally via naming scheme - can mix multiple refs of same source +16. **LocalRegistry**: Remains independent, serves different purpose (local development) ## Design Considerations & Risk Mitigation From d8ec2b07169b74a91e80233450beb20419d27b3e Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Fri, 5 Dec 2025 08:11:28 -0500 Subject: [PATCH 3/7] editing wip --- docs/md/dev/models.md | 178 ++++++++++++++++++++++++------------------ 1 file changed, 103 insertions(+), 75 deletions(-) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index 0a1b731a..43cdd42e 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -1,46 +1,57 @@ -# Dynamic Registry Design Document +# Model Registry Rework ## Overview -Transition from a baked-in static registry to a dynamic, branch-aware registry system where model repositories maintain their own registries and `modflow-devtools` syncs to them on-demand. +Transition from a static model registry baked into the package to a dynamic, explicitly versioned registry system where model repositories maintain their own catalogs and `modflow-devtools` syncs to them on-demand. -## Objectives +## Motivation -1. Reduce developer maintenance burden (no manual registry regeneration) -2. Keep package size minimal -3. Allow users to access latest models without package updates -4. Support multiple refs (release tags, branches) -5. 
Maintain backward compatibility through 1.x series +- Allow access to updated models without package updates (uncouple `modflow-devtools` releases from model repositories) +- Make model versioning explicit, with support for multiple refs (branches, release tags) +- Smaller package size (ship no large TOML files: only registry bootstrap info, not registries themselves) +- Lower developer maintenance burden (no manual registry regeneration) -## Architecture +## Background -### Current State (v1.x) -- 1.7MB+ registry TOML files shipped with package -- Fixed to specific snapshot of model repositories -- Manual developer task to regenerate +Currently each release of this package is fixed to a specific state of each model repository. It is incumbent on this package's developers to monitor the status of model repositories and, when models are updated, regenerate the registry and release a new version of this package. -### Target State (v2.x) -- Bootstrap metadata (~1-2KB) shipped with package -- Registries cached locally, synced from upstream repos -- Automatic registry generation in model repo CI +This tight coupling is a burden on developers, and prevents model repositories and `modflow-devtools` from moving independently. -### Transition (v1.x → v2.x) -- v1.x: Add sync mechanism as optional feature, keep shipping full registry with deprecation warning -- v2.x: Switch to bootstrap-only, require sync for registry access +It is also inconvenient for users as it is not currently clear which version of this package provides access to which versions of each model repository, and users must wait until developers manually re-release `modflow-devtools` for access to updated models. -## Components +Also, 1.7MB+ in TOML registry files are currently shipped with package, making up the install time network payload. -### Registry bootstrap file +## Proposal -In this project, registry bootstrap metadata can live in `modflow_devtools/models/bootstrap.toml`. 
+Make model repositories reponsible for their own registries. Make `modflow-devtools` responsible only for + +- defining the registry contract, +- providing registry-creation machinery, and +- storing bootstrap metadata necessary to locate remote model repositories, for... +- fetching registry information at install time or on demand, to synchronize the user-facing API, and +- locally caching registry data (as well as models, via Pooch, as is currently done). + +Model repositories can consume registry-creation machinery to generate their own registry metadata in CI, either for versioning with the relevant branch (for model repositories which don't have releases, e.g. the test models repositories) or as a release asset (for repositories which do have releases, e.g. the examples repository). + +For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. + +Then metadata shipped with `modflow-devtools` should be a few KB at most. + +## Design + +### Bootstrap metadata file + +Bootstrap metadata simply tells `modflow-devtools` where to look for remote model repositories, plus some minimal supporting information. This can live in a single file e.g. `modflow_devtools/models/bootstrap.toml`. #### File contents -A `repo` attribute identifies the repository owner and name. +At the top level, the file can consist of a table of `sources`, each describing a model repository. -The name of the section (under `sources.`) will become part of a prefix by which models can be hierarchically addressed. To override the name (thus the prefix as well) a `name` attribute can be provided. +The name of each source can by default be inferred from the name of the subsection, i.e. `sources.name`. 
The name will become part of a prefix by which models can be hierarchically addressed. To override the name (thus the prefix as well) a `name` attribute can be provided explicitly. -A `registry_path` attribute points to the directory containing the registry database files. This can default to `.registry/` and therefore be optional, only required if overridden. +A `repo` attribute identifies the repository owner and name, separated by a forward slash. + +A `registry_path` attribute points to the directory containing the registry database files. This can default to `.registry/` and can therefore be optional. The `registry_path` **must** contain at least two (2) files: @@ -55,11 +66,15 @@ The `registry_path` **may** also contain a file called `examples.toml`. [sources.modflow6-examples] repo = "MODFLOW-ORG/modflow6-examples" name = "mf6/example" -dirs = [""] +refs = ["current"] [sources.modflow6-testmodels] repo = "MODFLOW-ORG/modflow6-testmodels" name = "mf6/test" +refs = [ + "develop", + "master", +] dirs = [ "mf6", "mf5to6" @@ -68,85 +83,98 @@ dirs = [ [sources.modflow6-largetestmodels] repo = "MODFLOW-ORG/modflow6-largetestmodels" name = "mf6/large" +refs = [ + "develop", + "master", +] ``` ### Registry Modes -Model repositories operate in one of two distinct modes, depending on how model files are stored and distributed. The mode is **self-describing** - it's determined by attributes in the registry metadata, not by hints in the bootstrap file. +Remote model repositories (i.e. instances of `PoochRegistry`) can operate in one of several modes, depending how model files are stored and distributed: + +- checkin mode +- release mode +- combined mode + + The mode is determined by attributes in the registry metadata in the relevant repository — `modflow-devtools` discovers on demand which mode(s) a model repository supports, and will adapt upon next sync if a repository adds or drops support for a mode. 
+ +#### Checkin mode + +In **checkin** mode, model input files and registry metadata files are versioned in the repository. By default, registry files must live in a `.registry/` directory on each branch/tag/ref from which one wants to make models available. Registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, according to the GitHub raw content URL: + +``` +https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/registry.toml +``` + +On model access, model input files are fetched and cached (by Pooch) individually, also via GitHub raw content URLs. -#### Mode 1: In-Repo Models +This mode is intended to support model repositories for which no releases are made, currently: -**Characteristics**: -- Model input files are checked into the repository -- Registry files live in `.registry/` directory on each branch/tag -- Supports both branches and release tags as refs -- Model files fetched individually via GitHub raw content URLs +- `MODFLOW-ORG/modflow6-testmodels` +- `MODFLOW-ORG/modflow6-largetestmodels` + +A registry metadata file in **checkin** mode will look something like: -**Registry metadata** (no asset attributes): ```toml -[_meta] -schema_version = "1.0" -source_repo = "MODFLOW-ORG/modflow6-testmodels" -source_ref = "master" generated_at = "2025-12-04T14:30:00Z" devtools_version = "1.9.0" -# No release_asset/registry_asset/models_asset = in-repo mode +schema_version = "1.0" ``` -**Examples**: `modflow6-testmodels`, `modflow6-largetestmodels` +Alternative modes are enabled by the presence of additional attributes in the registry metadata file — if no such attributes are present, the repository defaults to **checkin** mode. 
-**Registry discovery**: `https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/registry.toml` +#### Release mode -**Model file URLs**: Individual files via raw content URLs (specified in registry) +In **release** mode, model input files and the registry metadata file are posted/discovered as release assets rather than as version-controlled files in the repository. As in **checkin** mode, registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, but unlike **checkin** mode, the registry metadata file is expected under an asset download URL (see below). -#### Mode 2: Release-Only Models +Note that **release** mode supports only release tags, not other ref types (e.g. commit hashes or branch names). -**Characteristics**: -- Model input files are built during release (not in repository) -- Registry files attached to release as assets -- Supports release tags only (branches don't have built models) -- Model files packaged in release zip asset +This mode is meant to support repositories which distribute models at release time. In this mode a repository need not version-control model input files — for instance, in the case `MODFLOW-ORG/modflow6-examples`, only FloPy scripts are under version control, and model input files are built by the release automation. -**Registry metadata** (with asset attributes): +**Release** mode is enabled in a registry metadata file by the presence of either + +- a single `release_asset` attribute, indicating the name of a zip file containing both models and registry metadata file, or +- an attribute `models_asset` *and* an attribute `registry_asset`, indicating the names of the corresponding assets, if separate. 
-**Option A: Single zip containing both registry and models** ```toml -[_meta] -schema_version = "1.0" -source_repo = "MODFLOW-ORG/modflow6-examples" -source_ref = "v1.2.3" -release_asset = "mf6examples.zip" # Both registry and models in this zip generated_at = "2025-12-04T14:30:00Z" devtools_version = "1.9.0" +schema_version = "1.0" +# registry and models in 1 asset +release_asset = "mf6examples.zip" +# ...or in separate assets +models_asset = "models.zip" +registry_asset = "registry.toml" ``` -**Option B: Separate registry and model assets** -```toml -[_meta] -schema_version = "1.0" -source_repo = "MODFLOW-ORG/modflow6-examples" -source_ref = "v1.2.3" -registry_asset = "registry.zip" # Registry files in this asset -models_asset = "models.zip" # Model files in this asset -generated_at = "2025-12-04T14:30:00Z" -devtools_version = "1.9.0" +If `release_asset` is present, it will take precedence over `models_asset` and `registry_asset`. + +The registry metadata file is expected under a release asset URL: + +``` +https://github.com/{repo}/releases/download/{ref}/{registry_asset} +``` + +On model access, the release asset containing models is fetched from its asset download URL: + +``` +https://github.com/{repo}/releases/download/{ref}/{models_asset} ``` -**Examples**: `modflow6-examples` +The asset is then unzipped and all models are cached at once (again all handled by Pooch). This means **release** mode may entail more wait time upon first model access, while the zip file is fetched and unzipped, after which model access will generally be faster than **checkin** mode. 
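As a rough illustration of the release-mode rules described above, the check could look like the sketch below. It is not the package's implementation: the function names `detect_mode` and `asset_url` are hypothetical, while the attribute names and the URL pattern come from the text. Treating a lone `models_asset` or `registry_asset` as checkin mode is an assumption, since the text only says both are needed.

```python
from typing import Optional, Tuple


def detect_mode(meta: dict) -> Tuple[str, Optional[str]]:
    """Return (mode, name of the asset holding models) for registry metadata."""
    # A single combined asset takes precedence over separate assets.
    if "release_asset" in meta:
        return "release", meta["release_asset"]
    # Separate assets: both attributes are needed to enable release mode.
    if "models_asset" in meta and "registry_asset" in meta:
        return "release", meta["models_asset"]
    # No asset attributes: default to checkin mode (assumption: this also
    # covers the case where only one of the two separate attributes is set).
    return "checkin", None


def asset_url(repo: str, ref: str, asset: str) -> str:
    """Build a release asset download URL following the pattern above."""
    return f"https://github.com/{repo}/releases/download/{ref}/{asset}"


# Metadata mirroring the sample above, with all three attributes present.
meta = {
    "schema_version": "1.0",
    "release_asset": "mf6examples.zip",
    "models_asset": "models.zip",
    "registry_asset": "registry.toml",
}
mode, asset = detect_mode(meta)
print(mode, asset_url("MODFLOW-ORG/modflow6-examples", "current", asset))
```

Returning the models asset name together with the mode keeps the precedence rule in one place, so callers never need to re-inspect the metadata.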
-**Registry discovery**: GitHub release assets for the given tag +#### Combined mode -**Model file URLs**: All point to the release zip asset +TODO #### Mode Detection & Discovery -`PoochRegistry` automatically discovers the mode when syncing: +The registry will attempt to discover the mode of the target repository at sync time, according to the following algorithm for each of the `refs` specified in the bootstrap metadata file: -1. **If ref is a tag**: Try downloading registry from release assets first -2. **Fallback**: Try downloading registry from `.registry/` directory in repository -3. **After loading registry**: Inspect metadata to determine fetch strategy - - If `release_asset`, `registry_asset`, or `models_asset` present → Release mode - - Otherwise → In-repo mode +1. Look for a release tag matching the ref. If one exists, look for an asset called `registry.toml`. If found, download and inspect it to determine the name of the asset containing models. If not found, go to step 2. If no release tag matches the ref, go to step 2. +2. Look for a branch or commit hash matching the ref. If one exists, look for a registry metadata file in the default (or specified) location. If found, download and inspect it to determine the subdirectories containing models. If not found, go to step 3. If no matching branch or commit exists, go to step 3. +3. Raise an error indicating failure to discover/load a registry for the given ref. 
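The three steps above can be sketched as an ordered list of candidate URLs tried in turn. This is a minimal sketch under stated assumptions, not the actual `PoochRegistry` logic: `candidate_urls` and `discover_registry` are hypothetical names, the downloader is injected as a `fetch` callable that returns `None` when nothing is found, and "no matching tag" and "no matching asset" are collapsed into a single failed fetch.

```python
from typing import Callable, List, Optional, Tuple


def candidate_urls(repo: str, ref: str, registry_path: str = ".registry") -> List[Tuple[str, str]]:
    """Return (mode, url) pairs in the order the steps above try them."""
    path = registry_path.strip("/")
    return [
        # Step 1: release asset named registry.toml
        ("release", f"https://github.com/{repo}/releases/download/{ref}/registry.toml"),
        # Step 2: version-controlled registry file via raw content URL
        ("checkin", f"https://raw.githubusercontent.com/{repo}/{ref}/{path}/registry.toml"),
    ]


def discover_registry(
    repo: str,
    ref: str,
    fetch: Callable[[str], Optional[str]],
    registry_path: str = ".registry",
) -> Tuple[str, str]:
    """Return (mode, registry text) from the first candidate that resolves."""
    tried = []
    for mode, url in candidate_urls(repo, ref, registry_path):
        text = fetch(url)
        if text is not None:
            return mode, text
        tried.append(url)
    # Step 3: nothing found for this ref
    raise FileNotFoundError(f"No registry found for '{repo}@{ref}'. Tried: {', '.join(tried)}")


# Usage with a fake downloader standing in for real HTTP requests.
urls = candidate_urls("MODFLOW-ORG/modflow6-testmodels", "develop")
pages = {urls[1][1]: 'schema_version = "1.0"'}
print(discover_registry("MODFLOW-ORG/modflow6-testmodels", "develop", pages.get))
```

Injecting `fetch` keeps the ordering logic testable without network access; a real implementation would presumably distinguish a missing release from a missing asset.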
**Error handling**: ```python From da719a2742fd134ea6c0388c0409f23b4fc78fc1 Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Sat, 6 Dec 2025 18:40:13 -0500 Subject: [PATCH 4/7] wip --- docs/md/dev/models.md | 179 ++++++++++++++++++------------------------ 1 file changed, 75 insertions(+), 104 deletions(-) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index 43cdd42e..dc5b0307 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -1,66 +1,62 @@ -# Model Registry Rework +# Models API Design -## Overview - -Transition from a static model registry baked into the package to a dynamic, explicitly versioned registry system where model repositories maintain their own catalogs and `modflow-devtools` syncs to them on-demand. +This document describes the (re)design of the Models API ([GitHub issue #134](https://github.com/MODFLOW-ORG/modflow-devtools/issues/134)). It is intended to be developer-facing, not user-facing, though users may also find it informative. -## Motivation - -- Allow access to updated models without package updates (uncouple `modflow-devtools` releases from model repositories) -- Make model versioning explicit, with support for multiple refs (branches, release tags) -- Smaller package size (ship no large TOML files: only registry bootstrap info, not registries themselves) -- Lower developer maintenance burden (no manual registry regeneration) +This is a living document which will be updated as development proceeds. As the reimplementation nears completion, the scope here will shrink from charting a detailed transition path to simply describing the new design. ## Background -Currently each release of this package is fixed to a specific state of each model repository. It is incumbent on this package's developers to monitor the status of model repositories and, when models are updated, regenerate the registry and release a new version of this package. 
+Currently each release of `modflow-devtools` is fixed to a specific state of each model repository. It is incumbent on this package's developers to monitor the status of model repositories and, when models are updated, regenerate the registry and release a new version of this package. -This tight coupling is a burden on developers, and prevents model repositories and `modflow-devtools` from moving independently. +This tight coupling is inconvenient for consumers. It is not currently clear which version of `modflow-devtools` provides access to which versions of each model repository, and users must wait until developers manually re-release `modflow-devtools` for access to updated models. Also, 1.7MB+ in TOML registry files are currently shipped with the package, bloating the install-time network payload. -It is also inconvenient for users as it is not currently clear which version of this package provides access to which versions of each model repository, and users must wait until developers manually re-release `modflow-devtools` for access to updated models. +The coupling is also burdensome to developers, preventing model repositories and `modflow-devtools` from moving independently. -Also, 1.7MB+ in TOML registry files are currently shipped with package, making up the install time network payload. +## Objective -## Proposal +Transition from a static model registry baked into `modflow-devtools` releases to a dynamic, explicitly versioned registry system where model repositories publish catalogs which `modflow-devtools` discovers and synchronizes to on-demand. -Make model repositories reponsible for their own registries. Make `modflow-devtools` responsible only for +## Motivation -- defining the registry contract, -- providing registry-creation machinery, and -- storing bootstrap metadata necessary to locate remote model repositories, for... 
-- fetching registry information at install time or on demand, to synchronize the user-facing API, and -- locally caching registry data (as well as models, via Pooch, as is currently done). +- Uncouple `modflow-devtools` releases from model repositories, allowing access to updated models without package updates +- Make model repository versioning explicit, with generic support for `git` refs (branches, commit hashes, tags, and tagged releases) +- Shrink the package size: ship no large TOML files, only minimal bootstrap information rather than full registries +- Reduce the `modflow-devtools` developer maintenance burden by eliminating the responsibility for (re)generating registries -Model repositories can consume registry-creation machinery to generate their own registry metadata in CI, either for versioning with the relevant branch (for model repositories which don't have releases, e.g. the test models repositories) or as a release asset (for repositories which do have releases, e.g. the examples repository). +## Overview -For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. +Make model repositories responsible for publishing their own registries. -Then metadata shipped with `modflow-devtools` should be a few KB at most. +Make `modflow-devtools` responsible only for -## Design +- defining the registry publication contract; +- providing registry-creation machinery; +- storing bootstrap information locating model repositories; +- discovering remote registries at install time or on demand; +- caching registry data and model input files; and +- exposing a synchronized view of available registries. 
-### Bootstrap metadata file +Model repository developers can use the `modflow-devtools` registry-creation facilities to generate registry metadata, either manually or in CI. -Bootstrap metadata simply tells `modflow-devtools` where to look for remote model repositories, plus some minimal supporting information. This can live in a single file e.g. `modflow_devtools/models/bootstrap.toml`. +## Architecture -#### File contents +This will involve a few new components (e.g., bootstrap file, `MergedRegistry` class) as well as modifications to some existing components (e.g., existing registry files, `PoochRegistry`). It should be possible for the `ModelRegistry` contract to remain unchanged. -At the top level, the file can consist of a table of `sources`, each describing a model repository. +### Bootstrap file -The name of each source can by default be inferred from the name of the subsection, i.e. `sources.name`. The name will become part of a prefix by which models can be hierarchically addressed. To override the name (thus the prefix as well) a `name` attribute can be provided explicitly. +The **bootstrap** file will tell `modflow-devtools` where to look for remote model repositories. This file will be checked into the repository at `modflow_devtools/models/bootstrap.toml` and distributed with the package. -A `repo` attribute identifies the repository owner and name, separated by a forward slash. +#### Bootstrap file contents -A `registry_path` attribute points to the directory containing the registry database files. This can default to `.registry/` and can therefore be optional. +At the top level, the bootstrap file consists of a table of `sources`, each describing a model repository. -The `registry_path` **must** contain at least two (2) files: +The name of each source is by default inferred from the name of the subsection, i.e. `sources.name`. The name will become part of a prefix by which models can be hierarchically addressed (described below). 
To override the name (and thus the prefix) a `name` attribute may be provided. -- `registry.toml` -- `models.toml` +The source repository is identified by a `repo` attribute consisting of the repository owner and name separated by a forward slash. -The `registry_path` **may** also contain a file called `examples.toml`. +A `registry_path` attribute identifies the directory in the repository which contains the registry metadata file. This attribute is optional and defaults to `.registry/`. -#### Sample file +#### Sample bootstrap file ```toml [sources.modflow6-examples] @@ -75,10 +71,6 @@ refs = [ "develop", "master", ] -dirs = [ - "mf6", - "mf5to6" -] [sources.modflow6-largetestmodels] repo = "MODFLOW-ORG/modflow6-largetestmodels" @@ -89,17 +81,31 @@ refs = [ ] ``` -### Registry Modes +### Registry files + +There are currently three separate registry files: + +- `registry.toml`: enumerates individual files known to the registry. Each file is a section consisting of at minimum a `url` attribute, as well as an optional `hash` attribute. These attributes deliberately provide the information Pooch expects for each file and no more, so that a `pooch.Pooch` instance's `.registry` property may be set directly from the contents of `registry.toml`. +- `models.toml`: groups files appearing in `registry.toml` according to the model they belong to. From the perspective of the Models API, a model consists of an unordered set of input files. +- `examples.toml`: groups models appearing in `models.toml` according to the example scenario they belong to. From the perspective of the Models API, an example scenario consists of an *ordered* set of models; order is relevant because a flow model, for instance, must run before a transport model. This allows API consumers to run models in the order received. -Remote model repositories (i.e. 
instances of `PoochRegistry`) can operate in one of several modes, depending how model files are stored and distributed: +It seems simplest to consolidate these into a single `registry.toml` file defining sections `files`, `models`, and `examples` corresponding to the contents of each of the current registry files. It remains convenient, I think, for the contents of the `files` section to continue conforming to the expectations of `Pooch.registry`. -- checkin mode -- release mode -- combined mode +Registry files can begin to define a few new items of metadata: - The mode is determined by attributes in the registry metadata in the relevant repository — `modflow-devtools` discovers on demand which mode(s) a model repository supports, and will adapt upon next sync if a repository adds or drops support for a mode. +```toml +generated_at = "2025-12-04T14:30:00Z" +devtools_version = "1.9.0" +schema_version = "1.1" +``` + +Versioning the registry file schema will smooth migration from the existing state of the API to the proposed design, as well as any further migrations pending future development. -#### Checkin mode +### Registry discovery + +Model repositories can publish models to `modflow-devtools` in two ways, depending how model files are stored and distributed. + +#### Model files under version control In **checkin** mode, model input files and registry metadata files are versioned in the repository. By default, registry files must live in a `.registry/` directory on each branch/tag/ref from which one wants to make models available. 
Registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, according to the GitHub raw content URL: @@ -114,17 +120,7 @@ This mode is intended to support model repositories for which no releases are ma - `MODFLOW-ORG/modflow6-testmodels` - `MODFLOW-ORG/modflow6-largetestmodels` -A registry metadata file in **checkin** mode will look something like: - -```toml -generated_at = "2025-12-04T14:30:00Z" -devtools_version = "1.9.0" -schema_version = "1.0" -``` - -Alternative modes are enabled by the presence of additional attributes in the registry metadata file — if no such attributes are present, the repository defaults to **checkin** mode. - -#### Release mode +#### Model files as release assets In **release** mode, model input files and the registry metadata file are posted/discovered as release assets rather than as version-controlled files in the repository. As in **checkin** mode, registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, but unlike **checkin** mode, the registry metadata file is expected under an asset download URL (see below). @@ -132,49 +128,34 @@ Note that **release** mode supports only release tags, not other ref types (e.g. This mode is meant to support repositories which distribute models at release time. In this mode a repository need not version-control model input files — for instance, in the case `MODFLOW-ORG/modflow6-examples`, only FloPy scripts are under version control, and model input files are built by the release automation. -**Release** mode is enabled in a registry metadata file by the presence of either - -- a single `release_asset` attribute, indicating the name of a zip file containing both models and registry metadata file, or -- an attribute `models_asset` *and* an attribute `registry_asset`, indicating the names of the corresponding assets, if separate. 
+For models distributed this way, file entries' `url` attribute should point to the release asset URL for a zipfile containing model input files, according to the pattern (given the source entry in the bootstrap file has `repo` and `ref` attributes): -```toml -generated_at = "2025-12-04T14:30:00Z" -devtools_version = "1.9.0" -schema_version = "1.0" -# registry and models in 1 asset -release_asset = "mf6examples.zip" -# ...or in separate assets -models_asset = "models.zip" -registry_asset = "registry.toml" ``` - -If `release_asset` is present, it will take precedence over `models_asset` and `registry_asset`. - -The registry metadata file is expected under a release asset URL: - -``` -https://github.com/{repo}/releases/download/{ref}/{registry_asset} +https://github.com/{repo}/releases/download/{ref}/some.zip ``` -On model access, the release asset containing models is fetched from its asset download URL: +For instance, for the `MODFLOW-ORG/modflow6-examples` repo: +```toml +["ex-gwe-ates/ex-gwe-ates.tdis"] +url = "https://github.com/MODFLOW-ORG/modflow6-examples/releases/download/current/mf6examples.zip" +... ``` -https://github.com/{repo}/releases/download/{ref}/{models_asset} -``` -The asset is then unzipped and all models are cached at once (again all handled by Pooch). This means **release** mode may entail more wait time upon first model access, while the zip file is fetched and unzipped, after which model access will generally be faster than **checkin** mode. +On model access, the release asset containing models is fetched from its asset download URL, unzipped, and all models are cached at once (all by Pooch). This means that model input files published in this way will be slower upon first model access (while the zip file is fetched and unzipped) than with the version-controlled model input file approach. 
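Because every file entry published this way shares the same zip `url`, a consumer only needs to fetch each distinct archive once. The sketch below illustrates that grouping only; the download and extraction themselves are left to Pooch, as stated above. `archives_to_fetch` is a hypothetical helper, and the second file name in the sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def archives_to_fetch(files: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each distinct `url` to the sorted names of the file entries it provides."""
    by_url = defaultdict(list)
    for name, entry in files.items():
        by_url[entry["url"]].append(name)
    return {url: sorted(names) for url, names in by_url.items()}


ZIP_URL = "https://github.com/MODFLOW-ORG/modflow6-examples/releases/download/current/mf6examples.zip"

# File entries as they might appear in the registry; the second name is hypothetical.
files = {
    "ex-gwe-ates/ex-gwe-ates.tdis": {"url": ZIP_URL},
    "ex-gwe-ates/mfsim.nam": {"url": ZIP_URL},
}

for url, names in archives_to_fetch(files).items():
    print(f"{url} provides {len(names)} file(s)")
```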
+ +#### Combining publication schemes -#### Combined mode +A repository may make registry metadata and model input files available in both ways, as version-controlled files *and* as release assets. In this case, the discovery order described below becomes particularly relevant. -TODO +#### Registry discovery procedure -#### Mode Detection & Discovery +At sync time, `modflow-devtools` attempts to discover remote registries according to the following algorithm for each of the `refs` specified in the bootstrap metadata file: -The registry will attempt to discover the mode of the target repository at sync time, according to the following algorithm for each of the `refs` specified in the bootstrap metadata file: +1. Look for a matching release tag. If one exists, the registry discovery mechanism continues in **release asset** mode, looking for a release asset named `registry.toml`. If no matching release tag can be found, go to step 2. If the matching release contains no asset named `registry.toml`, raise an error indicating that the given release lacks the required registry metadata file asset. +2. Look for a branch, tag, or commit hash matching the ref. If one exists, the registry load mechanism continues in **version-controlled** mode, looking for a registry metadata file in the location specified in the bootstrap file (or in the default location `.registry/`). If no matching branch or commit is found, raise an error indicating registry discovery has failed for the given ref. If no registry metadata file can be found, raise an error indicating that the given branch or commit lacks a registry metadata file in the expected location. -1. Look for a release tag matching the ref. If one exists, look for an asset called `registry.toml`. If found, download and inspect it to determine the name of the asset containing models. If not found, go to step 2. If no release tag matches the ref, go to step 2. -2. Look for a branch or commit hash matching the ref. 
If one exists, look for a registry metadata file in the default (or specified) location. If found, download and inspect it to determine the subdirectories containing models. If not found, go to step 3. If no matching branch or commit exists, go to step 3. -3. Raise an error indicating failure to discover/load a registry for the given ref. +If registry metadata file discovery is successful, it is fetched and parsed to determine the location(s) of model input files. **Error handling**: ```python @@ -193,13 +174,6 @@ FileNotFoundError( ) ``` -### 2. Registry Schema - -**Files per source**: -- `registry.toml` - file hashes and URLs (Pooch format) -- `models.toml` - model name → file list mapping -- `examples.toml` - example name → model list mapping (optional) - **Metadata section**: All registry files must include a `[_meta]` section with: @@ -835,11 +809,8 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs 5. **Registry analytics**: Track which models are most frequently accessed? 6. **Naming scheme refinement**: Keep current verbose prefixes (`mf6/example/`, `mf6/test/`) or simplify to `{repo-name}/{subpath}`? -## Success Criteria +## Rollout + +For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. -1. Package size reduced by ~2MB -2. Users can access latest models without package update -3. Zero manual developer registry updates needed -4. Install always succeeds (even with network failures) -5. Existing v1.x code continues to work with deprecation warnings -6. Clear migration path to v2.x +Then metadata shipped with `modflow-devtools` should be a few KB at most. 
\ No newline at end of file From 97fdde1f08ea53258bcdb5909ebf92d189f130fe Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Sun, 7 Dec 2025 18:22:50 -0500 Subject: [PATCH 5/7] continue rewriting robot slop --- docs/md/dev/models.md | 144 ++++++++++++++++-------------------------- 1 file changed, 53 insertions(+), 91 deletions(-) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index dc5b0307..7d37491d 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -54,7 +54,7 @@ The name of each source is by default inferred from the name of the subsection, The source repository is identified by a `repo` attribute consisting of the repository owner and name separated by a forward slash. -A `registry_path` attribute identifies the directory in the repository which contains the registry metadata file. This attribute is optional and defaults to `.registry/`. +A `registry_path` attribute identifies the directory in the repository which contains the registry metadata file. This attribute is optional and defaults to `.registry/`. This attribute is only relevant if the repository versions the registry file and model input files, as described below. #### Sample bootstrap file @@ -99,15 +99,15 @@ devtools_version = "1.9.0" schema_version = "1.1" ``` -Versioning the registry file schema will smooth migration from the existing state of the API to the proposed design, as well as any further migrations pending future development. +Versioning the registry file schema will smooth the migration from the existing state of the API to the proposed design, as well as any further migrations pending future development. ### Registry discovery -Model repositories can publish models to `modflow-devtools` in two ways, depending how model files are stored and distributed. +Model repositories can publish models to `modflow-devtools` in two ways. #### Model files under version control -In **checkin** mode, model input files and registry metadata files are versioned in the repository. 
By default, registry files must live in a `.registry/` directory on each branch/tag/ref from which one wants to make models available. Registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, according to the GitHub raw content URL: +Model input files and registry metadata files may be versioned in the model repository. Under this scheme, registry files are expected by default in a `.registry/` directory — this location can be overridden by the `registry_path` attribute in the bootstrap file (see above). Registry files are discovered for each of the `refs` specified in the registry bootstrap metadata file, according to the GitHub raw content URL: ``` https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/registry.toml @@ -115,88 +115,77 @@ https://raw.githubusercontent.com/{org}/{repo}/{ref}/.registry/registry.toml On model access, model input files are fetched and cached (by Pooch) individually, also via GitHub raw content URLs. -This mode is intended to support model repositories for which no releases are made, currently: +This mode supports repositories for which model input files live directly in the repository and does not require the repository to publish releases, e.g. - `MODFLOW-ORG/modflow6-testmodels` - `MODFLOW-ORG/modflow6-largetestmodels` #### Model files as release assets -In **release** mode, model input files and the registry metadata file are posted/discovered as release assets rather than as version-controlled files in the repository. As in **checkin** mode, registry metadata files are discovered for each of the `refs` specified in the registry bootstrap metadata file, but unlike **checkin** mode, the registry metadata file is expected under an asset download URL (see below). - -Note that **release** mode supports only release tags, not other ref types (e.g. commit hashes or branch names). - -This mode is meant to support repositories which distribute models at release time. 
In this mode a repository need not version-control model input files — for instance, in the case `MODFLOW-ORG/modflow6-examples`, only FloPy scripts are under version control, and model input files are built by the release automation. - -For models distributed this way, file entries' `url` attribute should point to the release asset URL for a zipfile containing model input files, according to the pattern (given the source entry in the bootstrap file has `repo` and `ref` attributes): +Model input files and the registry metadata file may also be published as release assets. Registry metadata files are again discovered for each of the `refs` specified in the registry bootstrap metadata file. In this scheme, the registry file need not be checked into the repository, and may instead be generated on demand by release automation. Registry files are instead sought under release asset download URLs: ``` -https://github.com/{repo}/releases/download/{ref}/some.zip +https://github.com/{repo}/releases/download/{ref}/registry.toml ``` -For instance, for the `MODFLOW-ORG/modflow6-examples` repo: +Note that only release tags, not other ref types (e.g. commit hashes or branch names), are supported. + +This scheme is meant to support repositories which distribute model input files as GitHub releases, and may not version them — for instance, in the case of `MODFLOW-ORG/modflow6-examples`, only FloPy scripts are under version control, and model input files are built by the release automation. + +For models distributed this way, file entries' `url` attribute in the registry file should point to a release asset download URL for a zipfile containing model input files, e.g. for the `MODFLOW-ORG/modflow6-examples` repo: ```toml ["ex-gwe-ates/ex-gwe-ates.tdis"] url = "https://github.com/MODFLOW-ORG/modflow6-examples/releases/download/current/mf6examples.zip" -... 
``` On model access, the release asset containing models is fetched from its asset download URL, unzipped, and all models are cached at once (all by Pooch). This means that model input files published in this way will be slower upon first model access (while the zip file is fetched and unzipped) than with the version-controlled model input file approach. #### Combining publication schemes -A repository may make registry metadata and model input files available in both ways, as version-controlled files *and* as release assets. In this case, the discovery order described below becomes particularly relevant. +A repository may make registry files and model input files available in both ways, as version-controlled files *and* as release assets. In this case, discovery order becomes relevant: **model/registry releases take precedence over models/registries under version-control**. The discovery procedure is described in detail below. #### Registry discovery procedure At sync time, `modflow-devtools` attempts to discover remote registries according to the following algorithm for each of the `refs` specified in the bootstrap metadata file: -1. Look for a matching release tag. If one exists, the registry discovery mechanism continues in **release asset** mode, looking for a release asset named `registry.toml`. If no matching release tag can be found, go to step 2. If the matching release contains no asset named `registry.toml`, raise an error indicating that the given release lacks the required registry metadata file asset. -2. Look for a branch, tag, or commit hash matching the ref. If one exists, the registry load mechanism continues in **version-controlled** mode, looking for a registry metadata file in the location specified in the bootstrap file (or in the default location `.registry/`). If no matching branch or commit is found, raise an error indicating registry discovery has failed for the given ref. 
If no registry metadata file can be found, raise an error indicating that the given branch or commit lacks a registry metadata file in the expected location. - -If registry metadata file discovery is successful, it is fetched and parsed to determine the location(s) of model input files. +1. Look for a matching release tag. If one exists, the registry discovery mechanism continues in **release asset** mode, looking for a release asset named `registry.toml`. If no matching release tag can be found, go to step 2. If the matching release contains no asset named `registry.toml`, raise an error indicating that the given release lacks the required registry metadata file asset: -**Error handling**: ```python -# Generic error when registry not found -FileNotFoundError( - f"Registry for '{source}@{ref}' not found. " - f"Tried: release assets (if tag) and repository .registry/ directory." +RegistryDiscoveryError( + f"Registry file 'registry.toml' not found " + f"as release asset for '{source}@{ref}'" ) +``` + +2. Look for a commit hash, tag, or branch matching the ref (in that order, matching `git`'s lookup order). If a match exists, registry discovery continues in **version-controlled** mode, looking for a registry metadata file in the location specified in the bootstrap file (or in the default location `.registry/`). If no matching ref is found, raise an error indicating registry discovery has failed: -# When attempting branch ref on release-only source -# (Will fail at discovery step - no registry in .registry/ dir) -FileNotFoundError( - f"Registry for '{source}@{ref}' not found at " - f"https://github.com/{org}/{repo}/blob/{ref}/.registry/registry.toml. " - f"This source may only support release tags." 
+```python +RegistryDiscoveryError( + f"Registry discovery failed, " + f"ref '{source}@{ref}' does not exist" ) ``` -**Metadata section**: +If no registry metadata file can be found, raise an error indicating that the given branch or commit lacks a registry metadata file in the expected location: -All registry files must include a `[_meta]` section with: -- `schema_version`: Registry schema version (currently "1.0") -- `source_repo`: Source repository identifier (e.g., "MODFLOW-ORG/modflow6-examples") -- `source_ref`: Git ref (branch or tag) this registry was built from -- `generated_at`: Timestamp when registry was generated -- `devtools_version`: Version of modflow-devtools used to generate registry +```python +RegistryDiscoveryError( + f"Registry file 'registry.toml' not found " + f"in {registry_path} for '{source}@{ref}'" +) +``` -**Mode-specific attributes** (optional, determine fetch strategy): -- `release_asset`: Name of single zip file containing both registry and models (Mode 2, Option A) -- `registry_asset`: Name of zip file containing registry files (Mode 2, Option B) -- `models_asset`: Name of zip file containing model files (Mode 2, Option B) +If registry metadata file discovery is successful, it is fetched and parsed to determine the location(s) of model input files. -See **Registry Modes** section above for complete examples of metadata for each mode. +**Note**: for repositories combining the version-control and release publication schemes, `modflow-devtools` will discover tagged releases *before* tags as mere refs, therefore the Models API will reflect registry files and model input files published as release assets, not files under version control. -**Validation**: Use `pydantic` for schema validation and versioning +### Registry/model caching -### 3. Cache Structure +A caching approach should support registries for multiple refs simultaneously, enabling fast switching between refs. TBD whether to delegate registry file fetching/caching to Pooch. 
Model input file fetching/caching can be managed by Pooch as it is already. -**Location**: `~/.cache/modflow-devtools/registries/` (or platform equivalent via Pooch) +Something like the following directory structure should work. -**Directory layout**: ``` ~/.cache/modflow-devtools/ ├── registries/ @@ -224,32 +213,12 @@ See **Registry Modes** section above for complete examples of metadata for each └── ... ``` -**Notes**: -- Keep registries for multiple refs cached simultaneously (tags and branches) -- Cache directory named by ref (tag or branch name) -- Enables fast switching between refs -- Model files themselves cached separately by Pooch - -### 4. Ref Selection Priority -**Default behavior** (when user doesn't specify a ref): -1. **Latest release tag** (if repo publishes releases - e.g., `1.2.3`) -2. **master branch** (fallback for repos without releases) -3. **develop branch** (fallback for repos without master) -**Rationale**: Prefer stable/official tagged releases, gracefully degrade to branches +TODO clean up robot slop below -**Implementation**: -- Check GitHub API for latest release tag -- If no releases found, fall back to `master` branch -- If `master` doesn't exist, fall back to `develop` branch -**Git Ref Support**: -- **Supported**: Release tags (e.g., `v1.2.3`, `1.2.3`), branch names (e.g., `master`, `develop`, `feature/xyz`) -- **Not supported**: Commit SHAs (registries only generated on branch pushes/releases, not per-commit) -- **Error handling**: If user specifies a commit SHA, emit clear error message explaining limitation - -### 5. Sync Mechanism +### Registry synchronization #### Install-Time Behavior - **Best-effort sync** on package install (via `setup.py` or similar) @@ -330,7 +299,7 @@ except ValueError as e: - Merge multiple sources at API level (keep files separate on disk) - **Ref detection**: Use GitHub API to determine if ref is a tag or branch -### 6. 
Upstream Model Repository Changes +### Registry generation **Required changes in each model repo** (modflow6-examples, modflow6-testmodels, modflow6-largetestmodels): @@ -370,7 +339,7 @@ modflow6-examples/ └── registry.yml ``` -### 7. Registry Architecture & API +### Registry classes #### Core Principle: Separation of Concerns @@ -592,7 +561,7 @@ examples_dev = PoochRegistry("modflow6-examples", "develop") merged = MergedRegistry([examples_stable, examples_dev, testmodels]) ``` -#### Module-Level API (Convenience Layer) +### Module-Level API **Purpose**: Provide convenient access for common use cases @@ -666,12 +635,13 @@ def sync_registry(source: str | None = None, ref: str | None = None, force: bool DEFAULT_REGISTRY = get_registry() # All sources, default refs ``` -### 8. Backward Compatibility (v1.x) +## Migration path + +Ideally, we can avoid breaking existing code, and provide a gentle migration path for users with clear deprecation warnings and/or error messages where necessary. + +For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. -**Goals**: -- Don't break existing code -- Gentle migration path for users -- Clear deprecation warnings +Then metadata shipped with `modflow-devtools` should be a few KB at most. **Approach**: 1. Continue shipping full registry in v1.x @@ -688,9 +658,9 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs - Require sync for remote registry access (LocalRegistry unaffected) - Document migration clearly in CHANGELOG -## Implementation Plan +### Implementation Plan -### Phase 1: Foundation (v1.x) +#### Phase 1: Foundation (v1.x) 1. Add bootstrap metadata file 2. Implement registry schema with Pydantic validation 3. 
Create cache directory structure utilities @@ -698,25 +668,25 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs 5. Implement branch priority resolution 6. Add CLI subcommands (sync, list, status) -### Phase 2: PoochRegistry Adaptation (v1.x) +#### Phase 2: PoochRegistry Adaptation (v1.x) 1. Modify `PoochRegistry.__init__()` to check cache first 2. Add fallback to bundled registry 3. Implement best-effort sync on import 4. Add deprecation warnings for bundled registry -### Phase 3: Upstream CI (concurrent with Phase 1-2) +#### Phase 3: Upstream CI (concurrent with Phase 1-2) 1. Add `.github/workflows/registry.yml` to each model repo 2. Test registry generation in CI 3. Commit registry files to `.registry/` directories 4. For repos with releases, add registry as release asset -### Phase 4: Testing & Documentation (v1.x) +#### Phase 4: Testing & Documentation (v1.x) 1. Add comprehensive tests for sync mechanism 2. Test network failure scenarios 3. Document new workflow in `models.md` 4. Add migration guide for v2.x -### Phase 5: v2.x Release +#### Phase 5: v2.x Release 1. Remove bundled registry files (keep bootstrap.toml) 2. Make sync required for PoochRegistry 3. Update documentation @@ -747,8 +717,6 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs 15. **Mixed refs**: Supported naturally via naming scheme - can mix multiple refs of same source 16. **LocalRegistry**: Remains independent, serves different purpose (local development) -## Design Considerations & Risk Mitigation - ### Name Collisions **Risk**: Models from different sources could have identical names. @@ -808,9 +776,3 @@ DEFAULT_REGISTRY = get_registry() # All sources, default refs 4. **Offline mode**: Should we provide an explicit "offline mode" that never tries to sync? 5. **Registry analytics**: Track which models are most frequently accessed? 6. 
**Naming scheme refinement**: Keep current verbose prefixes (`mf6/example/`, `mf6/test/`) or simplify to `{repo-name}/{subpath}`? - -## Rollout - -For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. - -Then metadata shipped with `modflow-devtools` should be a few KB at most. \ No newline at end of file From 48a954c2e8a6dca0d366967f5ffdeb7df3ffb1f5 Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Mon, 22 Dec 2025 08:11:58 -0500 Subject: [PATCH 6/7] in decent shape --- docs/md/dev/models.md | 563 +++++++++--------------------------------- 1 file changed, 117 insertions(+), 446 deletions(-) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index 7d37491d..d12a58d0 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -4,6 +4,44 @@ This document describes the (re)design of the Models API ([GitHub issue #134](ht This is a living document which will be updated as development proceeds. As the reimplementation nears completion, the scope here will shrink from charting a detailed transition path to simply describing the new design. 
+ + + +- [Background](#background) +- [Objective](#objective) +- [Motivation](#motivation) +- [Overview](#overview) +- [Architecture](#architecture) + - [Bootstrap file](#bootstrap-file) + - [Bootstrap file contents](#bootstrap-file-contents) + - [Sample bootstrap file](#sample-bootstrap-file) + - [Registry files](#registry-files) + - [Registry discovery](#registry-discovery) + - [Model files under version control](#model-files-under-version-control) + - [Model files as release assets](#model-files-as-release-assets) + - [Combining publication schemes](#combining-publication-schemes) + - [Registry discovery procedure](#registry-discovery-procedure) + - [Registry/model caching](#registrymodel-caching) + - [Registry synchronization](#registry-synchronization) + - [Manual sync](#manual-sync) + - [Automatic sync](#automatic-sync) + - [Source model integration](#source-model-integration) + - [Model Addressing](#model-addressing) + - [Registry classes](#registry-classes) + - [Module-Level API](#module-level-api) +- [Migration path](#migration-path) + - [Implementation plan](#implementation-plan) + - [Phase 1: Foundation (v1.x)](#phase-1-foundation-v1x) + - [Phase 2: PoochRegistry Adaptation (v1.x)](#phase-2-poochregistry-adaptation-v1x) + - [Phase 3: Upstream CI (concurrent with Phase 1-2)](#phase-3-upstream-ci-concurrent-with-phase-1-2) + - [Phase 4: Testing & Documentation (v1.x)](#phase-4-testing--documentation-v1x) + - [Phase 5: v2.x Release](#phase-5-v2x-release) +- [Open Questions / Future Enhancements](#open-questions--future-enhancements) + + + + + ## Background Currently each release of `modflow-devtools` is fixed to a specific state of each model repository. It is incumbent on this package's developers to monitor the status of model repositories and, when models are updated, regenerate the registry and release a new version of this package. @@ -213,336 +251,115 @@ Something like the following directory structure should work. └── ... 
``` +### Registry synchronization +Delegating registry responsibilities to model repositories entails deferring the loading of registries — `modflow-devtools` will no longer ship with information about exactly which models are available, only where to find model repositories and how they make model input files available. -TODO clean up robot slop below - +The user should be able to manually trigger synchronization. For a smooth experience it should probably happen automatically at opportune times, though. -### Registry synchronization +#### Manual sync -#### Install-Time Behavior -- **Best-effort sync** on package install (via `setup.py` or similar) -- **Warn if unsuccessful** but allow install to succeed -- **Retry on first import** if sync failed during install -- **Clear user messaging**: "Registry sync failed, remote models unavailable. Run `python -m modflow_devtools.models sync` to retry." +Synchronization can be exposed as an [executable module](https://peps.python.org/pep-0338/) and as a [command](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#creating-executable-scripts). -#### Manual Sync Command +The simplest approach would be a single such script/command, e.g. `python -m modflow_devtools.models.sync` aliased to `sync-models`. It seems ideal to support introspection as well. 
A full models CLI might include: -**CLI**: `python -m modflow_devtools.models` +- `sync`: synchronize registries for all configured source model repositories, or a specific repo +- `info`: show configured registries and their sync status, or a particular registry's sync status +- `list`: list available models for all registries, or for a particular registry -**Subcommands**: ```bash -# Sync all sources to default refs (latest release tag → master → develop) -python -m modflow_devtools.models sync +# Show configured registries and status +python -m modflow_devtools.models info -# Sync all sources to specific ref (branch or tag) -python -m modflow_devtools.models sync --ref develop -python -m modflow_devtools.models sync --ref v1.2.3 - -# Sync specific source -python -m modflow_devtools.models sync --source modflow6-examples - -# Sync specific source to specific ref -python -m modflow_devtools.models sync --source modflow6-examples --ref develop -python -m modflow_devtools.models sync --source modflow6-examples --ref v1.2.3 +# Sync all sources to configured refs +python -m modflow_devtools.models sync # Force re-download even if cached python -m modflow_devtools.models sync --force -# List available registries and their status -python -m modflow_devtools.models list +# For a repo publishing models via releases +python -m modflow_devtools.models sync --repo MODFLOW-ORG/modflow6-examples --ref current -# Show sync status -python -m modflow_devtools.models status +# For a repo with models under version control +python -m modflow_devtools.models sync --repo MODFLOW-ORG/modflow6-testmodels --ref develop +python -m modflow_devtools.models sync --repo MODFLOW-ORG/modflow6-testmodels --ref f3df630 # commit hash works too ``` -**Error handling for unsupported refs**: +Or via CLI commands: + ```bash -# Commit SHA not supported - clear error message -python -m modflow_devtools.models sync --ref abc123def -# Error: Commit SHAs are not supported. 
Registries are only generated for branches and release tags. -# Please use a branch name (e.g., 'master', 'develop') or release tag (e.g., 'v1.2.3'). +models info +models sync ``` -**Programmatic API**: -```python -from modflow_devtools.models import sync_registry, get_registry - -# Sync and use default (latest release tag → master → develop) -sync_registry() -registry = get_registry() +Perhaps leading with a `models` command namespace is too generic, and we need e.g. a leading `mf` namespace on all commands exposed by `modflow-devtools`: -# Sync to specific ref (branch or tag) -sync_registry(ref="develop") -sync_registry(ref="v1.2.3") - -# Use specific ref without syncing -registry = get_registry(ref="develop") # uses cached, syncs if missing -registry = get_registry(ref="v1.2.3") # uses cached release tag - -# Use specific source and ref -registry = get_registry(source="modflow6-examples", ref="develop") -registry = get_registry(source="modflow6-examples", ref="v1.2.3") - -# Error on commit SHA -try: - registry = get_registry(ref="abc123def") -except ValueError as e: - print(e) # "Commit SHAs are not supported..." +```bash +mf models info +mf models sync ``` -#### Sync Implementation -- **For release tags**: Download registry files from GitHub release assets -- **For branches**: Download registry files from GitHub raw URLs (e.g., `https://raw.githubusercontent.com/MODFLOW-ORG/modflow6-examples/{branch}/.registry/registry.toml`) -- Validate schema version and structure -- Cache to local directory (named by ref - tag or branch) -- Merge multiple sources at API level (keep files separate on disk) -- **Ref detection**: Use GitHub API to determine if ref is a tag or branch +#### Automatic sync -### Registry generation +At install time, `modflow-devtools` can load the bootstrap file and attempt to sync to all configured repositories/registries. 
The install should not fail if registry sync fails (due either to network errors or misconfiguration), however — an informative warning can be shown, and sync retried on subsequent imports and/or manually (see above). -**Required changes in each model repo** (modflow6-examples, modflow6-testmodels, modflow6-largetestmodels): +Synchronization involves: -#### CI Workflow -**File**: `.github/workflows/registry.yml` +- Loading the bootstrap file +- Discovering/validating remote registries +- Caching registries locally -**Trigger**: Push to master/develop branches, or release tag creation +### Source model integration -**Steps**: -1. Install `modflow-devtools` (provides registry generation machinery) -2. Run registry generation: +Required steps in source model repositories include: + +- Install `modflow-devtools` (provides registry generation machinery) +- Generate registries ```bash python -m modflow_devtools.make_registry \ --path . \ --output .registry \ --url ``` -3. Commit registry files to `.registry/` directory (for branches) -4. For release tags: Attach registry files as release assets +- Commit registry files to `.registry/` directory (for version-controlled model repositories) or post them as release assets (for repositories publishing releases) -**Notes**: -- Registry generation machinery remains in `modflow-devtools` -- Model repos consume it as a dependency -- Keeps single source of truth for registry format -#### Directory Structure -``` -modflow6-examples/ -├── .registry/ -│ ├── registry.toml -│ ├── models.toml -│ └── examples.toml -├── examples/ -│ └── ... 
-└── .github/ - └── workflows/ - └── registry.yml -``` - -### Registry classes - -#### Core Principle: Separation of Concerns - -- **`PoochRegistry`**: Single source, single ref - knows nothing about other sources -- **`MergedRegistry`**: Pure compositor - just merges existing registries, no construction logic -- **Module-level functions**: Handle sync, construction, and convenience APIs - -#### Model Naming Convention +### Model Addressing **Format**: `{source}@{ref}/{subpath}` -**Components**: +Components include: + - `source`: Repository identifier (e.g., `modflow6-examples`, `modflow6-testmodels`) - `ref`: Git ref (branch or tag, e.g., `v1.2.3`, `master`, `develop`) - `subpath`: Relative path within repo to model directory -**Examples**: +The model directory name, i.e. the rightmost element in the `subpath`, is presumed to be the model name. + +For example: + - `modflow6-examples@v1.2.3/ex-gwf-twri` - `modflow6-testmodels@develop/mf6/test001a_Tharmonic` - `modflow6-largetestmodels@master/prudic2004t2` -**Benefits**: -- Guarantees no name collisions (unique per source + ref + path) -- Makes model provenance explicit to users -- Allows mixing multiple refs of same source -- Simplifies cache key generation - -#### PoochRegistry (Single Source) - -**Purpose**: Represent a single source repository at a specific ref +Benefits of this approach: -**Constructor**: Takes `source` (repo name) and `ref` (branch/tag) - -```python -class PoochRegistry(ModelRegistry): - def __init__(self, source: str, ref: str | None = None, cache_path: PathLike | None = None): - """Create registry for a single source repository - - Args: - source: Source repository name (e.g., "modflow6-examples") - ref: Git ref - branch name or release tag - (default: latest release tag → master → develop) - Commit SHAs not supported. 
- cache_path: Override default cache location - - Raises: - ValueError: If ref is a commit SHA - FileNotFoundError: If registry not cached and sync fails - """ - self._source = source - self._ref = self._resolve_ref(ref) # Applies default priority - self._cache_path = cache_path or self._default_cache_path() - self._load() # Load from cache, auto-sync if missing - - @property - def source(self) -> str: - """Source repository name""" - return self._source - - @property - def ref(self) -> str: - """Git ref (branch or tag)""" - return self._ref - - def sync(self, force: bool = False) -> None: - """Sync this registry from upstream - - Automatically discovers registry location and mode: - 1. If ref is a tag: Try release assets first - 2. Fallback: Try .registry/ directory in repository - 3. After loading: Inspect metadata to determine fetch strategy - - Args: - force: Re-download even if cached - - Raises: - FileNotFoundError: If registry not found in either location - """ - # Try release assets if ref is a tag - if self._is_tag(self._ref): - try: - self._sync_from_release_assets() - self._setup_pooch() # Configure based on metadata - return - except ReleaseNotFound: - pass # Fall through to repository - - # Try .registry/ directory in repository - try: - self._sync_from_repository() - self._setup_pooch() # Configure based on metadata - except FileNotFoundError: - raise FileNotFoundError( - f"Registry for '{self._source}@{self._ref}' not found. " - f"Tried: release assets (if tag) and repository .registry/ directory." 
- ) - - def _setup_pooch(self) -> None: - """Configure Pooch based on registry metadata (mode detection)""" - meta = self._meta - - if "release_asset" in meta: - # Mode 2, Option A: Single zip with registry + models - self._fetch_mode = "single_zip" - self._asset_name = meta["release_asset"] - - elif "models_asset" in meta: - # Mode 2, Option B: Separate registry and model assets - self._fetch_mode = "models_zip" - self._asset_name = meta["models_asset"] - - else: - # Mode 1: In-repo individual files - self._fetch_mode = "individual_files" - # URLs already in registry from make_registry.py - - def is_synced(self) -> bool: - """Check if registry is cached for this source/ref""" - ... - - # Inherited from ModelRegistry abstract class - @property - def files(self) -> dict: - """Map of file names to file info (with source@ref prefix)""" - ... - - @property - def models(self) -> dict: - """Map of model names to file lists (with source@ref prefix)""" - ... - - @property - def examples(self) -> dict: - """Map of example names to model lists (with source@ref prefix)""" - ... -``` +- Guarantees no name/cache collisions (unique per source + ref + path) +- Model provenance is explicit to users +- Allows multiple refs from same source -**Key changes from current**: -- Loads from cache by default (not package resources) -- Auto-syncs if cache missing (best-effort on first access) -- All keys prefixed with `{source}@{ref}/` in returned dicts +### Registry classes -#### MergedRegistry (Compositor) +`PoochRegistry` is currently associated with a single state of a single repository. This can continue. Introduce a few properties (e.g. `source` and `ref`) to make the model source and version explicit. -**Purpose**: Merge multiple `ModelRegistry` instances into unified API +`PoochRegistry` should be immutable; to synchronize to a new model source state, create a new one.
-**Constructor**: Takes list of pre-constructed registry instances +Introduce a `MergedRegistry` compositor to merge multiple `PoochRegistry` instances under the same `ModelRegistry` API. The initializer can simply accept a list of pre-constructed `PoochRegistry` instances, and expose a list or dictionary of the registries of which it consists. Properties inherited from `ModelRegistry` (`files`, `models`, `examples`) can return merged views. -```python -class MergedRegistry(ModelRegistry): - def __init__(self, registries: list[ModelRegistry]): - """Merge multiple registries into unified API - - Args: - registries: List of ModelRegistry instances (typically PoochRegistry) - Caller is responsible for constructing these with desired - sources and refs. - - Note: - This class is a pure compositor - it knows nothing about sources, - refs, syncing, or construction. All that logic happens before - MergedRegistry is created. - """ - self._registries = list(registries) - - @property - def registries(self) -> list[ModelRegistry]: - """The underlying registries being merged""" - return list(self._registries) # Return copy - - # Inherited from ModelRegistry - merge results from all registries - @property - def files(self) -> dict: - """Merged files from all registries""" - merged = {} - for registry in self._registries: - merged.update(registry.files) - return merged - - @property - def models(self) -> dict: - """Merged models from all registries""" - merged = {} - for registry in self._registries: - merged.update(registry.models) - return merged - - @property - def examples(self) -> dict: - """Merged examples from all registries""" - merged = {} - for registry in self._registries: - merged.update(registry.examples) - return merged -``` +Handle synchronization, `MergedRegistry` construction, and similar concerns at the module (i.e. higher) level. Registries don't need to concern themselves with this sort of thing. 
-**Why no factory methods?** -- Construction is trivial: `MergedRegistry([reg1, reg2])` -- Users can easily create new instances when refs change -- Keeps the class focused and simple -- Avoids coupling MergedRegistry to PoochRegistry +Some tentative usage examples: -**Usage examples**: ```python # Create individual registries examples_v1 = PoochRegistry("modflow6-examples", "v1.2.3") @@ -561,106 +378,33 @@ examples_dev = PoochRegistry("modflow6-examples", "develop") merged = MergedRegistry([examples_stable, examples_dev, testmodels]) ``` +`LocalRegistry` is unaffected by all this, as it suits a different use case largely aimed at developers. Consider renaming it e.g. to `DeveloperRegistry`. + ### Module-Level API -**Purpose**: Provide convenient access for common use cases +Provide convenient APIs for common use cases, like synchronizing to a particular source or to all known sources, introspecting sync status, etc. -```python -# Module: modflow_devtools.models - -def get_registry( - source: str | None = None, - ref: str | None = None, - sources: dict[str, str] | None = None -) -> ModelRegistry: - """Get a registry (single source or merged) - - Args: - source: Single source name (returns PoochRegistry) - ref: Git ref to use (applies to single source or all sources) - sources: Dict mapping source names to refs for mixed-ref merged registry - e.g., {"modflow6-examples": "v1.2.3", "modflow6-testmodels": "develop"} - - Returns: - PoochRegistry if source specified, otherwise MergedRegistry - - Examples: - # Single source - reg = get_registry(source="modflow6-examples", ref="v1.2.3") - - # All sources, same ref - reg = get_registry(ref="develop") - - # All sources, default refs (latest release → master → develop) - reg = get_registry() - - # All sources, mixed refs - reg = get_registry(sources={ - "modflow6-examples": "v1.2.3", - "modflow6-testmodels": "develop" - }) - """ - if source: - return PoochRegistry(source, ref) - - if sources: - registries = 
[PoochRegistry(src, r) for src, r in sources.items()] - else: - # Load all from bootstrap, apply same ref to all - bootstrap = load_bootstrap() - registries = [PoochRegistry(src, ref) for src in bootstrap.sources.keys()] - - return MergedRegistry(registries) - - -def sync_registry(source: str | None = None, ref: str | None = None, force: bool = False) -> None: - """Sync registry from upstream - - Args: - source: Specific source to sync (default: all sources from bootstrap) - ref: Git ref to sync (default: latest release → master → develop) - force: Force re-download even if cached - """ - if source: - registry = PoochRegistry(source, ref) - registry.sync(force=force) - else: - bootstrap = load_bootstrap() - for src in bootstrap.sources.keys(): - registry = PoochRegistry(src, ref) - registry.sync(force=force) - - -# DEFAULT_REGISTRY is now a MergedRegistry -DEFAULT_REGISTRY = get_registry() # All sources, default refs -``` +Expose as `DEFAULT_REGISTRY` a `MergedRegistry` with all sources configured in the bootstrap file. + +This will break any code checking `isinstance(DEFAULT_REGISTRY, PoochRegistry)`, but it's unlikely anyone is doing that. ## Migration path Ideally, we can avoid breaking existing code, and provide a gentle migration path for users with clear deprecation warnings and/or error messages where necessary. -For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x. - -Then metadata shipped with `modflow-devtools` should be a few KB at most. +For the remainder of the 1.x release series, keep shipping registry metadata with `modflow-devtools` for backwards-compatibility, now with the benefit of explicit model versioning. 
Allow syncing on demand for access to model updates. Stop shipping registry metadata and begin syncing remote model registry metadata at install time with the release of 2.x, at which point metadata shipped with `modflow-devtools` should be a few KB at most. -**Approach**: -1. Continue shipping full registry in v1.x -2. Add sync functionality as optional enhancement -3. Emit deprecation warning on import: - ``` - DeprecationWarning: Bundled registry is deprecated and will be removed in v2.0. - Use `python -m modflow_devtools.models sync` to download the latest registry. - ``` -4. Provide migration guide in docs +For 1.x, show a deprecation warning on import: -**Breaking changes in v2.x**: -- Remove bundled registry files (except bootstrap.toml) -- Require sync for remote registry access (LocalRegistry unaffected) -- Document migration clearly in CHANGELOG +``` +DeprecationWarning: Bundled registry is deprecated and will be removed in v2.0. +Use `python -m modflow_devtools.models sync` to download the latest registry. +``` -### Implementation Plan +### Implementation plan #### Phase 1: Foundation (v1.x) + 1. Add bootstrap metadata file 2. Implement registry schema with Pydantic validation 3. Create cache directory structure utilities @@ -669,110 +413,37 @@ Then metadata shipped with `modflow-devtools` should be a few KB at most. 6. Add CLI subcommands (sync, list, status) #### Phase 2: PoochRegistry Adaptation (v1.x) -1. Modify `PoochRegistry.__init__()` to check cache first + +1. Modify `PoochRegistry` to check cache first 2. Add fallback to bundled registry 3. Implement best-effort sync on import 4. Add deprecation warnings for bundled registry #### Phase 3: Upstream CI (concurrent with Phase 1-2) + 1. Add `.github/workflows/registry.yml` to each model repo 2. Test registry generation in CI 3. Commit registry files to `.registry/` directories 4. For repos with releases, add registry as release asset #### Phase 4: Testing & Documentation (v1.x) + 1. 
Add comprehensive tests for sync mechanism 2. Test network failure scenarios 3. Document new workflow in `models.md` 4. Add migration guide for v2.x #### Phase 5: v2.x Release + 1. Remove bundled registry files (keep bootstrap.toml) 2. Make sync required for PoochRegistry 3. Update documentation 4. Release notes with clear migration instructions -## Key Design Decisions - -1. **Install-time sync**: Best-effort, warn on failure, allow install to proceed -2. **Registry location**: `.registry/` directory on each branch in model repos; also as release assets for tagged releases -3. **Bootstrap format**: Minimal TOML with just repo identifiers - no hints about location or fetch strategy -4. **Registry modes**: Self-describing via metadata attributes - - Mode 1 (in-repo): No asset attributes → individual file fetching - - Mode 2 (release-only): `release_asset`, `registry_asset`, or `models_asset` → zip fetching - - Mode discovered automatically during sync -5. **Multi-ref caching**: Support simultaneous caching of multiple refs (tags and branches) -6. **Schema versioning**: Use Pydantic, include `_meta` section in registries -7. **Ref priority**: Latest release tag → master branch → develop branch (when user doesn't specify) -8. **Ref support**: Branch names and release tags supported; commit SHAs not supported (with clear error message) -9. **CLI parameter**: Use `--ref` (not `--branch`) to clarify support for both tags and branches -10. **Transition**: Optional in v1.x with deprecation warning, required in v2.x -11. **Registry architecture**: Clear separation of concerns - - `PoochRegistry`: Single source, single ref - no knowledge of other sources - - `MergedRegistry`: Pure compositor - takes pre-built registries, no construction logic - - Module functions: Handle sync, construction, convenience APIs -12. **Model naming**: `{source}@{ref}/{subpath}` format guarantees collision-free names and explicit provenance -13. 
**Registry merging**: Keep separate on disk and in separate `PoochRegistry` instances, merge via `MergedRegistry` -14. **No factory methods**: `MergedRegistry` construction is trivial, users create new instances directly -15. **Mixed refs**: Supported naturally via naming scheme - can mix multiple refs of same source -16. **LocalRegistry**: Remains independent, serves different purpose (local development) - -### Name Collisions -**Risk**: Models from different sources could have identical names. - -**Mitigation**: Systematic naming scheme `{source}@{ref}/{subpath}` guarantees uniqueness: -- Each source has distinct identifier -- Refs are included in name -- Subpaths are unique within a source - -**Example**: `modflow6-examples@v1.2.3/ex-gwf-twri` cannot collide with `modflow6-testmodels@develop/ex-gwf-twri` - -### Partial Sync State -**Risk**: User syncs some sources but not others, leading to incomplete `MergedRegistry`. - -**Mitigation**: -- `MergedRegistry` is transparent - only merges what it's given -- Module-level `get_registry()` handles ensuring sources are synced -- `PoochRegistry` auto-syncs on first access (best-effort) -- Clear error messages if sync fails - -### Performance -**Risk**: Loading multiple registry files could be slow. - -**Analysis**: Not a concern - TOML files load instantly (even 1.7MB registry is trivial). Model files download lazily via Pooch only when accessed. - -**Decision**: No lazy loading needed for registries themselves. - -### Error Propagation -**Risk**: One source failing to sync could break entire `MergedRegistry`. 
- -**Mitigation**: -- `PoochRegistry` constructor fails fast if sync fails -- Caller (module functions) can handle errors before constructing `MergedRegistry` -- `MergedRegistry` itself is simple - no error handling needed (operates on valid registries) - -### Backward Compatibility -**Risk**: Changing `DEFAULT_REGISTRY` from `PoochRegistry` to `MergedRegistry` breaks code checking `isinstance(DEFAULT_REGISTRY, PoochRegistry)`. - -**Mitigation**: -- Both implement `ModelRegistry` abstract class -- API is identical for common operations -- Breaking change acceptable for v2.x with clear migration guide -- v1.x maintains current behavior with deprecation warnings - -### Cache Invalidation -**Risk**: Registry instance doesn't reflect newly synced data. - -**Mitigation**: -- Document that registries are immutable per ref -- To use new data, create new instance: `get_registry(ref="new-ref")` -- Construction is cheap (just loading TOML), so recreating is fine - ## Open Questions / Future Enhancements -1. **Registry compression**: Should we gzip registry files for faster downloads? -2. **Partial registry updates**: Could we diff registries and download only changes? -3. **Registry CDN**: Should we consider hosting registries on a CDN for faster access? -4. **Offline mode**: Should we provide an explicit "offline mode" that never tries to sync? -5. **Registry analytics**: Track which models are most frequently accessed? -6. **Naming scheme refinement**: Keep current verbose prefixes (`mf6/example/`, `mf6/test/`) or simplify to `{repo-name}/{subpath}`? +1. **Registry compression**: Zip registry files for faster downloads? +2. **Partial registry updates**: Diff registries and download only changes? +3. **Registry CDN**: Consider hosting registries somewhere for faster access? +4. **Offline mode**: Provide an explicit "offline mode" that never tries to sync? +5. **Registry analytics**: Track which models/examples are most frequently accessed? 
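
The `{source}@{ref}/{subpath}` model addressing scheme described in this patch could be parsed along the following lines. This is a minimal sketch, not part of `modflow-devtools`: `ModelAddress` and `parse_model_address` are hypothetical names introduced only for illustration, and the sketch assumes the ref contains no `/` (true of the examples in the document, though not of every valid Git branch name).

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModelAddress:
    """Hypothetical value type for a `{source}@{ref}/{subpath}` address."""

    source: str   # repository identifier, e.g. "modflow6-examples"
    ref: str      # Git ref (branch or tag), e.g. "v1.2.3" or "develop"
    subpath: str  # path within the repo to the model directory

    @property
    def model_name(self) -> str:
        # The rightmost element of the subpath is presumed to be the model name.
        return PurePosixPath(self.subpath).name

    def __str__(self) -> str:
        return f"{self.source}@{self.ref}/{self.subpath}"


def parse_model_address(address: str) -> ModelAddress:
    """Split an address into source, ref, and subpath.

    Assumption: the ref contains no "/", so the first "/" after the "@"
    separates the ref from the subpath.
    """
    source, sep, rest = address.partition("@")
    if not sep or not source:
        raise ValueError(f"expected '{{source}}@...', got {address!r}")
    ref, _, subpath = rest.partition("/")
    subpath = subpath.strip("/")
    if not ref or not subpath:
        raise ValueError(f"missing ref or subpath in {address!r}")
    return ModelAddress(source, ref, subpath)


if __name__ == "__main__":
    addr = parse_model_address("modflow6-testmodels@develop/mf6/test001a_Tharmonic")
    print(addr.source, addr.ref, addr.model_name)
```

Because the source and ref are part of every address, two models with the same directory name in different repositories or refs still map to distinct addresses, which is the collision-avoidance property the design relies on.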
From bc78a74565f4112f1f06c884e97f880401f7cd5e Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Mon, 22 Dec 2025 08:32:16 -0500 Subject: [PATCH 7/7] note --- docs/md/dev/models.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/md/dev/models.md b/docs/md/dev/models.md index d12a58d0..b3d0d49a 100644 --- a/docs/md/dev/models.md +++ b/docs/md/dev/models.md @@ -119,6 +119,8 @@ refs = [ ] ``` +Note: The bootstrap refs list indicates default refs to sync at install time. Users can request synchronization to any valid git ref (branch, tag, or commit hash) via the CLI or API. + ### Registry files There are currently three separate registry files: