Skip to content

breaking: reading based on new index structure#288

Draft
consideRatio wants to merge 16 commits intosensmetry:mainfrom
consideRatio:pr/client-read-new-index
Draft

breaking: reading based on new index structure#288
consideRatio wants to merge 16 commits intosensmetry:mainfrom
consideRatio:pr/client-read-new-index

Conversation

@consideRatio
Copy link
Copy Markdown
Collaborator

@consideRatio consideRatio commented Apr 15, 2026

The client now reads indexes through the spec documented in
`docs/src/index-protocol.md` and `docs/src/index-api-protocol.md`.

Wire-format changes (reads):
- `<index_root>/.well-known/sysand-index.json` is fetched first to
  resolve `index_root` / `api_root`. 404 is not an error (both roots
  default to the discovery root); other non-2xx is a hard error.
- `<index_root>/index.json` narrows to `{"projects": [{"iri": …}]}`.
- `<…>/versions.json` replaces the previous `versions.txt` newline
  list. Each entry carries `version`, `usage`, `project_digest`,
  `kpar_size`, `kpar_digest`.
- Per-version layout becomes `<…>/<version>/.project.json`,
  `.meta.json`, `project.kpar` (replacing the old
  `<…>/<version>.kpar/…` layout).
- IRI → path: `pkg:sysand/<pub>/<name>` → `<pub>/<name>/`; all other
  IRIs → `_iri/<sha256(normalized_iri)>/`.
- Non-canonical `pkg:sysand/…` IRIs are now hard-rejected with
  `SysandPurlError::NotNormalized`; `InterchangeProjectUsageRaw::validate`
  runs the same rejection, so every `.project.json` consumer is
  affected (not just the index client).
- `project_digest` is verified before `.project.json` / `.meta.json`
  are exposed; `kpar_digest` is verified during streamed download and
  never renamed into place on mismatch.

Wire-format changes (publish):
- `sysand publish` now performs mandatory well-known discovery via the
  configured `HTTPAuthentication` policy and posts to
  `<api_root>/v1/upload` (leading slash dropped from the old
  `/api/v1/upload`). Auth-gated well-known URIs are supported — RFC
  8615 takes no position on authentication.

On-disk format changes:
- `.meta.json`'s `created` field now serializes via
  `to_rfc3339_opts(SecondsFormat::Nanos, true)`, giving nanosecond
  precision and a `Z` suffix. Docs/protocol update tracked as a
  follow-up.
- Lockfiles may now carry `remote_kpar_digest` on
  `Source::RemoteKpar`. This is needed to preserve the raw-archive
  integrity tripwire across `lock` → `sync`: once `lock` has read
  `versions.json`, `sync` should be able to verify the downloaded
  `project.kpar` bytes against the lockfile alone, without re-querying
  the index. Relying only on the canonical `project_digest` would miss
  repacks/tampering that leave the extracted project content unchanged.

Public Rust API:
- New modules: `env::discovery`, `env::index`, `project::index_entry`,
  `purl`. Removed: `env::reqwest_http::HTTPEnvironmentAsync`.
- `CombinedResolver` / `CombinedProjectStorage` rename `Registry` →
  `Index` across types and variants.
- `PublishError`: `InvalidIndexUrl` → split into
  `InvalidDiscoveryRoot` / `InvalidApiRoot`; `validate_endpoint_url_shape`
  takes a new `EndpointKind` argument.
- `commands::publish::do_publish` signature changed;
  `prepare_publish_payload`, `validate_endpoint_url_shape`,
  `PublishPreparation`, `EndpointKind` are newly `pub`.
- `ProjectRead`/`ProjectReadAsync` contract: wrappers must forward
  `checksum_canonical_hex{,_async}` and related methods explicitly.
  The `ProjectRead` derive macro auto-forwards the expanded set; the
  in-tree `AsAsyncProject` wrapper forwards all six affected methods,
  and the derive now qualifies `CanonicalizationError` explicitly so
  downstream crates do not need a caller-side import for the generated
  code to compile.
- `ReqwestKparDownloadedProject` gains `new`, `is_downloaded`,
  `ensure_downloaded_verified`, and `with_expected_sha256_hex`; new
  `DigestMismatch` error variant. The type now tracks
  verified-vs-unverified download state so that an earlier unverified
  call cannot silently short-circuit a later verified call, and
  lockfile-driven `sync` can enforce the recorded `kpar_digest`.
- `InterchangeProjectValidationError::MalformedSysandPurl` is new;
  `CanonicalizationError::map_project_read` is new.
- `core/Cargo.toml` adds `idna`, enables `tokio/sync`.

Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
A few prompts I've used to self-review, besides scanning it all myself

Task three Staff SWE to review the HEAD commit separately, the commit represents changes in a PR not yet merged. Let one focus on avoiding bugs etc, let another focus on architecture decisions, and another one on code comprehension (comments, naming, etc). Put these reviews into an single markdown file (review1.md), which is then passed for review of the review by codex as another staff SWE, which you task to emit another markdown file (review2.md). When that is done, you surface things to me.

Task two subagents acting as staff SWEs to review the changes since origin/main on this PR branch for readability, code comprehension, and sustainbility of maintenance, and such. I really care about keeping things reasonable - things shouldn't be implemented too defensively if it leads to significant complexity for example, and if we cover for an edge case we want to think its reasonable its going to happen etc. I want all names to be very thoughtfully decided, we can change anything introduced since origin/main.

Let one subagent be your own, and another delegate the task to codex.

Then finally let both agents come to an agreement on a proposal in some way that you determine as an orchestrator. It is okay if somewhat nitty changes are included as long as they are related to whats part of this PR.

Be an orchestrator and task two subagents, both acting as staff SWE reviewing
this feature branch on top of origin/main. Let one be run by codex gpt 5.4,
and the other by Opus 4.7 xhigh.

Instead of just settling for their individual conclusions, facilitate them to
come to a collective conclusion on a final review presented to me with clear
actionable suggestions.

Be an orchestrator and task two subagents, both acting as staff SWE reviewing this feature branch on top of origin/main. Let one be run by codex gpt 5.4, and the other by Opus 4.7 xhigh.

Instead of just settling for their individual conclusions, facilitate them to come to a collective conclusion on a final review presented to me with clear actionable suggestions.

A lot of work has been done before decisions in docs/src/index-protocol.md was locked in, and there is work to do in order to adhere to it. Let the review focus on what should be done towards that.

Following this, I want you to task the same agents, either with reset memory or retained based on what you think is best as an orchestrator, to plan the details and reach agreements on that.

Finally, I want you to task a staff SWE to review that implementation plan, and then present to me.

Review this PR's diff against origin/main on five axes: (1) is any fact stated in more than one place? (2) does naming match the terms introduced by the spec / sibling modules? (3) does every field, lock, comment, and cleanup branch earn its keep? (4) does new code follow existing conventions (test layout, fan-in primitives, error variants)? (5) does anything change an observable format or public API without saying so? Flag findings, don't auto-fix.

Practically I want you to do the review by being an orchestrator and task two subagents, both acting as staff SWE. Let one be run by codex gpt 5.4, and the other by Opus 4.7 xhigh.

Instead of just settling for their individual conclusions, facilitate them to come to a collective conclusion on a final review presented to me with clear actionable suggestions.

@consideRatio consideRatio marked this pull request as draft April 15, 2026 13:33
@consideRatio consideRatio force-pushed the pr/client-read-new-index branch 3 times, most recently from 4f04d45 to 4b14059 Compare April 16, 2026 07:05
@consideRatio consideRatio force-pushed the pr/client-read-new-index branch 12 times, most recently from ef6f5e0 to 3d43a8f Compare April 20, 2026 08:38
@consideRatio consideRatio force-pushed the pr/client-read-new-index branch 3 times, most recently from de29a9c to 6ee8561 Compare April 20, 2026 11:40
The client now reads indexes through the spec documented in
`docs/src/index-protocol.md` and `docs/src/index-api-protocol.md`.

Wire-format changes (reads):
- `<index_root>/.well-known/sysand-index.json` is fetched first to
  resolve `index_root` / `api_root`. 404 is not an error (both roots
  default to the discovery root); other non-2xx is a hard error.
- `<index_root>/index.json` narrows to `{"projects": [{"iri": …}]}`.
- `<…>/versions.json` replaces the previous `versions.txt` newline
  list. Each entry carries `version`, `usage`, `project_digest`,
  `kpar_size`, `kpar_digest`.
- Per-version layout becomes `<…>/<version>/.project.json`,
  `.meta.json`, `project.kpar` (replacing the old
  `<…>/<version>.kpar/…` layout).
- IRI → path: `pkg:sysand/<pub>/<name>` → `<pub>/<name>/`; all other
  IRIs → `_iri/<sha256(normalized_iri)>/`.
- Non-canonical `pkg:sysand/…` IRIs are now hard-rejected with
  `SysandPurlError::NotNormalized`; `InterchangeProjectUsageRaw::validate`
  runs the same rejection, so every `.project.json` consumer is
  affected (not just the index client).
- `project_digest` is verified before `.project.json` / `.meta.json`
  are exposed; `kpar_digest` is verified during streamed download and
  never renamed into place on mismatch.

Wire-format changes (publish):
- `sysand publish` now performs mandatory well-known discovery via the
  configured `HTTPAuthentication` policy and posts to
  `<api_root>/v1/upload` (leading slash dropped from the old
  `/api/v1/upload`). Auth-gated well-known URIs are supported — RFC
  8615 takes no position on authentication.

On-disk format changes:
- `.meta.json`'s `created` field now serializes via
  `to_rfc3339_opts(SecondsFormat::Nanos, true)`, giving nanosecond
  precision and a `Z` suffix. Docs/protocol update tracked as a
  follow-up.
- Lockfiles may now carry `remote_kpar_digest` on
  `Source::RemoteKpar`. This is needed to preserve the raw-archive
  integrity tripwire across `lock` → `sync`: once `lock` has read
  `versions.json`, `sync` should be able to verify the downloaded
  `project.kpar` bytes against the lockfile alone, without re-querying
  the index. Relying only on the canonical `project_digest` would miss
  repacks/tampering that leave the extracted project content unchanged.

Public Rust API:
- New modules: `env::discovery`, `env::index`, `project::index_entry`,
  `purl`. Removed: `env::reqwest_http::HTTPEnvironmentAsync`.
- `CombinedResolver` / `CombinedProjectStorage` rename `Registry` →
  `Index` across types and variants.
- `PublishError`: `InvalidIndexUrl` → split into
  `InvalidDiscoveryRoot` / `InvalidApiRoot`; `validate_endpoint_url_shape`
  takes a new `EndpointKind` argument.
- `commands::publish::do_publish` signature changed;
  `prepare_publish_payload`, `validate_endpoint_url_shape`,
  `PublishPreparation`, `EndpointKind` are newly `pub`.
- `ProjectRead`/`ProjectReadAsync` contract: wrappers must forward
  `checksum_canonical_hex{,_async}` and related methods explicitly.
  The `ProjectRead` derive macro auto-forwards the expanded set; the
  in-tree `AsAsyncProject` wrapper forwards all six affected methods,
  and the derive now qualifies `CanonicalizationError` explicitly so
  downstream crates do not need a caller-side import for the generated
  code to compile.
- `ReqwestKparDownloadedProject` gains `new`, `is_downloaded`,
  `ensure_downloaded_verified`, and `with_expected_sha256_hex`; new
  `DigestMismatch` error variant. The type now tracks
  verified-vs-unverified download state so that an earlier unverified
  call cannot silently short-circuit a later verified call, and
  lockfile-driven `sync` can enforce the recorded `kpar_digest`.
- `InterchangeProjectValidationError::MalformedSysandPurl` is new;
  `CanonicalizationError::map_project_read` is new.
- `core/Cargo.toml` adds `idna`, enables `tokio/sync`.

Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
@consideRatio consideRatio force-pushed the pr/client-read-new-index branch from 6ee8561 to de55a9e Compare April 20, 2026 12:08
@consideRatio consideRatio marked this pull request as ready for review April 20, 2026 12:17
@andrius-puksta-sensmetry
Copy link
Copy Markdown
Collaborator

Impressive...

@consideRatio

This comment was marked as resolved.

Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Copy link
Copy Markdown
Collaborator

@andrius-puksta-sensmetry andrius-puksta-sensmetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a start...

Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/lock.rs Outdated
Comment thread core/src/lock.rs Outdated
Comment thread core/src/lock.rs Outdated
Comment thread core/src/lock.rs Outdated
Comment thread core/src/purl.rs Outdated
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
Comment thread docs/src/index-protocol.md Outdated
@consideRatio consideRatio marked this pull request as draft April 23, 2026 14:04
consideRatio and others added 4 commits April 23, 2026 16:08
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell@sensmetry.com>
@consideRatio consideRatio force-pushed the pr/client-read-new-index branch from 5aca1a1 to c1470a7 Compare April 23, 2026 21:30
@andrius-puksta-sensmetry
Copy link
Copy Markdown
Collaborator

andrius-puksta-sensmetry commented Apr 24, 2026

Each entry carries version, usage, project_digest,
kpar_size, kpar_digest.

Why have kpar_size? The client will know its size if it downloads the actual kpar, and kpar_digest allows detecting changes.

EDIT by Erik: discussed during the meet, we will include it, but it is beyond what we need given the current functionality etc. RemoteKpar included it, and that is the key reason it was considered initially. For the new IndexKpar lockfile entry, we could remove it and stop providing it in the versions.json - we decided to let it stay around though.

EDIT by Erik: IndexKpar now used with size and digest, RemoteKpar is left untouched.

Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Copy link
Copy Markdown
Collaborator

@andrius-puksta-sensmetry andrius-puksta-sensmetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2

Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/lock.rs Outdated
Comment thread core/src/commands/publish.rs Outdated
Comment thread core/src/commands/publish.rs Outdated
.map_err(|source| PublishError::InvalidApiRoot {
url: api_root.as_str().into(),
reason: format!("failed to compose upload URL: {source}"),
})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
})
let mut with_slash = crate::env::discovery::with_trailing_slash(api_root.clone());
with_slash.path_segments_mut().unwrap().push(UPLOAD_ENDPOINT_PATH);
with_slash

unwrap() is used because error is not possible for http(s) URLs due to their structure, and api_root is required to be http(s).

Copy link
Copy Markdown
Collaborator Author

@consideRatio consideRatio Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with this:

pub fn build_upload_url(api_root: &Url) -> Result<Url, PublishError> {
    // The `v1/upload` suffix rejection is part of shape validation.
    validate_endpoint_url_shape(api_root, EndpointKind::ApiRoot)?;

    Ok(crate::env::discovery::with_trailing_slash(api_root.clone())
        .join(UPLOAD_ENDPOINT_PATH)
        .unwrap())
}

UPLOAD_ENDPOINT_PATH includes multiple segments, but use of push() assumes its one, so the v1/upload will be urlencoded and not added as two separate path segments.

Comment thread core/src/env/local_directory/utils.rs Outdated

for (i, path) in paths.enumerate() {
match move_fs_item(&path, tempdir.path().join(i.to_string())) {
match wrapfs::rename(&path, tempdir.path().join(i.to_string())) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change back (also all other occurences in this file).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted - they are doing the same thing though so we have duplicated logic in wrapfs::rename and move_fs_item as it is now.

I need time to think and self-review before further review - but also wanted to capture all suggestions etc here so they don't go outdated.

Comment thread core/src/purl.rs Outdated
Comment thread core/src/env/iri_normalize.rs Outdated
Comment thread core/src/env/index.rs
Comment on lines +20 to +24
//! - A 404 on any required document is a hard error. At the
//! resolver-facing `versions_async` boundary the 404 is converted to
//! an empty stream (so a misconfigured mirror does not block other
//! sources), but every other caller — including `get_project_async`
//! and `index.json` — propagates it.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error out also in versions_async is case versions.json is not found.

Copy link
Copy Markdown
Collaborator Author

@consideRatio consideRatio Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a messy situation, unresolved for now.

I think the crux is that 404 on versions.json makes sense if you know the index to have the project, but the current implementation doesn't use index.json when resolving - it just probes versions.json directly.

I think we should allow 404 on versions.json during resolving on an index, to be allowed to mean "nothing here" rather than error due to this. We support multiple indexes to be listed when reading right?

wdyt @andrius-puksta-sensmetry?

Comment thread core/src/env/index.rs Outdated
Comment thread core/src/env/index.rs Outdated
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Comment thread core/src/env/index.rs Outdated
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
@consideRatio
Copy link
Copy Markdown
Collaborator Author

Thank you @andrius-puksta-sensmetry and @Jonas-Puksta-Sensmetry for all your attention to this PR!!!

Comment thread core/src/purl.rs
/// have exactly two slash-separated segments (`<publisher>/<name>`) after
/// this prefix, both passing [`is_normalized_field`] for their respective
/// [`FieldKind`].
pub const PKG_SYSAND_PREFIX: &str = "pkg:sysand/";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to use such a constant everywhere instead of hard-coding pkg:sysand/

Comment thread core/src/purl.rs Outdated
Comment thread core/src/purl.rs Outdated
Comment thread core/src/purl_tests.rs

#[test]
fn publisher_field_validation() {
assert!(is_valid_publisher("Acme Labs"));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also start with a number

Co-authored-by: Jonas Pukšta <146448971+Jonas-Puksta-Sensmetry@users.noreply.github.com>
Signed-off-by: Erik Sundell <erik.i.sundell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change client's index reading behavior (breaking)

3 participants