breaking: reading based on new index structure #288
consideRatio wants to merge 16 commits into sensmetry:main
The client now reads indexes through the spec documented in
`docs/src/index-protocol.md` and `docs/src/index-api-protocol.md`.
Wire-format changes (reads):
- `<index_root>/.well-known/sysand-index.json` is fetched first to
resolve `index_root` / `api_root`. 404 is not an error (both roots
default to the discovery root); other non-2xx is a hard error.
- `<index_root>/index.json` narrows to `{"projects": [{"iri": …}]}`.
- `<…>/versions.json` replaces the previous `versions.txt` newline
list. Each entry carries `version`, `usage`, `project_digest`,
`kpar_size`, `kpar_digest`.
- Per-version layout becomes `<…>/<version>/.project.json`,
`.meta.json`, `project.kpar` (replacing the old
`<…>/<version>.kpar/…` layout).
- IRI → path: `pkg:sysand/<pub>/<name>` → `<pub>/<name>/`; all other
IRIs → `_iri/<sha256(normalized_iri)>/`.
- Non-canonical `pkg:sysand/…` IRIs are now hard-rejected with
`SysandPurlError::NotNormalized`; `InterchangeProjectUsageRaw::validate`
runs the same rejection, so every `.project.json` consumer is
affected (not just the index client).
- `project_digest` is verified before `.project.json` / `.meta.json`
are exposed; `kpar_digest` is verified during streamed download and
never renamed into place on mismatch.
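Two of the read rules above, the discovery status handling and the IRI → path mapping, can be sketched roughly as follows. This is a dependency-free illustration, not the PR's code: `Discovery`, `classify_discovery_status`, and `iri_to_index_path` are hypothetical names, the SHA-256 function is injected as a closure, the IRI is assumed to be normalized already, and the real per-field normalization checks are reduced to "exactly two non-empty segments".

```rust
/// How to proceed after requesting `.well-known/sysand-index.json`
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Discovery {
    /// 2xx: the document body decides `index_root` / `api_root`.
    UseDocument,
    /// 404: both roots default to the discovery root.
    DefaultToDiscoveryRoot,
}

/// Classify the HTTP status of the well-known request:
/// 404 is not an error, any other non-2xx status is a hard error.
pub fn classify_discovery_status(status: u16) -> Result<Discovery, String> {
    match status {
        200..=299 => Ok(Discovery::UseDocument),
        404 => Ok(Discovery::DefaultToDiscoveryRoot),
        other => Err(format!("well-known discovery failed with HTTP {other}")),
    }
}

/// Map an already-normalized IRI to its directory below the index root.
/// `sha256_hex` is injected so the sketch needs no hashing crate.
pub fn iri_to_index_path(
    iri: &str,
    sha256_hex: impl Fn(&str) -> String,
) -> Result<String, String> {
    const PKG_SYSAND_PREFIX: &str = "pkg:sysand/";
    match iri.strip_prefix(PKG_SYSAND_PREFIX) {
        Some(rest) => {
            // Assumption: only the segment count is checked here; the real
            // client also validates each field's normalization.
            let segments: Vec<&str> = rest.split('/').collect();
            match segments.as_slice() {
                [publisher, name] if !publisher.is_empty() && !name.is_empty() => {
                    Ok(format!("{publisher}/{name}/"))
                }
                _ => Err(format!("not a normalized pkg:sysand IRI: {iri}")),
            }
        }
        // Every other IRI is addressed by the hash of its normalized form.
        None => Ok(format!("_iri/{}/", sha256_hex(iri))),
    }
}
```

With any hasher plugged in, `pkg:sysand/acme/widgets` maps to `acme/widgets/`, while a non-`pkg:sysand` IRI lands under `_iri/`.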
Wire-format changes (publish):
- `sysand publish` now performs mandatory well-known discovery via the
configured `HTTPAuthentication` policy and posts to
`<api_root>/v1/upload` (leading slash dropped from the old
`/api/v1/upload`). Auth-gated well-known URIs are supported — RFC
8615 takes no position on authentication.
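As a rough illustration of how the upload URL follows from `api_root`, here is a string-based sketch. It assumes plain `&str` handling in place of the `Url` type the crate actually uses, and its error type and checks are simplified stand-ins for `PublishError` and `validate_endpoint_url_shape`.

```rust
/// Relative path of the upload endpoint below `api_root` (no leading slash).
const UPLOAD_ENDPOINT_PATH: &str = "v1/upload";

/// Compose `<api_root>/v1/upload` from an http(s) `api_root`.
/// Hypothetical, string-based stand-in for the URL-typed function in the PR.
pub fn build_upload_url(api_root: &str) -> Result<String, String> {
    let is_http = api_root.starts_with("http://") || api_root.starts_with("https://");
    if !is_http {
        return Err(format!("api_root must be http(s): {api_root}"));
    }
    // Normalize so that exactly one slash separates root and endpoint path.
    let trimmed = api_root.trim_end_matches('/');
    // Simplified shape check: an api_root that already ends in the endpoint
    // path is rejected instead of producing `.../v1/upload/v1/upload`.
    if trimmed.ends_with("/v1/upload") {
        return Err(format!("api_root must not include the upload path: {api_root}"));
    }
    Ok(format!("{}/{}", trimmed, UPLOAD_ENDPOINT_PATH))
}
```

Under these assumptions, both `https://example.org/api` and `https://example.org/api/` yield `https://example.org/api/v1/upload`.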
On-disk format changes:
- `.meta.json`'s `created` field now serializes via
`to_rfc3339_opts(SecondsFormat::Nanos, true)`, giving nanosecond
precision and a `Z` suffix. Docs/protocol update tracked as a
follow-up.
- Lockfiles may now carry `remote_kpar_digest` on
`Source::RemoteKpar`. This is needed to preserve the raw-archive
integrity tripwire across `lock` → `sync`: once `lock` has read
`versions.json`, `sync` should be able to verify the downloaded
`project.kpar` bytes against the lockfile alone, without re-querying
the index. Relying only on the canonical `project_digest` would miss
repacks/tampering that leave the extracted project content unchanged.
Public Rust API:
- New modules: `env::discovery`, `env::index`, `project::index_entry`,
`purl`. Removed: `env::reqwest_http::HTTPEnvironmentAsync`.
- `CombinedResolver` / `CombinedProjectStorage` rename `Registry` →
`Index` across types and variants.
- `PublishError`: `InvalidIndexUrl` → split into
`InvalidDiscoveryRoot` / `InvalidApiRoot`; `validate_endpoint_url_shape`
takes a new `EndpointKind` argument.
- `commands::publish::do_publish` signature changed;
`prepare_publish_payload`, `validate_endpoint_url_shape`,
`PublishPreparation`, `EndpointKind` are newly `pub`.
- `ProjectRead`/`ProjectReadAsync` contract: wrappers must forward
`checksum_canonical_hex{,_async}` and related methods explicitly.
The `ProjectRead` derive macro auto-forwards the expanded set; the
in-tree `AsAsyncProject` wrapper forwards all six affected methods,
and the derive now qualifies `CanonicalizationError` explicitly so
downstream crates do not need a caller-side import for the generated
code to compile.
- `ReqwestKparDownloadedProject` gains `new`, `is_downloaded`,
`ensure_downloaded_verified`, and `with_expected_sha256_hex`; new
`DigestMismatch` error variant. The type now tracks
verified-vs-unverified download state so that an earlier unverified
call cannot silently short-circuit a later verified call, and
lockfile-driven `sync` can enforce the recorded `kpar_digest`.
- `InterchangeProjectValidationError::MalformedSysandPurl` is new;
`CanonicalizationError::map_project_read` is new.
- `core/Cargo.toml` adds `idna`, enables `tokio/sync`.
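The verified-vs-unverified tracking described for `ReqwestKparDownloadedProject` could look something like the sketch below. It is a simplified model under stated assumptions, not the type's real implementation: `KparDownload`, `DownloadState`, and `VerifyError` are hypothetical, the download and the SHA-256 function are injected as closures, and discarding the archive on mismatch is one plausible policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DownloadState {
    NotDownloaded,
    Unverified,
    Verified,
}

#[derive(Debug, PartialEq)]
pub enum VerifyError {
    MissingExpectedDigest,
    DigestMismatch { expected: String, actual: String },
}

/// Hypothetical model of a downloaded `project.kpar` and its trust state.
pub struct KparDownload {
    state: DownloadState,
    expected_sha256_hex: Option<String>,
}

impl KparDownload {
    pub fn new(expected_sha256_hex: Option<&str>) -> Self {
        Self {
            state: DownloadState::NotDownloaded,
            expected_sha256_hex: expected_sha256_hex.map(str::to_owned),
        }
    }

    pub fn state(&self) -> DownloadState {
        self.state
    }

    pub fn is_downloaded(&self) -> bool {
        self.state != DownloadState::NotDownloaded
    }

    /// Unverified path: fetches only if nothing has been downloaded yet.
    pub fn ensure_downloaded(&mut self, fetch: impl FnOnce() -> Vec<u8>) {
        if self.state == DownloadState::NotDownloaded {
            let _bytes = fetch();
            self.state = DownloadState::Unverified;
        }
    }

    /// Verified path: an earlier unverified download does NOT short-circuit
    /// this call; only the `Verified` state does.
    pub fn ensure_downloaded_verified(
        &mut self,
        fetch: impl FnOnce() -> Vec<u8>,
        sha256_hex: impl Fn(&[u8]) -> String,
    ) -> Result<(), VerifyError> {
        if self.state == DownloadState::Verified {
            return Ok(());
        }
        let expected = self
            .expected_sha256_hex
            .clone()
            .ok_or(VerifyError::MissingExpectedDigest)?;
        let bytes = fetch();
        let actual = sha256_hex(bytes.as_slice());
        if !actual.eq_ignore_ascii_case(&expected) {
            // Assumed policy: treat the archive as absent on mismatch.
            self.state = DownloadState::NotDownloaded;
            return Err(VerifyError::DigestMismatch { expected, actual });
        }
        self.state = DownloadState::Verified;
        Ok(())
    }
}
```

The point of the three-state enum is that "bytes are present" and "bytes match the recorded digest" are separate facts, so a lockfile-driven `sync` can insist on the second.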
Signed-off-by: Erik Sundell <erik.sundell+2025@sensmetry.com>
Impressive...
andrius-puksta-sensmetry
left a comment
Just a start...
Signed-off-by: Erik Sundell <erik.sundell@sensmetry.com>
Why have
EDIT by Erik: discussed during the meeting. We will include it, although it is beyond what we need given the current functionality. RemoteKpar included it, which is the key reason it was considered initially. For the new IndexKpar lockfile entry, we could remove it and stop providing it in versions.json, but we decided to let it stay.
EDIT by Erik: IndexKpar is now used with size and digest; RemoteKpar is left untouched.
andrius-puksta-sensmetry
left a comment
Part 2
    .map_err(|source| PublishError::InvalidApiRoot {
        url: api_root.as_str().into(),
        reason: format!("failed to compose upload URL: {source}"),
    })
Suggested change:
    })
    let mut with_slash = crate::env::discovery::with_trailing_slash(api_root.clone());
    with_slash.path_segments_mut().unwrap().push(UPLOAD_ENDPOINT_PATH);
    with_slash
unwrap() is used because an error is not possible for http(s) URLs due to their structure, and api_root is required to be http(s).
I'll go with this:
    pub fn build_upload_url(api_root: &Url) -> Result<Url, PublishError> {
        // The `v1/upload` suffix rejection is part of shape validation.
        validate_endpoint_url_shape(api_root, EndpointKind::ApiRoot)?;
        Ok(crate::env::discovery::with_trailing_slash(api_root.clone())
            .join(UPLOAD_ENDPOINT_PATH)
            .unwrap())
    }
UPLOAD_ENDPOINT_PATH includes multiple segments, but push() assumes it is a single segment, so v1/upload would be percent-encoded instead of being added as two separate path segments.
      for (i, path) in paths.enumerate() {
    -     match move_fs_item(&path, tempdir.path().join(i.to_string())) {
    +     match wrapfs::rename(&path, tempdir.path().join(i.to_string())) {
Change back (also all other occurrences in this file).
Reverted. They do the same thing, though, so as it stands we have duplicated logic in wrapfs::rename and move_fs_item.
I need time to think and self-review before further review, but I also wanted to capture all suggestions here so they don't go outdated.
    //! - A 404 on any required document is a hard error. At the
    //!   resolver-facing `versions_async` boundary the 404 is converted to
    //!   an empty stream (so a misconfigured mirror does not block other
    //!   sources), but every other caller — including `get_project_async`
    //!   and `index.json` — propagates it.
Also error out in versions_async in case versions.json is not found.
This was a messy situation, unresolved for now.
I think the crux is that a 404 on versions.json is a meaningful error only if you know the index has the project, but the current implementation doesn't use index.json when resolving; it just probes versions.json directly.
Because of this, I think a 404 on versions.json while resolving against an index should be allowed to mean "nothing here" rather than be an error. We support listing multiple indexes when reading, right?
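The "404 means nothing here" behaviour proposed in this thread can be sketched as follows. This only illustrates the proposal, which the thread leaves unresolved, and `VersionsFetch` and `resolve_versions` are hypothetical names: each index reports either a version list, a 404, or a hard failure, and only the last one aborts resolution.

```rust
/// Result of probing `versions.json` on one index (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum VersionsFetch {
    /// 2xx with a parsed list of version strings.
    Found(Vec<String>),
    /// 404: this index does not carry the project.
    NotFound,
    /// Any other failure (non-2xx status, bad JSON, network error).
    Failed(String),
}

/// Collect `(index, version)` candidates across several indexes.
/// A 404 contributes nothing; any other failure is a hard error.
pub fn resolve_versions(
    outcomes: Vec<(String, VersionsFetch)>,
) -> Result<Vec<(String, String)>, String> {
    let mut found = Vec::new();
    for (index, outcome) in outcomes {
        match outcome {
            VersionsFetch::Found(versions) => {
                for version in versions {
                    found.push((index.clone(), version));
                }
            }
            // "Nothing here": skip this index and try the next one.
            VersionsFetch::NotFound => {}
            VersionsFetch::Failed(reason) => {
                return Err(format!("{index}: {reason}"));
            }
        }
    }
    Ok(found)
}
```

In this model, a mirror that lacks the project does not block the other sources, while a mirror that answers with, say, HTTP 500 still surfaces as an error.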
Thank you @andrius-puksta-sensmetry and @Jonas-Puksta-Sensmetry for all your attention to this PR!!!
    /// have exactly two slash-separated segments (`<publisher>/<name>`) after
    /// this prefix, both passing [`is_normalized_field`] for their respective
    /// [`FieldKind`].
    pub const PKG_SYSAND_PREFIX: &str = "pkg:sysand/";
I would like to use such a constant everywhere instead of hard-coding `pkg:sysand/`.
    #[test]
    fn publisher_field_validation() {
        assert!(is_valid_publisher("Acme Labs"));
It can also start with a number.
Co-authored-by: Jonas Pukšta <146448971+Jonas-Puksta-Sensmetry@users.noreply.github.com>
Signed-off-by: Erik Sundell <erik.i.sundell@gmail.com>
A few prompts I've used to self-review, besides scanning it all myself