Skip to content

[libstd]: reworked CacheHash means its WASI compatible now#5784

Closed
kubkon wants to merge 14 commits intoziglang:masterfrom
kubkon:cache-hash
Closed

[libstd]: reworked CacheHash means its WASI compatible now#5784
kubkon wants to merge 14 commits intoziglang:masterfrom
kubkon:cache-hash

Conversation

@kubkon
Copy link
Member

@kubkon kubkon commented Jul 3, 2020

This PR builds upon #5701.

This PR reworks CacheHash to make it WASI compatible. The summary of changes:

  • On non-capability-oriented hosts (i.e., excluding WASI), use Dir.realpathAlloc to resolve cached path. Previously, we relied on std.fs.path.resolve to do the path resolution which could yield erroneous results since std.fs.path.resolve would not actually issue any syscalls, and hence, would not correctly resolve any symlinks. Using Dir.realpathAlloc which now supports WASI (more on this below), means CacheHash now supports WASI.

  • Add Dir.realpathWasi, and os.realpathatWasi. The latter resolves the path however upto only the Dir it is relative
    to ensuring the sandboxing rules are observed. In fact, it proceeds in two steps:

    1. Try opening the path as-is, and check if that works OK. Then, this will imply the path was valid under WASI's capability-oriented security model, and no path traversal attack was attempted.
    2. Since we've verified we're OK wrt sandboxing rules, we can analyze the path component-by-component working out the canonicalized path in the process.

This PR closes #5437.

EDIT: This PR also fixes os.realpathatW to correctly handle relative pathname with relative components such as . or ...

@kubkon
Copy link
Member Author

kubkon commented Jul 7, 2020

Pinging @andrewrk since you're the issue author. It'd be awesome if you could have a look at #5701 and this one, and lemme know what you think about this approach.

kubkon and others added 14 commits July 8, 2020 17:06
`std.os.realpathat` is similar to `std.os.realpath`, however, it accepts
a pair `fd_t, []const u8` of args and thus works out the realpath
of a relative path wrt to some opened file descriptor.

If the input pathname argument turns out to be an absolute path, this
function reverts to calling `realpath` on that pathname completely
ignoring the input file descriptor. This behaviour is standard in
POSIX and IMHO a good rule of thumb to follow.

If the input file descriptor was obtained using `std.fs.cwd()` call,
this function reverts to `std.os.getcwd()` to obtain the file
descriptor's path.

`std.fs.Dir.realpath` integrates `std.os.realpathat` with `std.fs.Dir`
but with dynamic memory allocator.
This commit adds null-terminated and WTF16 versions of
`std.fs.Dir.realpath` and `std.os.realpathat`. Alloc version has
been renamed to `std.fs.Dir.realpathAlloc` to be compatible with
the naming convention used across Zig's `libstd`.
Co-authored-by: Joachim Schmidt <joachim.schmidt557@outlook.com>
On non-capability-oriented hosts (i.e., excluding WASI), use
`Dir.realpathAlloc` to resolve cached path. Previously, we relied
on `std.fs.path.resolve` to do the path resolution which could yield
erroneous results since `std.fs.path.resolve` would not actually
issue any syscalls, and hence, would not correctly resolve any symlinks.
This commit adds `Dir.realpathWasi`, and `os.realpathatWasi`. The
latter resolves the path however upto only the `Dir` it is relative
to ensuring the sandboxing rules are observed. In fact, it proceeds
in two steps:

1. Try opening the path as-is, and check if that works OK. Then,
 this will imply the path was valid under WASI's capability-oriented
 security model, and no path traversal attack was attempted.
2. Since we've verified we're OK wrt sandboxing rules, we can
 analyze the path component-by-component working out the canonicalized path
 in the process.
This commit fixes `std.os.realpathatW` to properly canonicalize
`pathname` if contains '.' or '..'.
@kubkon
Copy link
Member Author

kubkon commented Jul 11, 2020

After discussions with @andrewrk and further thought, due to the fact that querying the OS for a path to a resource is generally not a universal feature (for instance, it turns out FreeBSD for one doesn't support any standard mechanisms for getting those, at least to my knowledge: source), this approach to CacheHash is very limiting and not recommended (in fact, now I see why the relative paths are "canonicalized" by hand, possibly erroneously since we don't handle symlinks, hardlinks, etc., in CacheHash using std.fs.path.resolve). Therefore, I'm closing this PR as-is, in the hope of reworking it by preserving the use of std.fs.path.resolve or similar.

@kubkon kubkon closed this Jul 11, 2020
@kubkon kubkon deleted the cache-hash branch December 9, 2020 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

some WASI tests are disabled due to using cwd

1 participant