Skip to content

List objects and common prefixes#45

Merged
carsonfarmer merged 7 commits intodevelopfrom
carson/prefixes
Mar 31, 2024
Merged

List objects and common prefixes#45
carsonfarmer merged 7 commits intodevelopfrom
carson/prefixes

Conversation

@carsonfarmer
Copy link
Member

@carsonfarmer carsonfarmer commented Mar 25, 2024

Adds support for S3 style ObjectList API. The list method on the actor now takes multiple arguments, including prefix, delimiter, and limit. It Requires that these are BytesKeys, which are a type to be used to serialize as byte string instead of a u8 array. This allows us to assume byte string semantics for string comparisons etc.

Once merged, this provides the following additions to the /os proxy API. Listing is mostly as before, though the actual objects are now nested within the objects key (which is an array):

curl -s http://localhost:8001/v1/os | jq ".objects[].key"
"one|two.pdf"
"one"
"blah|bitcoin.pdf"
"one.pdf"

When delimiter is used, any common prefixes are aggregated into the common_prefixes key and included in the response:

curl -s http://localhost:8001/v1/os\?delimiter\=\| | jq             
{
  "common_prefixes": [
    "blah|",
    "one|"
  ],
  "objects": [
    {
      "key": "one",
      "resolved": false,
      "value": "bafk2bzacecslkk4nsiwdf5sx3i3xi6b6yj73uvli3p2u3bmczisougobh5bqs"
    },
    {
      "key": "one.pdf",
      "resolved": false,
      "value": "bafk2bzaceammiknpp4joitwsvtcc3u3ni47bvvbbaqgjed5xieraknhdxb6pc"
    }
  ]
}

One downside of the proxy API that we are using for the object store (at the moment), is that it isn't possible to push files with forward slashes as the delimiter... hense the | in the examples above.

Fixes #13

@carsonfarmer carsonfarmer added the enhancement New feature or request label Mar 25, 2024
@carsonfarmer carsonfarmer added this to the Usable SDK/CLI milestone Mar 25, 2024
@carsonfarmer carsonfarmer self-assigned this Mar 25, 2024
@carsonfarmer carsonfarmer marked this pull request as ready for review March 26, 2024 19:18
@sanderpick
Copy link
Contributor

the failed test is due to some lint issues... you can do make lint locally before pushing

Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
@carsonfarmer carsonfarmer changed the base branch from develop to main March 30, 2024 05:20
@carsonfarmer carsonfarmer changed the base branch from main to develop March 30, 2024 05:20
@@ -0,0 +1,384 @@
// Copyright 2024 Textile
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems silly, but I actually pulled this temporarily into a separate project and testing things out there. Then I found it was quite nice to have the state all self-contained like this, so I kept it.

}

#[derive(Serialize, Deserialize)]
struct ListQuery {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is optional on the web.

hq: HeightQuery,
) -> Result<impl Reply, Rejection> {
let res = os_list(client, args, hq.height.unwrap_or(0))
let opts = ListOptions {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to use the same ListOptions struct all the way up the way down to the actor state. It just keeps things simple. However, it would have been much nicer to use something more like the ListQuery structure, which contains Options. I'm not clear on whether we can use these with IPLD encoding/decoding, so I opted not to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ipld serializers do indeed handle Option, but this looks fine!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, very nice. I should have just checked. I might make the switch to option at some point in the future then.

}
}

impl State {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the existing methods are unchanged.

}

#[derive(Default, Debug, Serialize_tuple, Deserialize_tuple)]
pub struct ObjectList {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new struct, which is quite similar to the AWS ListObjects response object.

Ok(value)
}

pub fn list<BS: Blockstore>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the core change on the state.

let hamt = Hamt::<_, Object>::load_with_bit_width(&self.root, store, BIT_WIDTH)?;
let mut objects = Vec::new();
let mut common_prefixes = std::collections::BTreeSet::<Vec<u8>>::new();
let limit = if options.limit == 0 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the limit is zero, we make it the max, but anything else we "crop" to the max without explanation.

(options.limit as usize).min(MAX_LIST_LIMIT)
};
let offset = options.offset;
let mut count = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda lame, but since our hamt doesn't provide a way to directly offset into the key space, we need to count manually.

for pair in &hamt {
let (k, v) = pair?;
let key = k.0.clone();
if !options.prefix.is_empty() && !key.starts_with(&options.prefix) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip this one if we have a prefix and it doesn't match. note that here, we're doing direct byte comparisons (faster), whereas below, when doing the delimiter matching, we need to convert to utf8 strings...

Signed-off-by: Carson Farmer <carson.farmer@gmail.com>
Copy link
Member Author

@carsonfarmer carsonfarmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for nesting prefixes/delimiters.

let utf8_key = String::from_utf8(key.clone())?;
let utf8_delimiter = String::from_utf8(options.delimiter.clone())?;
if let Some(index) = utf8_key[prefix_length..].find(&utf8_delimiter) {
let subset = utf8_key[..=(index + prefix_length)].as_bytes().to_owned();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When listing common prefixes, we include the delimiter itself, and the full path up to the delimiter.

let prefix_length = utf8_prefix.len();
let utf8_key = String::from_utf8(key.clone())?;
let utf8_delimiter = String::from_utf8(options.delimiter.clone())?;
if let Some(index) = utf8_key[prefix_length..].find(&utf8_delimiter) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only search for the delimiter after the prefix. This allows us to essentially do nested prefixes and a folder-like hierarchy. But, we always include the full path up to the first delimiter after the prefix. I'm sure there is a cleaner way to do this.

@carsonfarmer
Copy link
Member Author

the failed test is due to some lint issues... you can do make lint locally before pushing

Actually, I ended up just changing my "check" command in VSCode from "check" to "clippy". So this is all automatically checked now. But a nice side effect of this is that clippy appears to be way faster than the default "check"?

Copy link
Contributor

@sanderpick sanderpick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

hq: HeightQuery,
) -> Result<impl Reply, Rejection> {
let res = os_list(client, args, hq.height.unwrap_or(0))
let opts = ListOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ipld serializers do indeed handle Option, but this looks fine!

@sanderpick
Copy link
Contributor

sanderpick commented Mar 30, 2024

in the PR description, in the delimiter example, if I understand common_prefixes properly, the actual keys present are:

  • blah|one|one
  • blah|one|one.pdf

is that right?

@carsonfarmer
Copy link
Member Author

carsonfarmer commented Mar 30, 2024

Nope, the actual keys are those listed in the prior example. It lists all keys that match a given prefix up to and including the delimiter. Note in the second example with delimiter there is no prefix, so it lists anything up to the first delimiter as a common prefix.

Think of it like listing all files in the folder at the given prefix on down, but if a delimiter is supplied, then stop there, essentially listing sub folders but no lower.

@sanderpick
Copy link
Contributor

ah! slick, makes sense 🙏

@sanderpick
Copy link
Contributor

mind if I merge this?

@carsonfarmer carsonfarmer merged commit fc1e926 into develop Mar 31, 2024
@carsonfarmer carsonfarmer deleted the carson/prefixes branch March 31, 2024 02:43
sanderpick pushed a commit that referenced this pull request May 30, 2025
depends on [entangler
#45](recallnet/entanglement#45), after merging
I'll update version deps to match the new version on engangler/main.

targets #625, this updates iroh to version 0.35, which will match the
version of the deployed relays.

This PR also incorporates the streaming updates & config constant change
that drops the `ENTANGLER_P` param, which should be reviewed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add listing keys by prefix, delimiter, etc to objectstore actor

2 participants