-
Notifications
You must be signed in to change notification settings - Fork 3k
[SPEC] Add finer grained read restrictions as part of loadTable #13879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
bf843df
57dcda2
5293713
26c83fe
30ff72f
47310cb
981c218
18083d2
e25ab5e
c3fa99b
f8d82f9
f9f6a5b
650b856
cd769a0
008693b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3347,6 +3347,105 @@ components: | |||||||||||||||||||||||
| additionalProperties: | ||||||||||||||||||||||||
| type: string | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ReadRestrictions: | ||||||||||||||||||||||||
| type: object | ||||||||||||||||||||||||
| description: > | ||||||||||||||||||||||||
| Read restrictions for a table, including column projections and row filter expressions. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| A client MUST enforce the restrictions defined in this object when reading data | ||||||||||||||||||||||||
| from the table. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| These restrictions apply only to the authenticated principal, user, or account | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It could be more than that (for example, could be based on the environment as well). Should we just summarize it as the authorization context? I guess it also implies that caching may be impacted as well:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
we haven't defined Authorization yet in the the IRC, as this is entirely managed by catalog (for example grants / policies etc), i do agree these are like authorization predicates but wouldn't saying this depends of the authenticated prinicipal suff, do you have any specific case in mind ?
My understanding was Etag should be same as that of the what we do in the case of storage cred's ? if the callers has a different authenticated prinicipal, the catalog should send the creds accordingly ? let me see what we say from the ETag POV.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I meant authorization in a broader http sense, not necessarily in terms of catalog primitives (like grants/policies). What the authorization itself represents is open for interpretation. User/principals are the obvious ones, but the environment (for example the engine trustworthiness, or if the client is from within vs outside) could be part of it. That said, it should be opaque for the client. Re ETag, I was not able to find any strong documentation regarding its support in Iceberg. But ETag is a HTTP concept (not an iceberg one) and the semantic is about the whole response, not a part of it. If they were some kind of intermediaries which would basically handle etag and preconditions, this may cause those intermediaries to return the same response to the wrong audience, causing security issues
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My understanding is trusted engine more about authenticating both user and the engine (that catalog trusts), but still authorizing on the user grants, do we wanna be explicit about trust or being implicit is fine ?
I understand, I meant iceberg ETAG handling, my understanding is we should validate noting changed post doing authorization checks ? authorization check defines what kind of creds one gets and so will be the same for these read restriction, infact we do similar handling in Polaris for this
I am not sure if we can do ETAG checks in-general on a protected resource without authorization checks in place ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, that's basically the core of it :)
I'm not sure the Polaris implementation is 100% correct. The issue is not in the authorization part, but the fact that the etag would only include the table metadata representation but none of the other data returned in the response, so a client may have cached the temporary credentials (or possibly also the read restrictions) and because it got a Those extra information could also be included in the etag computation (it may become to be too expensive though?), and assuming also that the client keeps a cache per authorization header (+ whatever is returned by the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My authorization requirement before returning 304 was in the context above, I do agree that post AuthZ we need to validate against the whole response instead of partial response. But what i wanted to suggest is we should be doing the same for read-restrictions what we are doing for credentials, I think we both agree with this right ? My understanding is if its a multitenant system isn't it the responsibility of the system to be multi-tenant aware while caching stuff ? Are we saying the client by default assumes that every cacheable and is now relying on the server to send an apt That being said if we say ETAG is an HTTP concept and a multitenant system should make user specific cache, do we need to elaborate more ? |
||||||||||||||||||||||||
| associated with the request. They MUST NOT be interpreted as global policy and | ||||||||||||||||||||||||
| MUST NOT be applied beyond the entity identified by the Authentication header | ||||||||||||||||||||||||
| (or other applicable authentication mechanism). | ||||||||||||||||||||||||
| properties: | ||||||||||||||||||||||||
| required-column-projections: | ||||||||||||||||||||||||
| description: > | ||||||||||||||||||||||||
| A list of projections that MUST be applied prior to any query-specified | ||||||||||||||||||||||||
singhpk234 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
| projections. | ||||||||||||||||||||||||
| If this property is absent, no mandatory projection applies, | ||||||||||||||||||||||||
| and a reader MAY project any subset of columns of the table, including all columns. | ||||||||||||||||||||||||
singhpk234 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 1. A reader MUST project only columns listed in the required-column-projections. | ||||||||||||||||||||||||
| - If a listed column has a transform, the reader MUST apply it and replace | ||||||||||||||||||||||||
| all references to the underlying column with the transformed value | ||||||||||||||||||||||||
| (for example, truncate[4](cc) MUST be projected as truncate[4](cc) AS cc, | ||||||||||||||||||||||||
| and all references to cc during query evaluation post applying required-row-filter MUST resolve to this alias). | ||||||||||||||||||||||||
| - Columns not listed in the required-column-projections MUST NOT be read. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 2. A column MUST appear at most once in the required-column-projections. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 3. If a projected column's corresponding entry includes an action that the reader cannot evaluate, | ||||||||||||||||||||||||
| the reader MUST fail rather than ignore the transform. | ||||||||||||||||||||||||
singhpk234 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 4. An identity transform is equivalent to projecting the column directly. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 5. The data type of the projected column MUST match the data type defined for the transform result. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| type: array | ||||||||||||||||||||||||
| items: | ||||||||||||||||||||||||
| $ref: '#/components/schemas/Projection' | ||||||||||||||||||||||||
| required-row-filter: | ||||||||||||||||||||||||
| description: > | ||||||||||||||||||||||||
| An expression that filters rows in the table that the authenticated principal does not have access to. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 1. A reader MUST discard any row for which the filter evaluates to false or null, and | ||||||||||||||||||||||||
| no information derived from discarded rows MAY be included in the query result. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 2. Row filters MUST be evaluated against the original, untransformed column values. | ||||||||||||||||||||||||
| Required projections MUST be applied only after row filters are applied. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 3. If a client cannot interpret or evaluate a provided filter expression, it MUST fail. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| 4. If this property is absent, null, or always true then no mandatory filtering is required. | ||||||||||||||||||||||||
| $ref: '#/components/schemas/Expression' | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Projection: | ||||||||||||||||||||||||
| type: object | ||||||||||||||||||||||||
| description: > | ||||||||||||||||||||||||
| Defines a projection for a column. | ||||||||||||||||||||||||
| If action is not specified, the column is projected as-is. | ||||||||||||||||||||||||
| properties: | ||||||||||||||||||||||||
| field-id: | ||||||||||||||||||||||||
| type: integer | ||||||||||||||||||||||||
| description: field id of the column being projected. | ||||||||||||||||||||||||
| action: | ||||||||||||||||||||||||
| $ref: '#/components/schemas/Action' | ||||||||||||||||||||||||
| required: | ||||||||||||||||||||||||
| - field-id | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Action: | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm confused by the introduction of special references and the removal of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the action like sha256 / null are very well defined like one needs to do project null for this column ... one option would have been we create transforms for this and then wrap a scalar expression referecing the transform which return this value, it seemed like an overkill for this function rather what we did now is one has an action if its a known mask just execute it, same is for transform, I wonder if for UDFs we just need apply_udf action ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Action is too generic in this context. maybe name it like also action could be optional, right? If only projection is needed (without any masking)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Its a bit intentional to name it Action, idea to use them later in expression too, presently Action would suggest on needs to do this, ApplyTransform can be action to apply existing / predefined transforms in iceberg.
Agree, we can project it as it, no need to wrap it around identity transform, i removed this and added a note on what does a projection without action means. |
||||||||||||||||||||||||
| description: Defines the specific action to be executed for computing the projection. | ||||||||||||||||||||||||
| oneOf: | ||||||||||||||||||||||||
singhpk234 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
| - $ref: '#/components/schemas/MaskHashSha256' | ||||||||||||||||||||||||
| - $ref: '#/components/schemas/ReplaceWithNull' | ||||||||||||||||||||||||
| - $ref: '#/components/schemas/MaskAlphanumeric' | ||||||||||||||||||||||||
| - $ref: '#/components/schemas/ApplyTransform' | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| MaskHashSha256: | ||||||||||||||||||||||||
| description: | | ||||||||||||||||||||||||
| Mask the data of the column by applying SHA-256. | ||||||||||||||||||||||||
| The input must be UTF-8 encoded bytes of the column value. | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. wondering if we need to say
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
it would be applicable i think for the binary type but not for the numeric type like int | long since the requirement is the input type should be the same as output type. |
||||||||||||||||||||||||
| The SHA-256 digest is represented as a lowercase hexadecimal string. | ||||||||||||||||||||||||
| Engines must follow this procedure to ensure consistency: | ||||||||||||||||||||||||
| 1. Convert the column value to a UTF-8 byte array. | ||||||||||||||||||||||||
singhpk234 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
| 2. Apply the SHA-256 algorithm as specified in NIST FIPS 180-4. | ||||||||||||||||||||||||
| 3. Convert the resulting 32-byte digest to a 64-character lowercase hexadecimal string. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ReplaceWithNull: | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call out, i am not sure why we don't specify the void transform in rest spec, let me dig some historical context if its just a miss, will incorporate this accordingly iceberg/open-api/rest-catalog-open-api.yaml Lines 2383 to 2393 in a3c538f
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added : #14778 |
||||||||||||||||||||||||
| description: Masks data by replacing it with a NULL value. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| MaskAlphanumeric: | ||||||||||||||||||||||||
| description: mask all alphabetic characters with 'x' and numeric characters with 'n' | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ApplyTransform: | ||||||||||||||||||||||||
| type: object | ||||||||||||||||||||||||
| description: Replace the field with the result of a transform expression. Produce the original field name with the transformed values. | ||||||||||||||||||||||||
| properties: | ||||||||||||||||||||||||
| term: | ||||||||||||||||||||||||
| $ref: '#/components/schemas/Term' | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| LoadCredentialsResponse: | ||||||||||||||||||||||||
| type: object | ||||||||||||||||||||||||
| required: | ||||||||||||||||||||||||
|
|
@@ -3407,6 +3506,8 @@ components: | |||||||||||||||||||||||
| type: array | ||||||||||||||||||||||||
| items: | ||||||||||||||||||||||||
| $ref: '#/components/schemas/StorageCredential' | ||||||||||||||||||||||||
| read-restrictions: | ||||||||||||||||||||||||
| $ref: '#/components/schemas/ReadRestrictions' | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ScanTasks: | ||||||||||||||||||||||||
| type: object | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we state that this is an expectation for "trusted" clients?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is we don't define what
trustmeans like untampared / sandboxed compute etc hence we might need to define that if we were to reference it here, though thatMUSTwould imply the expectation from the client end, please let me know your thoughts considering above, if we were to define trust there are additonal things i think it would be helpful define like predicate reorder attacks :if
1/iff(color = 'Purple', 0, 1) = 1to be executed before filtering the red widgets this would make the user aware that there exists colour = 'Purple'