-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[SPEC] Add finer grained read restrictions as part of loadTable #13879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SPEC] Add finer grained read restrictions as part of loadTable #13879
Conversation
1234341 to
6422478
Compare
9c1b305 to
6c5b955
Compare
|
Re: Schema evolution cases |
Update - actively trying a POC for this in my fork (PR : singhpk234#270 if folks are interested to give some early feedback) |
open-api/rest-catalog-open-api.yaml
Outdated
|
|
||
| MaskHashSha256: | ||
| type: object | ||
| description: Mask the data of the column by apply SHA256 hash algorithm. Engines are free to use their own implementation of SHA256. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this should be any cryptographic hash algorithm, or this should specify how to calculate values so they are consistent across engines. I don't really care which one, but if we are specifying details then we should specify all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rephrased it with the spec definition of sha256 please let me know if this reads better ?
| 2. Apply the SHA-256 algorithm as specified in NIST FIPS 180-4. | ||
| 3. Convert the resulting 32-byte digest to a 64-character lowercase hexadecimal string. | ||
| ReplaceWithNull: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the void transform...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out, i am not sure why we don't specify the void transform in rest spec, let me dig some historical context if its just a miss, will incorporate this accordingly
iceberg/open-api/rest-catalog-open-api.yaml
Lines 2383 to 2393 in a3c538f
| Transform: | |
| type: string | |
| example: | |
| - "identity" | |
| - "year" | |
| - "month" | |
| - "day" | |
| - "hour" | |
| - "bucket[256]" | |
| - "truncate[16]" | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added : #14778
open-api/rest-catalog-open-api.yaml
Outdated
| MaskHashSha256: | ||
| description: | | ||
| Mask the data of the column by applying SHA-256. | ||
| The input must be UTF-8 encoded bytes of the column value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the type is binary (not all binary strings can produce a valid UTF-8 string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have the following restriction additionally "The data type of the projected column MUST match the data type defined for the transform result" is it still applicable.
52b08b2 to
8a96996
Compare
| - field-id | ||
| - action | ||
|
|
||
| Action: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Action is too generic in this context. maybe name it like Masking?
also action could be optional, right? If only projection is needed (without any masking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Action is too generic in this context. maybe name it like Masking?
Its a bit intentional to name it Action, idea to use them later in expression too, presently Action would suggest on needs to do this, ApplyTransform can be action to apply existing / predefined transforms in iceberg.
also action could be optional, right?
Agree, we can project it as it, no need to wrap it around identity transform, i removed this and added a note on what does a projection without action means.
| MaskHashSha256: | ||
| description: | | ||
| Mask the data of the column by applying SHA-256. | ||
| The input must be UTF-8 encoded bytes of the column value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if we need to say UTF-8 encoded bytes. is it applicable to binary or number types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it applicable to binary or number types?
it would be applicable i think for the binary type but not for the numeric type like int | long since the requirement is the input type should be the same as output type.
propose a new structure`
4cbf604 to
38e45da
Compare
38e45da to
008693b
Compare
About the proposal
This aims at returning, policy evaluation result (Access decisions) for fine grained access policies based on the calling user as part of the loadTable response.
This defines a new object called
ReadRestrictionswhich an optional field catalog would use to convey the access decision, The expectation is these rules (projections and the row filters) are correctly applied and enforced by the client, which brings an implicit requirement to have a trusted partner, establishing trust between callers engine and catalog is not scope of this proposal as its totally up to the catalog on how its established via OAuth Delegation Flow or mTLS.The
ReadRestrictionsreturns back projections which are modeled as Term and the row filters are modeled as ExpressionThis is based on my current understanding of community concencus.
Details of the community syncs can be reference here - https://www.youtube.com/watch?v=RRyohCUDnME
Future Extension
To support complex stuff like mapping tables (joins) or dialect specific SQL or complex policies, the Expressions and Terms can reference Iceberg UDFs (https://lists.apache.org/thread/rvy00kvgj1ybtond1v46t3bkv06v0jd0), which is currently being discussed in the community, once iceberg UDFs are defined we can enhance column projections and row filters to reference UDFs to handle these scenarios.
Additional consideration
clients expression handling capability : lets say we introduce a new expression or a transform in the IRC spec, and the catalog returns this new expression, but the client is running on a lower version so it doesn't understand this newly added expression, IMHO its fine if the client fails in interpreting / parsing the expression as it doesn't understands it, open to broader feedback, (similar discussion in past for spark community (here))
Schema evolution cases, consider a column got renamed, essentially the DDM and RAP should be Binded, having the field-id of the column they are referring to additionally sent to the client to correctly apply the read restrictions consder one had a filter on column A (when the policy was attached) which got renamed to column B later when reading the latest version of the schema reader could essentially apply filter on B essentially adhering to the column projection rule - https://iceberg.apache.org/spec/#column-projection
Acknowledgement
Many thanks to the entire Iceberg community for the extensive discussions and invaluable feedback over the years, which have been instrumental in shaping this proposal into its current form.
Issues
dev lists
[1] https://lists.apache.org/thread/4swop72zgcr8rrmwvb51rlk0vnb8joyz
[2] https://lists.apache.org/thread/8t2zh9nchklm4zwjj89vnq9fg9wv45o4
docs
[1] https://docs.google.com/document/d/14nmuxxfzQsYo59o0Fbpb-pxOlzS6bVtduL8P8pwKZ6U/edit?tab=t.0#bookmark=id.eekzk8xl6uo
[2] https://docs.google.com/document/d/108Y0E8XsZi91x-UY0_aHLEbmXDNmxmS5BnDjunEKvTM/edit?tab=t.0
syncs
[1] https://www.youtube.com/watch?v=RRyohCUDnME