@singhpk234
Contributor

@singhpk234 singhpk234 commented Aug 20, 2025

About the proposal

This proposal returns the policy evaluation result (access decisions) for fine-grained access policies, evaluated for the calling user, as part of the loadTable response.

It defines a new object, ReadRestrictions, which the catalog returns in an optional field to convey the access decision. The expectation is that the client correctly applies and enforces these rules (projections and row filters), which brings an implicit requirement that the client be a trusted partner. Establishing trust between the calling engine and the catalog is out of scope for this proposal; it is entirely up to the catalog how that trust is established, for example via an OAuth delegation flow or mTLS.

ReadRestrictions returns projections, which are modeled as Term, and row filters, which are modeled as Expression.

This is based on my current understanding of community consensus.

Details of the community syncs can be found here - https://www.youtube.com/watch?v=RRyohCUDnME

Future Extension

To support more complex cases, such as mapping tables (joins), dialect-specific SQL, or complex policies, Expressions and Terms can reference Iceberg UDFs (https://lists.apache.org/thread/rvy00kvgj1ybtond1v46t3bkv06v0jd0), which are currently being discussed in the community. Once Iceberg UDFs are defined, we can enhance column projections and row filters to reference UDFs to handle these scenarios.

Additional consideration

  • Client expression handling capability: say we introduce a new expression or transform in the IRC spec and the catalog returns it, but the client is running an older version that doesn't understand it. IMHO it's fine if the client fails while interpreting / parsing the expression because it doesn't understand it. Open to broader feedback (there was a similar discussion in the past in the Spark community (here)).

  • Schema evolution cases: consider a column that got renamed. The DDM and RAP should essentially be bound, with the field-id of the column they refer to additionally sent to the client so that it can correctly apply the read restrictions. For example, suppose there was a filter on column A (when the policy was attached), and A was later renamed to B. When reading the latest version of the schema, the reader could apply the filter on B, adhering to the column projection rule - https://iceberg.apache.org/spec/#column-projection
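
To make the two considerations above concrete, here is a minimal sketch of how a client might resolve a filter that is bound to a field-id and fail on an expression type it does not know. The dict layout, the `SUPPORTED_OPS` set, and the `resolve_filter` helper are all hypothetical illustrations; none of them come from the proposal or from any Iceberg library.

```python
# Hypothetical shapes for illustration only: the proposal does not define
# these dict layouts, the SUPPORTED_OPS set, or the resolve_filter helper.
SUPPORTED_OPS = {"eq", "lt", "gt"}


def resolve_filter(filter_expr, schema):
    """Resolve a field-id-bound filter to the column name in the schema being read."""
    op = filter_expr["type"]
    if op not in SUPPORTED_OPS:
        # An older client cannot enforce what it cannot interpret, so it fails
        # instead of silently skipping the restriction.
        raise ValueError(f"unsupported expression type: {op}")

    names_by_id = {field["id"]: field["name"] for field in schema["fields"]}
    field_id = filter_expr["field-id"]
    if field_id not in names_by_id:
        raise ValueError(f"field-id {field_id} not found in schema")

    # The name is looked up by id, so a rename does not detach the filter.
    return {"type": op, "column": names_by_id[field_id], "value": filter_expr["value"]}


# Column A (field-id 1) is renamed to B between the two schema versions.
schema_v1 = {"fields": [{"id": 1, "name": "A"}, {"id": 2, "name": "region"}]}
schema_v2 = {"fields": [{"id": 1, "name": "B"}, {"id": 2, "name": "region"}]}

# The filter was attached while the column was still called A.
policy_filter = {"type": "eq", "field-id": 1, "value": "x"}

print(resolve_filter(policy_filter, schema_v1)["column"])
print(resolve_filter(policy_filter, schema_v2)["column"])
```

Because the filter carries the field-id rather than the name, the same policy resolves to A under the old schema and to B under the latest one, which is the behavior the column projection rule describes.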

Acknowledgement

Many thanks to the entire Iceberg community for the extensive discussions and invaluable feedback over the years, which have been instrumental in shaping this proposal into its current form.

Issues
dev lists

[1] https://lists.apache.org/thread/4swop72zgcr8rrmwvb51rlk0vnb8joyz
[2] https://lists.apache.org/thread/8t2zh9nchklm4zwjj89vnq9fg9wv45o4

docs

[1] https://docs.google.com/document/d/14nmuxxfzQsYo59o0Fbpb-pxOlzS6bVtduL8P8pwKZ6U/edit?tab=t.0#bookmark=id.eekzk8xl6uo
[2] https://docs.google.com/document/d/108Y0E8XsZi91x-UY0_aHLEbmXDNmxmS5BnDjunEKvTM/edit?tab=t.0

syncs

[1] https://www.youtube.com/watch?v=RRyohCUDnME

@singhpk234 singhpk234 force-pushed the feature/load-table-return-policy branch from 1234341 to 6422478 Compare August 20, 2025 21:41
@singhpk234
Contributor Author

singhpk234 commented Aug 20, 2025

@singhpk234 singhpk234 force-pushed the feature/load-table-return-policy branch from 9c1b305 to 6c5b955 Compare August 21, 2025 21:36
@singhpk234 singhpk234 changed the title [SPEC] Add FGAC enforcement instructions as part of loadTable [SPEC] Add finer grained read restrictions as part of loadTable Aug 22, 2025
@singhpk234
Contributor Author

singhpk234 commented Sep 17, 2025

Re: Schema evolution cases
There was general alignment on using boundedReference; working on incorporating that feedback.

@singhpk234 singhpk234 added the Specification Issues that may introduce spec changes. label Sep 19, 2025
@singhpk234
Contributor Author

There was general alignment on using boundedReference; working on incorporating that feedback.

Update: I am actively trying a POC for this in my fork (PR: singhpk234#270, if folks are interested in giving some early feedback).

@singhpk234 singhpk234 marked this pull request as ready for review September 26, 2025 23:11

MaskHashSha256:
  type: object
  description: Mask the data of the column by applying SHA256 hash algorithm. Engines are free to use their own implementation of SHA256.
Contributor

Either this should be any cryptographic hash algorithm, or this should specify how to calculate values so they are consistent across engines. I don't really care which one, but if we are specifying details then we should specify all of them.

Contributor Author

@singhpk234 singhpk234 Nov 10, 2025

Rephrased it using the spec definition of SHA-256. Please let me know if this reads better.

2. Apply the SHA-256 algorithm as specified in NIST FIPS 180-4.
3. Convert the resulting 32-byte digest to a 64-character lowercase hexadecimal string.
ReplaceWithNull:
Contributor

This is the void transform...

Contributor Author

Good callout. I am not sure why we don't specify the void transform in the REST spec. Let me dig up some historical context to see whether it is just a miss, and I will incorporate this accordingly.

Transform:
  type: string
  example:
    - "identity"
    - "year"
    - "month"
    - "day"
    - "hour"
    - "bucket[256]"
    - "truncate[16]"

Contributor Author

Added: #14778

MaskHashSha256:
  description: |
    Mask the data of the column by applying SHA-256.
    The input must be UTF-8 encoded bytes of the column value.
Contributor

What if the type is binary? Not all binary strings can produce a valid UTF-8 string.

Contributor Author

We additionally have the following restriction: "The data type of the projected column MUST match the data type defined for the transform result". Given that, is this still applicable?

@sfc-gh-prsingh sfc-gh-prsingh force-pushed the feature/load-table-return-policy branch from 52b08b2 to 8a96996 Compare December 14, 2025 15:29
- field-id
- action

Action:
Contributor

Action is too generic in this context. maybe name it like Masking?

also action could be optional, right? If only projection is needed (without any masking)

Contributor Author

Action is too generic in this context. maybe name it like Masking?

It is somewhat intentional to name it Action; the idea is to use actions later in expressions too. Presently, an Action suggests what one needs to do: for example, ApplyTransform can be an action that applies existing / predefined transforms in Iceberg.

also action could be optional, right?

Agreed, we can project it as is; there is no need to wrap it in an identity transform. I removed this and added a note on what a projection without an action means.

MaskHashSha256:
  description: |
    Mask the data of the column by applying SHA-256.
    The input must be UTF-8 encoded bytes of the column value.
Contributor

wondering if we need to say UTF-8 encoded bytes. is it applicable to binary or number types?

Contributor Author

is it applicable to binary or number types?

I think it would be applicable to the binary type but not to numeric types such as int | long, since the requirement is that the input type be the same as the output type.
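
For reference, the MaskHashSha256 steps quoted in the threads above (UTF-8 encode the column value, apply SHA-256 as specified in NIST FIPS 180-4, emit a 64-character lowercase hexadecimal string) can be sketched roughly as follows. The function name `mask_hash_sha256` is hypothetical, and the sketch only covers string input; how binary or numeric columns should be handled is still the open question in this thread.

```python
import hashlib


def mask_hash_sha256(value: str) -> str:
    """Hash a string column value as described in the quoted spec text.

    Hypothetical helper for illustration; it is not part of the proposal.
    """
    # Encode the column value as UTF-8 bytes.
    data = value.encode("utf-8")
    # Apply SHA-256; this yields a 32-byte digest.
    digest = hashlib.sha256(data).digest()
    # bytes.hex() produces lowercase hexadecimal, 2 characters per byte.
    return digest.hex()


masked = mask_hash_sha256("abc")
print(len(masked))
print(masked)
```

Any engine that follows the same three steps should produce the same masked value for the same input string, which is the cross-engine consistency the first review comment asked for.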

@sfc-gh-prsingh sfc-gh-prsingh force-pushed the feature/load-table-return-policy branch from 4cbf604 to 38e45da Compare January 1, 2026 03:18
@sfc-gh-prsingh sfc-gh-prsingh force-pushed the feature/load-table-return-policy branch from 38e45da to 008693b Compare January 1, 2026 03:21
Labels

OPENAPI Specification Issues that may introduce spec changes.
