Skip to content

Conversation

@sririshindra
Copy link
Contributor

@sririshindra sririshindra commented Feb 5, 2025

What changes were proposed in this pull request?

Add the caller context for calls from DRIVER to HDFS.

Why are the changes needed?

HDFS audit logs include the ability to add a "caller context". Spark already leverages this to set the yarn application id, job id, task id, etc. but only on executors. The caller context is left empty on the spark driver. With introductions of Iceberg we have seen multiple scenarios in which files in HDFS are accessed from the driver. But since the caller context is left empty our ability to forensically analyse any issues has diminished. This PR includes sets caller context from the driver as well.

Does this PR introduce any user-facing change?

Yes, hdfs audit logs will now have caller context for calls from driver.

How was this patch tested?

This patch was tested manually. After this change the hdfs audit logs now contain caller context from the driver.

2025-02-14 02:26:23,249 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.4        cmd=getfileinfo src=/warehouse/sample   dst=null        perm=null       proto=rpc       callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:23,265 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.4        cmd=listStatus  src=/warehouse/sample   dst=null        perm=null       proto=rpc       callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:25,519 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.5        cmd=open        src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000      dst=null        perm=null       proto=rpc       callerContext=SPARK_TASK_application_1739496632907_0005_JId_0_SId_0_0_TId_0_0
2025-02-14 02:26:26,345 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.5        cmd=open        src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000      dst=null        perm=null       proto=rpc       callerContext=SPARK_TASK_application_1739496632907_0005_JId_1_SId_1_0_TId_1_0

Was this patch authored or co-authored using generative AI tooling?

No

@sririshindra sririshindra changed the title [DRAFT] [SPARK-51095]. Include caller context for hdfs audit logs for calls from driver [SPARK-51095]. Include caller context for hdfs audit logs for calls from driver Feb 5, 2025
@sririshindra sririshindra changed the title [SPARK-51095]. Include caller context for hdfs audit logs for calls from driver [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver Feb 5, 2025
@sririshindra
Copy link
Contributor Author

@attilapiros Could you take a look at this PR. There are some test failures apparently due to some scala style failures. But I couldn't quite figure out what they are and also the modules that seem to be failing don't seem to have anything to do with this PR

@attilapiros
Copy link
Contributor

This must be:
https://github.com/sririshindra/spark/actions/runs/13162571866/job/36749883818#step:13:5783

[error] /__w/spark/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:53:0: org.apache.spark.internal.config.APP_CALLER_CONTEXT is in wrong order relative to org.apache.spark.internal.config.Tests.IS_TESTING.

@attilapiros
Copy link
Contributor

The '__' in the PR description suggest the example was created with an older version of the PR, right?

I.e. the SPARK_DRIVER__application in this line:

2025-02-04 21:45:12,488 INFO FSNamesystem.audit: allowed=true	ugi=iceberg (auth:SIMPLE)	ip=/10.140.144.131	cmd=create	src=/warehouse/tablespace/external/hive/mwies.db/berg/metadata/00534-a14ccf3e-e50c-4a69-84e3-262389720ae3.metadata.json	dst=null	perm=iceberg:hive:rw-rw----	proto=rpc	callerContext=SPARK_DRIVER__application_1738349944329_0054

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (non-binding) overall. I agree with other comments about removing the debug log line. Thank you, @sririshindra ! This will be useful!

@sririshindra
Copy link
Contributor Author

The '__' in the PR description suggest the example was created with an older version of the PR, right?

I.e. the SPARK_DRIVER__application in this line:

2025-02-04 21:45:12,488 INFO FSNamesystem.audit: allowed=true	ugi=iceberg (auth:SIMPLE)	ip=/10.140.144.131	cmd=create	src=/warehouse/tablespace/external/hive/mwies.db/berg/metadata/00534-a14ccf3e-e50c-4a69-84e3-262389720ae3.metadata.json	dst=null	perm=iceberg:hive:rw-rw----	proto=rpc	callerContext=SPARK_DRIVER__application_1738349944329_0054

Yes, it was from an older version. I reran the test with the latest version and updated the PR description with the new results.

@sririshindra sririshindra requested a review from pan3793 February 7, 2025 15:42
@sririshindra sririshindra marked this pull request as ready for review February 8, 2025 01:35
@github-actions github-actions bot removed the SQL label Feb 14, 2025
Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@cnauroth cnauroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that due to recent changes in #49893 and #49898 , it should be possible to write a reliable unit test for this. Otherwise, I am +1 (non-binding).

Thank you, @sririshindra .

@sririshindra
Copy link
Contributor Author

Note that due to recent changes in #49893 and #49898 , it should be possible to write a reliable unit test for this. Otherwise, I am +1 (non-binding).

Thanks @cnauroth . I now added a unit test as well.

@sririshindra
Copy link
Contributor Author

sririshindra commented Feb 16, 2025

@attilapiros , @dongjoon-hyun, @sunchao Can you please review this PR when you get a chance. Thanks.

Copy link
Contributor

@attilapiros attilapiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants