PoC _INFO files generation in S3 #37

@dk1844

Description

Enceladus expects _INFO files to be generated in the output directory alongside the Spark output.

This feature was originally implemented using the HDFS API. For AWS S3, we need to replicate the same functionality. The options are:

  • AWS SDK for S3 API (primary option)
  • Using a temporary HDFS location and copying the _INFO file(s) over to S3 (fallback option)
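
For the primary option, one small building block is mapping the Spark output directory to the S3 bucket and object key where the _INFO file would be written. The sketch below is only an illustration under stated assumptions: `S3Location` and `InfoFileLocation` are hypothetical names that do not exist in Atum or Enceladus, and the output path is assumed to use one of the `s3`, `s3a`, or `s3n` schemes. The upload itself (for example an S3 PutObject call through the AWS SDK) is deliberately left out.

```scala
import java.net.URI

// Hypothetical type (not part of Atum): bucket and object key of an _INFO file.
final case class S3Location(bucket: String, key: String)

// Hypothetical helper (not part of Atum): derives the _INFO location from an output dir.
object InfoFileLocation {
  val InfoFileName = "_INFO"

  // Assumption: these are the URI schemes used for S3 output paths.
  private val S3Schemes = Set("s3", "s3a", "s3n")

  def forOutputDir(outputDir: String): S3Location = {
    val uri = new URI(outputDir)
    require(S3Schemes.contains(uri.getScheme), s"Not an S3 path: $outputDir")

    // The URI authority carries the bucket name.
    val bucket = uri.getAuthority
    require(bucket != null && bucket.nonEmpty, s"Missing bucket in: $outputDir")

    // Object keys have no leading slash; also drop a trailing one before appending.
    val prefix = uri.getPath.stripPrefix("/").stripSuffix("/")
    val key = if (prefix.isEmpty) InfoFileName else s"$prefix/$InfoFileName"

    S3Location(bucket, key)
  }
}
```

With this sketch, `InfoFileLocation.forOutputDir("s3a://my-bucket/publish/dataset/")` yields `S3Location("my-bucket", "publish/dataset/_INFO")`; the bucket and key could then be passed to whichever S3 client ends up being used.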

The most prominent entry point should be:
AtumImplicits.SparkSessionWrapper(spark) and, internally, ControlFrameworkState.storeCurrentInfoFile
