This repository was archived by the owner on May 12, 2021. It is now read-only.


TAJO-1905: Insert clause to partitioned table fails on S3 #959

Closed

blrunner wants to merge 19 commits into apache:master from blrunner:TAJO-1905

Conversation

@blrunner
Contributor

Currently, the Tajo output committer works as follows:

  • Each task writes its output to a temp directory.
  • FileTablespace::commitTable renames the first successful task's temp directory to the final destination.
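
The two steps above can be sketched as follows. This is an illustration, not Tajo's actual code; the directory names and the `commit` helper are made up for the example. It shows why the rename-based approach is attractive on HDFS or a local filesystem (a directory move is a cheap metadata operation) and why it breaks on S3, where "rename" is really copy-plus-delete and a subsequent list may not yet see the new keys.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a rename-based commit in the spirit of
// FileTablespace::commitTable: each task writes into its own temp
// directory, and the first successful task's directory is renamed to
// the final destination. Names here are hypothetical.
public class RenameCommit {
    public static void commit(Path taskTempDir, Path finalDest) throws IOException {
        // On HDFS or POSIX this is a single metadata operation.
        // On S3 there is no real rename: the filesystem shim copies every
        // object and deletes the originals, and eventually consistent
        // listings can then raise FileNotFoundException for readers.
        Files.move(taskTempDir, finalDest);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("tajo-demo");
        Path temp = Files.createDirectory(work.resolve("task_000001"));
        Files.write(temp.resolve("part-0"), "row1\nrow2\n".getBytes());

        Path dest = work.resolve("final_output");
        commit(temp, dest);

        System.out.println(Files.exists(dest.resolve("part-0")));
        System.out.println(Files.exists(temp));
    }
}
```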

But the above approach can cause a FileNotFoundException because of S3's eventual consistency. To resolve it, I implemented an output committer for S3, which works as follows:

  • Each task writes its output to local disk instead of S3 (for CTAS and INSERT statements).
  • S3TableSpace::commitTable copies the first successful task's temp directory to S3.
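
A minimal sketch of the copy-based commit, again with hypothetical names rather than Tajo's actual code: the commit walks the local temp directory and copies each file to the destination instead of renaming. Local directories stand in for the S3 upload here; the real committer would issue S3 PUTs, so each object appears complete or not at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Sketch of a copy-based commit in the spirit of
// S3TableSpace::commitTable: tasks write to local disk, and commit
// copies the whole tree (including partition subdirectories such as
// key=1) to the final location.
public class CopyCommit {
    public static void commit(Path localTempDir, Path s3Dest) throws IOException {
        try (Stream<Path> paths = Files.walk(localTempDir)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path target = s3Dest.resolve(localTempDir.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(target);
                } else {
                    // In the real committer this is an upload of a complete
                    // object, never an in-place rename on S3.
                    Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("tajo-s3-demo");
        Path taskDir = work.resolve("local/task_000001");
        Files.createDirectories(taskDir.resolve("key=1"));
        Files.write(taskDir.resolve("key=1/part-0"), "a\nb\n".getBytes());

        Path dest = work.resolve("s3/final_output");
        commit(taskDir, dest);

        System.out.println(Files.exists(dest.resolve("key=1/part-0")));
    }
}
```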

This PR depends on #952. CTAS and INSERT statements for partitioned tables ran successfully with this PR. For reference, I was inspired by Netflix's "Integrating Spark at Petabyte Scale" slides (http://www.slideshare.net/piaozhexiu/netflix-integrating-spark-at-petabyte-scale-53391704).

To resolve this issue properly, each task would need to write its output directly to the final destination, and we would need to implement a pluggable output committer. But that looks like long-term work, so I think this PR can serve as an interim solution until a pluggable output committer exists.

@blrunner blrunner closed this Feb 17, 2016
