Skip to content

use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from repl…#3650

Merged
fjy merged 1 commit intoapache:masterfrom
himanshug:fix_hdfs_pusher
Nov 2, 2016
Merged

use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from repl…#3650
fjy merged 1 commit intoapache:masterfrom
himanshug:fix_hdfs_pusher

Conversation

@himanshug
Copy link
Copy Markdown
Contributor

@himanshug himanshug commented Nov 2, 2016

so that tmp dirs from replicating tasks are not moved to the segment directory created by first task.

Hadoop's HDFS FileSystem.rename(from, to) implementation does not have the desired behavior. Instead of failing, it "moves" the temporary directory under the outDir resulting in many directories/files with duplicated data inside the outDir.

I spoke to some Hadoop folks internally and they are aware of this situation and FileSystem.rename(from,to,Options.Rename) would be made public in newer versions of hadoop.

@himanshug himanshug added the Bug label Nov 2, 2016
@himanshug himanshug added this to the 0.9.2 milestone Nov 2, 2016
@pjain1
Copy link
Copy Markdown
Member

pjain1 commented Nov 2, 2016

👍

Copy link
Copy Markdown
Contributor

@akashdw akashdw Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can take Options.Rename as argument to make it more generic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intention is not to create a generic FileSystem wrapper really unless necessary... we can do that when there is a need.

Copy link
Copy Markdown
Contributor

@akashdw akashdw Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classname looks generic, we can change the classname to avoid any confusion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akashdw I updated the comments , hopefully that sets the tone for this class more clearly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an Object, this seems like it's basically a static method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, static method would work too. Let me change to that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

…icating tasks are not moved to the segment directory created by first task
@akashdw
Copy link
Copy Markdown
Contributor

akashdw commented Nov 2, 2016

👍

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Nov 2, 2016

I'm 👍 on this, though I was also hoping that with the change to a static method, its home would move into the HdfsDataSegmentPusher, but I'm ok with it also being somewhere else, as it is now.

@fjy fjy merged commit 2362eff into apache:master Nov 2, 2016
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 2, 2016

@himanshug plz backport

himanshug added a commit to himanshug/druid that referenced this pull request Nov 3, 2016
…icating tasks are not moved to the segment directory created by first task (apache#3650)
fjy pushed a commit that referenced this pull request Nov 3, 2016
…icating tasks are not moved to the segment directory created by first task (#3650) (#3653)
@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Nov 3, 2016

@himanshug can you file an issue to fix this when the changes will be done from the hadoop side ? ideally will be nice to link it to the hadoop jira if there is any one open.

@himanshug
Copy link
Copy Markdown
Contributor Author

@b-slim #3654

@himanshug
Copy link
Copy Markdown
Contributor Author

himanshug commented Nov 4, 2016

in response to @cheddar , reason for the static method being in a separate class is to allow accessing the "protected" method from hadoop FileSystem class. That is why HadoopFsWrapper is put in "org.apache.hadoop.fs" package instead of usual "io.druid..."

fundead pushed a commit to fundead/druid that referenced this pull request Dec 7, 2016
…icating tasks are not moved to the segment directory created by first task (apache#3650)
@himanshug himanshug deleted the fix_hdfs_pusher branch January 3, 2017 16:23
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…icating tasks are not moved to the segment directory created by first task (apache#3650)
@himanshug himanshug mentioned this pull request Jun 5, 2017
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants