use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from repl…#3650
fjy merged 1 commit into apache:master
79e859d to 7808609
👍
We could take Options.Rename as an argument to make it more generic?
My intention is not really to create a generic FileSystem wrapper unless necessary... we can do that when there is a need.
The class name looks generic; we could change it to avoid any confusion.
@akashdw I updated the comments; hopefully that makes the purpose of this class clearer.
Why an object? This seems like it's basically a static method.
Yeah, a static method would work too. Let me change to that.
…icating tasks are not moved to the segment directory created by first task
7808609 to bc48336
👍
I'm 👍 on this, though I was also hoping that with the change to a static method, its home would move into the …
@himanshug please backport.
…icating tasks are not moved to the segment directory created by first task (apache#3650)
@himanshug can you file an issue to fix this once the changes are done on the Hadoop side? Ideally it would be linked to the Hadoop JIRA, if one is open.
In response to @cheddar: the static method lives in a separate class so that it can access the protected rename method of Hadoop's FileSystem class. That is why HadoopFsWrapper is placed in the "org.apache.hadoop.fs" package instead of the usual "io.druid..." package.
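The package trick relies on a Java language rule: a `protected` member is accessible not only to subclasses but also to any code in the same package. The minimal single-file sketch below uses hypothetical toy classes (`ToyFileSystem`, `ToyFsWrapper`; these are not Hadoop's `FileSystem` or Druid's actual `HadoopFsWrapper`). It shows a static helper calling a protected method on an instance it did not create, without subclassing. Placing the real wrapper in `org.apache.hadoop.fs` would give it the same kind of access to the protected `FileSystem.rename(from, to, Options.Rename)` overload.

```java
import java.util.HashSet;
import java.util.Set;

// Entry point first, so `java ProtectedAccessDemo.java` finds main().
class ProtectedAccessDemo {
    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem();
        fs.create("tmp1");
        fs.create("tmp2");
        System.out.println(ToyFsWrapper.rename(fs, "tmp1", "out")); // prints true
        System.out.println(ToyFsWrapper.rename(fs, "tmp2", "out")); // prints false
    }
}

// Hypothetical stand-in for a library class with a protected method.
class ToyFileSystem {
    private final Set<String> paths = new HashSet<>();

    boolean exists(String p) { return paths.contains(p); }

    void create(String p) { paths.add(p); }

    // Protected: callable from subclasses AND from any class in the same package.
    // Fails instead of overwriting, loosely mirroring Rename.NONE semantics.
    protected void renameNoOverwrite(String from, String to) {
        if (paths.contains(to)) {
            throw new IllegalStateException("destination exists: " + to);
        }
        if (!paths.remove(from)) {
            throw new IllegalStateException("source missing: " + from);
        }
        paths.add(to);
    }
}

// Hypothetical static-method wrapper in the same (unnamed) package: it is not a
// subclass, yet same-package placement lets it call the protected method.
final class ToyFsWrapper {
    private ToyFsWrapper() {}

    static boolean rename(ToyFileSystem fs, String from, String to) {
        try {
            fs.renameNoOverwrite(from, to);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

A static method on a non-instantiable class matches the review suggestion above. Catching the failure and returning `false` is an assumed design for illustration only.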
Use FileSystem.rename(from, to, Rename.NONE) so that tmp dirs from replicating tasks are not moved into the segment directory created by the first task.
Hadoop's HDFS FileSystem.rename(from, to) implementation does not have the desired behavior when the destination directory already exists. Instead of failing, it "moves" the temporary directory under the outDir (the segment directory already created by the first task). The result is many directories/files with duplicated data inside the outDir.
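The fail-instead-of-nest behavior can be tried locally with a minimal sketch. It uses the JDK's `java.nio.file` API as a stand-in for HDFS; this is an assumption for illustration, not Hadoop code, and the directory names and the `publish` helper are hypothetical. `Files.move` without `REPLACE_EXISTING` throws `FileAlreadyExistsException` when the target exists. That is the behavior `Rename.NONE` is meant to provide: the second task's temporary directory stays where it is instead of ending up inside the segment directory.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local analogue (JDK java.nio.file, not Hadoop) of publishing a task's
// temporary output directory as the final segment directory.
class SegmentPublishSketch {

    // Hypothetical helper: moves tmpDir to outDir, failing if outDir already exists.
    // Returns true if this call published the segment, false if a replica got there
    // first. No REPLACE_EXISTING option, so an existing outDir raises an exception
    // rather than tmpDir being nested under it.
    static boolean publish(Path tmpDir, Path outDir) throws IOException {
        try {
            Files.move(tmpDir, outDir);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // caller can delete tmpDir here
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("segments");
        Path outDir = root.resolve("segment_v1");

        Path tmp1 = Files.createDirectory(root.resolve("tmp_task1"));
        Files.writeString(tmp1.resolve("index.zip"), "data");
        Path tmp2 = Files.createDirectory(root.resolve("tmp_task2"));
        Files.writeString(tmp2.resolve("index.zip"), "data");

        System.out.println("task1 published: " + publish(tmp1, outDir)); // true
        System.out.println("task2 published: " + publish(tmp2, outDir)); // false
        // The replica's tmp dir was not moved under the segment directory.
        System.out.println("nested: " + Files.exists(outDir.resolve("tmp_task2"))); // false
    }
}
```

With a plain two-argument rename that moves into an existing directory, the second call would instead leave `segment_v1/tmp_task2` behind. This duplicated-data layout is the one described above.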
I spoke to some Hadoop folks internally. They are aware of this situation, and FileSystem.rename(from, to, Options.Rename) is expected to be made public in newer versions of Hadoop.