@@ -30,6 +30,8 @@
import io.druid.storage.hdfs.tasklog.HdfsTaskLogs;
import io.druid.storage.hdfs.tasklog.HdfsTaskLogsConfig;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

import java.util.List;
import java.util.Properties;
@@ -60,6 +62,11 @@ public void configure(Binder binder)
Binders.dataSegmentKillerBinder(binder).addBinding("hdfs").to(HdfsDataSegmentKiller.class).in(LazySingleton.class);

final Configuration conf = new Configuration();

// Work around a typical case where "maven-assembly" causes a "No FileSystem for scheme: hdfs" error while loading the hadoop-hdfs dependency
conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
Contributor:
I'm looking around the Hadoop docs and I cannot find that setting in the 2.x configs:

https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml (or similar)

That was a setting in 1.x, but is it still valid in 2.x?

Also, there is no reason someone couldn't use https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/LocalFileSystem.html as the impl
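For context, Hadoop 2.x does still honor an explicit `fs.<scheme>.impl` entry even though the key no longer appears in `hdfs-default.xml`: `FileSystem.getFileSystemClass` checks the configuration first and only falls back to the `ServiceLoader`-built scheme registry when it is absent. The sketch below simulates that lookup order with plain maps; `resolve` and the registry shape are illustrative, not the real Hadoop API.

```java
import java.util.Map;

public class FsResolution {
    // Sketch of Hadoop 2.x's FileSystem.getFileSystemClass lookup order:
    // an explicit "fs.<scheme>.impl" entry wins; only when it is absent does
    // Hadoop fall back to the ServiceLoader-built scheme registry.
    static String resolve(String scheme, Map<String, String> conf, Map<String, String> serviceRegistry) {
        String configured = conf.get("fs." + scheme + ".impl");
        if (configured != null) {
            return configured; // 2.x still honors the explicit key
        }
        String discovered = serviceRegistry.get(scheme);
        if (discovered == null) {
            // This is the error the PR works around
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return discovered;
    }

    public static void main(String[] args) {
        // A fat jar that lost the hdfs service entry, but the explicit config still resolves it:
        Map<String, String> registry = Map.of("file", "org.apache.hadoop.fs.LocalFileSystem");
        Map<String, String> conf = Map.of("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println(resolve("hdfs", conf, registry));
        // -> org.apache.hadoop.hdfs.DistributedFileSystem
    }
}
```

This is why the `conf.set(...)` calls in the diff are effective even when the service registry is incomplete.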

Member Author:
First of all, thanks very much for your quick response, Charles!

  • Firstly, it does work for Hadoop 2.x too (for us, it's hadoop-2.4.0), and I've tested it with Druid in our environment. When I added an hdfs-site.xml to the Druid classpath (say config/realtime/hdfs-site.xml) and put the following settings in it,

        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <!--
          Licensed under the Apache License, Version 2.0 (the "License");
          you may not use this file except in compliance with the License.
          You may obtain a copy of the License at
    
            http://www.apache.org/licenses/LICENSE-2.0
    
          Unless required by applicable law or agreed to in writing, software
          distributed under the License is distributed on an "AS IS" BASIS,
          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          See the License for the specific language governing permissions and
          limitations under the License. See accompanying LICENSE file.
        -->
    
        <!-- Put site-specific property overrides in this file. -->
    
        <configuration>
            <property>
               <name>fs.file.impl</name>
               <value>org.apache.hadoop.fs.LocalFileSystem</value>
               <description>The FileSystem for file: uris.</description>
            </property>
    
            <property>
               <name>fs.hdfs.impl</name>
               <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
               <description>The FileSystem for hdfs: uris.</description>
            </property>
        </configuration>
    

    the exception changed to org.apache.hadoop.hdfs.DistributedFileSystem not found, as expected, which should answer your first concern:

      2015-01-09 05:57:15,042 ERROR [datanode_sherlock-2014-09-11T22:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[datanode_sherlock]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found, interval=2014-09-11T22:00:00.000Z/2014-09-11T23:00:00.000Z}
      java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
              at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1882)
              at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2298)
              at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2311)
              at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
              at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
              at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
              at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
              at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
              at io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:75)
              at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:356)
              at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
              at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1788)
              at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
              ... 13 more
    
  • Secondly, org.apache.hadoop.fs.LocalFileSystem is implemented in hadoop-common, which should be loaded correctly because it is explicitly referenced in Druid's code at HdfsStorageDruidModule.java#L32, so LocalFileSystem is always available by default.
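The underlying failure mode can be sketched without Hadoop at all: both hadoop-common and hadoop-hdfs ship a `META-INF/services/org.apache.hadoop.fs.FileSystem` file, and when maven-assembly builds a fat jar it keeps only one copy of a duplicated path instead of concatenating them (a concatenating merger, such as maven-shade's `ServicesResourceTransformer`, keeps every provider line). The simulation below is hypothetical; `assemble` and the file contents are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceFileMerge {
    // Simulates fat-jar assembly of duplicate META-INF/services files:
    // without concatenation, one jar's service file overwrites the other's
    // ("last wins" here), silently dropping the other jar's providers.
    static List<String> assemble(boolean concatenate, List<List<String>> serviceFiles) {
        if (concatenate) {
            List<String> merged = new ArrayList<>();
            for (List<String> file : serviceFiles) {
                merged.addAll(file);
            }
            return merged;
        }
        return serviceFiles.get(serviceFiles.size() - 1); // one file overwrites the rest
    }

    public static void main(String[] args) {
        List<String> fromHadoopHdfs = List.of("org.apache.hadoop.hdfs.DistributedFileSystem");
        List<String> fromHadoopCommon = List.of("org.apache.hadoop.fs.LocalFileSystem");

        // maven-assembly behavior: the hdfs provider entry is lost,
        // which later surfaces as "No FileSystem for scheme: hdfs"
        List<String> broken = assemble(false, List.of(fromHadoopHdfs, fromHadoopCommon));
        System.out.println(broken.contains("org.apache.hadoop.hdfs.DistributedFileSystem")); // false

        // concatenating merge: both providers survive
        List<String> fixed = assemble(true, List.of(fromHadoopHdfs, fromHadoopCommon));
        System.out.println(fixed.contains("org.apache.hadoop.hdfs.DistributedFileSystem")); // true
    }
}
```

This is also why `LocalFileSystem` tends to survive (its jar's service file is often the one kept) while `DistributedFileSystem` disappears.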

conf.set("fs.file.impl", LocalFileSystem.class.getName());

if (props != null) {
for (String propName : System.getProperties().stringPropertyNames()) {
if (propName.startsWith("hadoop.")) {