Fix BUG "No FileSystem for scheme: hdfs" for hdfs-storage-extension #1022
haoch wants to merge 2 commits into apache:master
Conversation
… FileSystem for scheme: hdfs' while loading hadoop-hdfs dependency
I'm looking around the docs for hadoop and I cannot find that in 2.x configs:
https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml (or similar)
That was a setting in 1.x but is it still valid in 2.x?
Also, there is no reason someone couldn't use https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/LocalFileSystem.html as the impl
First of all, thanks very much for your quick response, Charles!
Firstly, it does work for hadoop 2.x too (for us, it's hadoop-2.4.0) and I've tested it with Druid in our environment. When I added an hdfs-site.xml into the druid classpath (say config/realtime/hdfs-site.xml) and put the following settings in it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
    <description>The FileSystem for file: uris.</description>
  </property>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
  </property>
</configuration>
```

the exception changed to `org.apache.hadoop.hdfs.DistributedFileSystem not found` as expected, which should answer your first concern:

```
2015-01-09 05:57:15,042 ERROR [datanode_sherlock-2014-09-11T22:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[datanode_sherlock]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found, interval=2014-09-11T22:00:00.000Z/2014-09-11T23:00:00.000Z}
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1882)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2298)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2311)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:75)
	at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:356)
	at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1788)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
	... 13 more
```
Secondly, `org.apache.hadoop.fs.LocalFileSystem` is implemented in `hadoop-common`, which should be correctly loaded because it is explicitly referred to in Druid's code in HdfsStorageDruidModule.java#L32, so `LocalFileSystem` is always available by default.
Having two impls in the service discovery path shouldn't be the root cause; that should simply let the service loader know there are two impls. If having two service impls prevents HDFS from parsing the URIs correctly, that sounds like an HDFS bug.
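To illustrate the point above, here is a minimal, self-contained sketch of how `ServiceLoader` handles two providers declared in one provider-configuration file. The `Fs` interface and its two implementations are hypothetical stand-ins for `org.apache.hadoop.fs.FileSystem` and its implementations; no Hadoop code is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiDemo {
    // Hypothetical stand-ins for FileSystem and its two implementations.
    public interface Fs { String scheme(); }
    public static class LocalFs implements Fs { public String scheme() { return "file"; } }
    public static class DistFs implements Fs { public String scheme() { return "hdfs"; } }

    // Emulate a jar whose META-INF/services entry lists BOTH implementations
    // (i.e. a correctly merged provider-configuration file), then ask
    // ServiceLoader what it discovers.
    public static List<String> discoveredSchemes() {
        try {
            Path root = Files.createTempDirectory("spi-demo");
            Path svc = root.resolve("META-INF/services/" + Fs.class.getName());
            Files.createDirectories(svc.getParent());
            Files.write(svc, List.of(LocalFs.class.getName(), DistFs.class.getName()));

            ClassLoader cl = new URLClassLoader(
                    new URL[] { root.toUri().toURL() }, SpiDemo.class.getClassLoader());
            List<String> schemes = new ArrayList<>();
            for (Fs fs : ServiceLoader.load(Fs.class, cl)) {
                schemes.add(fs.scheme());
            }
            return schemes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both providers coexist; a second entry never shadows the first.
        System.out.println(discoveredSchemes());
    }
}
```

Both schemes are discovered, which is why two impls on the service discovery path is not, by itself, a problem; the failure mode only appears when one of the two provider-configuration files is lost during jar assembly.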
Druid purposefully tries to disconnect itself from any particular Hadoop version. With 0.x/1.x still in wide use, 2.x very popular, and 3.x actively moving along, we need to do everything we can to make sure that the solution is easily used with whatever version the users decide to have on the backend. As such, I think the …
To continue with my comments replied inline: Druid users really appreciate the convenient extension loader mechanism in Druid, but this bug just makes them confused. I'm not sure, but I think it may be caused by the customized class loader in Druid rather than by HDFS itself, because the … Surely, I also agree with the opinion about disconnecting Druid from any particular Hadoop version. So I think we have two options here:
@haoch: Just FYI, there is a discussion about how to handle dependencies going forward, which is part of why this PR isn't getting traction yet.
(force-pushed from 8b0ec82 to d05032b)
Thanks @drcrallen
@haoch, are you having this problem when you try to make a single self-contained jar? I'm asking because in that case, I think it makes more sense to concatenate the services files rather than hard-coding fs impls in Druid code. Hard-coding the impls in Druid wouldn't help for other FS types (like S3) unless we add them all, and the strategy of adding these things in the code would tie us more closely to a particular version of hadoop. Fwiw, what I've seen work for self-contained jars in the past is using the maven shade plugin with a configuration like:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>selfcontained</shadedClassifierName>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This will concatenate the listed services files together so different files from different jars don't clobber each other. There may be something equivalent for the assembly plugin.
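The difference between the assembly default ("last file with the same path wins") and the shade `AppendingTransformer` ("concatenate all files with that path") can be simulated with plain strings. This is a sketch that models each jar's `META-INF/services/org.apache.hadoop.fs.FileSystem` file as a list of lines; no Maven or Hadoop code is involved, and the abbreviated file contents mirror what the real jars ship.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceFileMerge {
    // Abbreviated contents of each jar's
    // META-INF/services/org.apache.hadoop.fs.FileSystem file.
    public static final List<String> HADOOP_HDFS =
            List.of("org.apache.hadoop.hdfs.DistributedFileSystem");
    public static final List<String> HADOOP_COMMON =
            List.of("org.apache.hadoop.fs.LocalFileSystem");

    // maven-assembly behaviour: files with the same path clobber each
    // other, so only the last one added survives.
    public static List<String> lastFileWins(List<List<String>> files) {
        return files.get(files.size() - 1);
    }

    // maven-shade AppendingTransformer behaviour: entries from all files
    // with that path are concatenated into one services file.
    public static List<String> appendAll(List<List<String>> files) {
        List<String> merged = new ArrayList<>();
        files.forEach(merged::addAll);
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> jars = List.of(HADOOP_HDFS, HADOOP_COMMON);
        // assembly: DistributedFileSystem is gone -> "No FileSystem for scheme: hdfs"
        System.out.println(lastFileWins(jars));
        // shade + AppendingTransformer: both schemes remain registered
        System.out.println(appendAll(jars));
    }
}
```

With the clobbering strategy only `LocalFileSystem` survives, which reproduces the "No FileSystem for scheme: hdfs" symptom; with appending, both filesystems stay registered.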
Looks like a genuine problem. However, it is more of an extension-mechanism problem than a problem in the code being fixed here, which is prone to failure in situations requiring FileSystem schemes other than hdfs and local. There is some work happening to remove dynamic loading of extensions, and that should fix this issue eventually. In the meantime, I would recommend just putting the output of `hadoop classpath` manually on the classpath, which is what we [and most people] do.
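A sketch of that workaround as a launch command. The lib/config paths and the server type are illustrative, not Druid's actual startup script; the key part is that `hadoop classpath` expands to the locally installed Hadoop jars, including hadoop-hdfs and its `META-INF/services/org.apache.hadoop.fs.FileSystem` entry.

```shell
# Append the local Hadoop installation's jars to Druid's classpath so
# DistributedFileSystem and its services entry are visible to the JVM.
# Paths are illustrative; adjust to your layout.
java -cp "lib/*:config/realtime:$(hadoop classpath)" \
  io.druid.cli.Main server realtime
```

This avoids the clobbered services file inside any self-contained jar entirely, because the real hadoop-hdfs jar (with its own intact services file) sits on the classpath.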
Actually, I created a bug for HDFS as well, about using the correct classloader: https://issues.apache.org/jira/browse/HDFS-8750
I think these threads are also related:
This issue also exists when using the Druid Tranquility library (https://github.com/druid-io/tranquility).
My successful workaround on Druid 0.8.0 was patching the druid-hdfs-storage extension.
Full patch contents: patch.diff
@himanshug @haoch I merged the changes in #1721 (https://github.com/druid-io/druid/pull/1721/files) into the druid release tagged "druid-0.8.0" and it seemed to also address this issue.
@mark1900 thanks for testing the patch.
@himanshug can we fix merge conflicts?
@himanshug @haoch I merged the changes in #1721 (https://github.com/druid-io/druid/pull/1721/files) into the druid release tagged "druid-0.8.1" and it seems that this issue still occurs.
Issue seems to be resolved in the latest Druid 0.8.2 release: http://static.druid.io/artifacts/releases/druid-0.8.2-bin.tar.gz |
Problem
While starting a realtime node with HDFS as deep storage (i.e. the hdfs-storage-extension), the log shows that it should already have loaded hadoop-hdfs correctly.
But we got an IOException about "No FileSystem for scheme: hdfs" when it tried to persist segments onto HDFS, which means hadoop-hdfs (the class org.apache.hadoop.hdfs.DistributedFileSystem, in fact) had not been loaded correctly. In fact, many other Druid users besides us are also confused by the same bug.
Root Cause
In fact, this is a typical case of the maven-assembly plugin breaking things in Hadoop HDFS. The root cause is that different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the FileSystem implementations they want to declare (this is called a Service Provider Interface; see org.apache.hadoop.fs.FileSystem line L2591). Druid's module management system seems to rely on a customized Service Provider Interface too, right? When we use maven-assembly, all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other and only one of them remains (the last one that was added). In this case, the FileSystem list from hadoop-common overwrites the list from hadoop-hdfs, so DistributedFileSystem is no longer declared.
Solution
As a quick workaround, Druid users hit by the same problem may start Druid nodes with `hadoop classpath`, or append all hadoop-hdfs related jars' paths to the end of the classpath, like:
To completely fix the bug, we can explicitly refer to DistributedFileSystem in the hdfs-storage-extension code.