Created on 07-28-2015 06:00 AM - edited 09-16-2022 02:36 AM
Hi,
we recently upgraded our Cloudera cluster from 4.5.x to 5.4.3. I upgraded the clients as well and followed the Maven client documentation for Cloudera 5. I just included the hadoop-client.jar in version 2.6.0-mr1-cdh5.4.3. Everything worked fine except the calls to org.apache.hadoop.fs.FileSystem.get(), which fail with an exception saying that FileSystem providers like S3FileSystem, KosmosFileSystem etc. are not available. I noticed that these classes were included in hadoop-client 2.0.0-mr1-cdh4.5.0 (hadoop-common) but are no longer in the 5.x.x versions. Some moved to different jars (S3FileSystem to hadoop-aws); others, like KosmosFileSystem, I could not find at all. If I mix 4.5.x and 5.4.x jars, it starts to get messy (what a surprise :-)).
In general, I am wondering why I have to provide these classes if I don't use them.
I looked in the Hadoop config files (core-site, core-default, hdfs-site) for a property that controls which FileSystem providers are loaded, but was not able to find one.
So in the end I have two questions:
1. Do I really have to provide all FileSystem provider classes, and if so, how do I find out where, for example, KosmosFileSystem is located?
2. How do I tell the FileSystem class which providers to support?
More generally, what is the best practice for a client that uses FileSystem.get() to access the required file system?
I'm looking forward to any help.
Regards,
Lars
Created 08-03-2015 07:40 AM
Hi,
I figured it out: it was a transitive in-house dependency that declared these FileSystem implementation classes in its META-INF/services/org.apache.hadoop.fs.FileSystem file.
So, sorry for the noise ;-).
Lars
Created 08-03-2015 06:07 AM
I still could not fix the issue, but I dug a bit deeper into the code.
FileSystem uses java.util.ServiceLoader, which loads all implementations of a given type, here org.apache.hadoop.fs.FileSystem. To know which FileSystem classes to load, it looks for a file at META-INF/services/org.apache.hadoop.fs.FileSystem in every jar on the classpath.
hadoop-client 2.6.0-mr1-cdh5.4.4 includes the following Hadoop jars, which provide a META-INF/services/org.apache.hadoop.fs.FileSystem file:
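To see why classes you never use can still break FileSystem.get(), here is a minimal, JDK-only sketch of the mechanism. It assumes, as described above, that every provider listed in any services file is instantiated by a plain ServiceLoader loop. `FakeFileSystem`, `LocalFake` and `com.example.MissingFileSystem` are hypothetical stand-ins for org.apache.hadoop.fs.FileSystem and its providers; the exact exception Hadoop surfaces may be wrapped differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {

    /** Hypothetical stand-in for org.apache.hadoop.fs.FileSystem. */
    public interface FakeFileSystem {
        String scheme();
    }

    /** A provider class that is actually on the classpath. */
    public static class LocalFake implements FakeFileSystem {
        public LocalFake() {}

        @Override
        public String scheme() {
            return "file";
        }
    }

    /**
     * Writes META-INF/services/<service binary name> into a temporary directory,
     * listing the given provider class names, and returns a loader that sees it.
     */
    public static URLClassLoader loaderWithServiceFile(String... providers) {
        try {
            Path root = Files.createTempDirectory("svc");
            Path services = root.resolve("META-INF").resolve("services");
            Files.createDirectories(services);
            Files.write(services.resolve(FakeFileSystem.class.getName()),
                    String.join("\n", providers).getBytes(StandardCharsets.UTF_8));
            return new URLClassLoader(new URL[] {root.toUri().toURL()},
                    ServiceLoaderDemo.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Instantiates every listed provider, the way a plain ServiceLoader loop does. */
    public static List<String> loadSchemes(ClassLoader loader) {
        List<String> schemes = new ArrayList<>();
        for (FakeFileSystem fs : ServiceLoader.load(FakeFileSystem.class, loader)) {
            schemes.add(fs.scheme());
        }
        return schemes;
    }

    public static void main(String[] args) {
        // Every listed class exists: iteration succeeds.
        System.out.println(loadSchemes(loaderWithServiceFile(LocalFake.class.getName())));

        // One listed class is missing: iteration aborts, even though no caller
        // ever asked for that provider.
        try {
            loadSchemes(loaderWithServiceFile(
                    LocalFake.class.getName(), "com.example.MissingFileSystem"));
        } catch (ServiceConfigurationError e) {
            System.out.println("ServiceConfigurationError: " + e.getMessage());
        }
    }
}
```

The point of the second call is that the failure is caused by the services file alone: nothing in the calling code mentions the missing class.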
1. hadoop-common-2.6.0-cdh5.4.4.jar
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem
2. hadoop-hdfs-2.6.0-cdh5.4.4.jar
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
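Rather than opening each jar by hand, the same inventory can be gathered programmatically. The sketch below is a hypothetical helper (JDK only, not part of Hadoop): it asks a class loader for every META-INF/services/org.apache.hadoop.fs.FileSystem resource via `ClassLoader.getResources`, prints the URL (and so the jar) each file comes from, and flags entries whose class cannot be loaded. Run from inside the client application with its real classpath, it should also reveal files contributed by transitive dependencies, such as the in-house one identified in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ServiceFileScanner {

    /** One provider line: the class name and whether the loader can resolve it. */
    public static final class ProviderEntry {
        public final String className;
        public final boolean resolvable;

        ProviderEntry(String className, boolean resolvable) {
            this.className = className;
            this.resolvable = resolvable;
        }
    }

    /** Maps each META-INF/services/<serviceName> file (by URL string) to its entries. */
    public static Map<String, List<ProviderEntry>> scan(ClassLoader loader, String serviceName) {
        Map<String, List<ProviderEntry>> result = new LinkedHashMap<>();
        try {
            Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceName);
            while (files.hasMoreElements()) {
                URL file = files.nextElement();
                List<ProviderEntry> entries = new ArrayList<>();
                for (String name : readClassNames(file)) {
                    entries.add(new ProviderEntry(name, canLoad(name, loader)));
                }
                result.put(file.toString(), entries);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Parses a services file: one class name per line, '#' starts a comment. */
    static List<String> readClassNames(URL file) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** True if the class can be found; initialize=false skips static initializers. */
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the context class loader sees the same classpath as the client code.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        scan(loader, "org.apache.hadoop.fs.FileSystem").forEach((file, entries) -> {
            System.out.println(file);
            for (ProviderEntry e : entries) {
                System.out.println((e.resolvable ? "    ok       " : "    MISSING  ") + e.className);
            }
        });
    }
}
```

Any entry printed as MISSING is a candidate for the kind of failure described in this thread, and the URL printed above it names the jar that declares it.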
I can't find any more services config files on the classpath, but FileSystem.get() still tries to load org.apache.hadoop.fs.s3.S3FileSystem. Why is this happening?
Where does this information come from? If this were a general issue, everyone using the Cloudera 5.4.x client jars would have this problem, which I can hardly imagine.
So I still think there is an error in my setup; I just don't know which one.
Regards,
Lars