FileSystem provider not found (s3, Kosmos)

New Contributor

Hi,

we recently upgraded our Cloudera cluster from 4.5.x to 5.4.3. I upgraded the clients as well and followed the Maven client documentation for Cloudera 5. I just included the hadoop-client.jar in version 2.6.0-mr1-cdh5.4.3. Everything works fine except calls to org.apache.hadoop.fs.FileSystem.get(), which fail with an exception saying that FileSystem providers such as S3FileSystem and KosmosFileSystem are not available. I noticed that these classes were included in hadoop-client 2.0.0-mr1-cdh4.5.0 (via hadoop-common) but are no longer part of the 5.x.x versions. Some moved to different jars (S3FileSystem to hadoop-aws); others, like KosmosFileSystem, I could not find at all. If I mix 4.5.x and 5.4.x jars, things start to get messy (what a surprise :-)).

 

In general, I am wondering why I have to provide these classes if I don't use them.

I looked in the Hadoop config files (core-site, core-default, hdfs-site) for a property that controls which FileSystem providers are loaded, but was not able to find one.

 

So in the end I have 2 questions.

 

1. Do I really have to provide all FileSystem provider classes, and if so, how do I find out where, for example, KosmosFileSystem is located?

2. How do I tell the FileSystem class which providers to support?

 

In general what is the best practice to have a client which uses FileSystem.get() to access the required file system?

 

I'm looking forward to any help.

 

Regards,

 

Lars

1 ACCEPTED SOLUTION

New Contributor

Hi,

 

I figured it out: the cause was a transitive in-house dependency which declared these FileSystem implementation classes in its META-INF/services/org.apache.hadoop.fs.FileSystem.

So sorry for the topic ;-).

 

Lars


2 REPLIES 2

New Contributor

I still could not fix the issue, but I dug a bit deeper into the code.

 

So java.util.ServiceLoader loads all implementations of a given class, which here is org.apache.hadoop.fs.FileSystem. To know which FileSystem classes to load, it looks at a file located at META-INF/services/org.apache.hadoop.fs.FileSystem in each referenced jar.
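To illustrate the mechanism described above, here is a minimal sketch. It is not Hadoop code: DemoService is a hypothetical stand-in for org.apache.hadoop.fs.FileSystem, and com.example.hypothetical.GoneFileSystem is a made-up provider name. It only shows how java.util.ServiceLoader reacts when a services file lists a class that is not on the class path. Assuming FileSystem.get() iterates its providers in a similar way, a single stale entry in any jar would be enough to produce this kind of failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

class MissingProviderDemo {

    /** Hypothetical service interface, standing in for org.apache.hadoop.fs.FileSystem. */
    public interface DemoService { }

    /**
     * Builds a temporary class path entry whose services file names a provider
     * class that does not exist, then iterates the ServiceLoader over it.
     * Returns the error message, or "no error" if iteration succeeded.
     */
    static String loadWithStaleEntry() {
        try {
            Path root = Files.createTempDirectory("stale-services");
            Path services = Files.createDirectories(root.resolve("META-INF/services"));
            // The file is named after the service type; each line names one provider class.
            Files.write(services.resolve(DemoService.class.getName()),
                    "com.example.hypothetical.GoneFileSystem\n".getBytes(StandardCharsets.UTF_8));

            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    MissingProviderDemo.class.getClassLoader())) {
                for (DemoService ignored : ServiceLoader.load(DemoService.class, loader)) {
                    // Not reached: the only listed provider cannot be loaded.
                }
                return "no error";
            } catch (ServiceConfigurationError e) {
                return e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadWithStaleEntry());
    }
}
```

The point of the sketch is that the loader trusts every services file it can see, so the class path as a whole, not only the Hadoop jars, decides which providers are attempted.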

 

hadoop-client 2.6.0-mr1-cdh5.4.4 includes the following hadoop jars which provide a file META-INF/services/org.apache.hadoop.fs.FileSystem:

 

1. hadoop-common-2.6.0-cdh5.4.4.jar

 

org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem

 

2. hadoop-hdfs-2.6.0-cdh5.4.4.jar

 

org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem

 

I can't find any more services config files in the context, but FileSystem.get() still tries to load org.apache.hadoop.fs.s3.S3FileSystem. Why is this happening? Where does this information come from? If this were a general issue, everyone using Cloudera 5.4.x client jars would have this problem, which I somehow cannot imagine.

So I'm still thinking there is a general error in my setup. I just don't know which one.
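One way to answer the "where does this come from" question is to ask the class loader directly. The following is a rough sketch, assuming it runs with the same class path as the failing client. The class name ServiceFileScanner and its methods are hypothetical helpers, not part of Hadoop. It lists every META-INF/services file for a given service type together with the provider names each one declares.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ServiceFileScanner {

    /** Parses a provider-configuration file: one class name per line, '#' starts a comment. */
    static List<String> parseProviderNames(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    /** Maps the URL of each services file visible to the loader to the providers it declares. */
    static Map<String, List<String>> findProviderFiles(ClassLoader loader, String serviceName) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            List<URL> urls = Collections.list(loader.getResources("META-INF/services/" + serviceName));
            for (URL url : urls) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String content = reader.lines().collect(Collectors.joining("\n"));
                    result.put(url.toString(), parseProviderNames(content));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String service = args.length > 0 ? args[0] : "org.apache.hadoop.fs.FileSystem";
        findProviderFiles(ClassLoader.getSystemClassLoader(), service)
                .forEach((url, providers) -> System.out.println(url + " -> " + providers));
    }
}
```

Each printed URL contains the path of the jar that ships the file, so a jar outside the Hadoop distribution that lists S3FileSystem or KosmosFileSystem should stand out.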

 

Regards,

 

Lars
