
FileSystem provider not found (s3, Kosmos)

New Contributor

Hi,

 

We recently upgraded our Cloudera cluster from 4.5.x to 5.4.3. I upgraded the clients as well and followed the Maven client documentation for Cloudera 5, including only hadoop-client in version 2.6.0-mr1-cdh5.4.3. Everything works fine except the calls to org.apache.hadoop.fs.FileSystem.get(), which fail with an exception saying that FileSystem providers like S3FileSystem, KosmosFileSystem, etc. are not available. I noticed that these classes were included in hadoop-client 2.0.0-mr1-cdh4.5.0 (via hadoop-common) but are no longer in the 5.x.x versions. Some moved to different jars (S3FileSystem to hadoop-aws); others, like KosmosFileSystem, I could not find at all. If I mix 4.5.x and 5.4.x jars, it starts to get messy (what a surprise :-)).

 

In general, I am wondering why I have to provide these classes if I don't use them.

 

I looked in the Hadoop config files (core-site, core-default, hdfs-site) for a property that specifies which FileSystem providers to load, but was not able to find one.

 

So, in the end, I have two questions.

 

1. Do I really have to provide all FileSystem provider classes, and if so, how do I find out where, for example, KosmosFileSystem is located?

2. How do I tell the FileSystem class which providers to support?

 

In general, what is the best practice for a client that uses FileSystem.get() to access the required file system?
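
For context, this is roughly what our client does (a minimal sketch only; the class name, host, and path below are placeholders for illustration, not our real code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally picked up from core-site.xml on the classpath;
        // set explicitly here only for illustration (placeholder host).
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        // This is the call that now fails with the "provider not found" error.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("/tmp exists: " + fs.exists(new Path("/tmp")));
        fs.close();
    }
}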

 

I'm looking forward to any help.

 

Regards,

 

Lars


2 REPLIES

New Contributor

I still could not fix the issue, but I dug a bit deeper into the code.

 

So the ServiceLoader loads all implementations of a given class, in this case org.apache.hadoop.fs.FileSystem. To know which FileSystem classes to load, it looks at a file located at META-INF/services/org.apache.hadoop.fs.FileSystem in each referenced jar.
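
To double-check my understanding, the same lookup can be reproduced with plain Java (a small sketch; the class name is just for illustration). Iterating the loader instantiates every listed class, so it aborts with a ServiceConfigurationError as soon as one of the listed classes is missing from the classpath, which matches the symptom I see:

import java.util.ServiceLoader;

import org.apache.hadoop.fs.FileSystem;

public class ListFileSystems {
    public static void main(String[] args) {
        // ServiceLoader reads every META-INF/services/org.apache.hadoop.fs.FileSystem
        // file on the classpath and instantiates the classes listed there.
        ServiceLoader<FileSystem> loader = ServiceLoader.load(FileSystem.class);
        for (FileSystem fs : loader) {
            System.out.println(fs.getClass().getName());
        }
    }
}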

 

hadoop-client 2.6.0-mr1-cdh5.4.4 includes the following Hadoop jars that provide a META-INF/services/org.apache.hadoop.fs.FileSystem file:

 

1. hadoop-common-2.6.0-cdh5.4.4.jar

 

org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem

 

2. hadoop-hdfs-2.6.0-cdh5.4.4.jar

 

org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem

 

I can't find any more service config files on the classpath, but FileSystem.get() still tries to load org.apache.hadoop.fs.s3.S3FileSystem. Why is this happening? Where does this information come from? If this were a general issue, everyone using Cloudera 5.4.x client jars would have the same problem, which I somehow cannot imagine. So I'm still thinking there is a general error in my setup; I just don't know which one.
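
One thing I still want to try is to scan the whole runtime classpath for these service files, to see which jar the extra entries come from (a plain-Java sketch, nothing Hadoop-specific; the class name is just for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

public class FindServiceFiles {
    public static void main(String[] args) throws Exception {
        String resource = "META-INF/services/org.apache.hadoop.fs.FileSystem";
        Enumeration<URL> urls =
                Thread.currentThread().getContextClassLoader().getResources(resource);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            // The URL shows which jar the service file lives in.
            System.out.println("== " + url);
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("   " + line);
                }
            }
        }
    }
}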

 

Regards,

 

Lars

New Contributor

Hi,

 

I figured it out: it was a transitive in-house dependency that declared these FileSystem implementation classes in its own META-INF/services/org.apache.hadoop.fs.FileSystem file. So sorry for the topic ;-).

 

Lars