WebHDFS Performance

Could anyone share the performance differences between WebHDFS and the native Java client? We are creating a web service endpoint to ingest attachments into HDFS. The files are typically under 10 MB.

I found these two posts:

http://randomlydistributed.blogspot.com/2012/01/webhdfs-performance.html

http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html

My test results nearly match those in the second link. However, I wanted to see whether any benchmark studies exist.

1 ACCEPTED SOLUTION

Super Guru

@Srinivasan Hariharan

I am sure you are already aware that WebHDFS is based on HTTP operations such as GET, PUT, POST, and DELETE. That carries a performance cost, because every request goes through the HTTP server, Jetty. The FileSystem shell, by contrast, is a Java application that uses the Java FileSystem class to perform file system operations, and it creates an RPC connection for those operations.
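
To make the HTTP side concrete, here is a minimal Python sketch of the requests behind a single WebHDFS file write. It only builds URLs and sends nothing over the network. The host name, the port 50070, and the helper names `webhdfs_url` and `describe_create` are assumptions for illustration; the `/webhdfs/v1` prefix and the `op=CREATE` two-step redirect follow the WebHDFS REST API as I understand it.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1/<path>?op=..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def describe_create(host, port, hdfs_path):
    """List the HTTP requests needed to write one file (nothing is sent)."""
    first = webhdfs_url(host, port, hdfs_path, "CREATE", overwrite="true")
    return [
        # Step 1: no file data; the NameNode replies with a redirect.
        ("PUT", first, "NameNode answers 307 with a Location header"),
        # Step 2: the file bytes go to the DataNode named in that header.
        ("PUT", "<Location header value>", "DataNode answers 201 Created"),
    ]


if __name__ == "__main__":
    # Hypothetical host, port, and path.
    for method, url, note in describe_create("nn.example.com", 50070, "/tmp/a.bin"):
        print(method, url, "->", note)
```

Each write therefore costs two HTTP round trips before any data lands, which is part of why small files show a visible per-request overhead compared with the RPC client.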

Here are some numbers, though this is not a serious benchmarking study. I am not surprised by the results for <10 MB files; they are what I would expect to see. You can run the test yourself. If that is the size of your files, WebHDFS should be fine. In my past experience, performance became a concern for large files, visible from 1 GB and up.

http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html
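
If you want to run the comparison yourself, a small timing harness along these lines may help. It is only a sketch: `time_uploads` and `summarize` are hypothetical helper names, and `upload` stands for whatever callable you supply to write one payload, either through WebHDFS or through the native client.

```python
import statistics
import time


def time_uploads(upload, payload, runs=5):
    """Call upload(payload) `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        upload(payload)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings, payload_bytes):
    """Reduce raw timings to median, min, max, and throughput in MB/s."""
    median = statistics.median(timings)
    return {
        "median_s": median,
        "min_s": min(timings),
        "max_s": max(timings),
        # Throughput is based on the median run, using 1 MB = 10**6 bytes.
        "mb_per_s": (payload_bytes / 1e6) / median if median > 0 else float("inf"),
    }


if __name__ == "__main__":
    payload = b"x" * (5 * 1024 * 1024)  # 5 MiB of dummy data
    # Replace the no-op lambda with your own WebHDFS or native upload function.
    print(summarize(time_uploads(lambda data: None, payload), len(payload)))
```

Running both upload paths with the same payload sizes (for example 1 MB, 10 MB, and 1 GB) on the same cluster gives numbers you can compare directly, which matters more than figures from someone else's hardware.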

I'm checking the Hortonworks docs for newer numbers and will post the link if I find one.

If this is a reasonable response, please vote for it or accept it as the best answer.

I agree with your answer, and you are spot on about the HTTP server performance implications. WebHDFS is really tempting, given that we can expose HDFS in a browser with minimal coding and integrate with non-Java clients as well. Do share if you get your hands on any benchmark numbers. Thanks!

Contributor

I will give you a qualitative answer: Ambari (UI) uses WebHDFS, and WebHDFS is designed for scale and performance (vs. HttpFS). In the future, we will also look into enabling WebHDFS to handle NameNode failover seamlessly, so that apps that depend on WebHDFS do not have to keep track of it.
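
Until that exists, "keeping track" on the client side could look roughly like the Python sketch below. This is an assumption about how an app might do it, not how Ambari or WebHDFS does it: `call_with_failover` is a hypothetical helper, and the `request` callable you pass in is expected to raise an exception when the NameNode it is given cannot serve the call.

```python
def call_with_failover(namenodes, request):
    """Try request(namenode) on each candidate in order.

    Returns (namenode, result) for the first candidate that succeeds.
    Raises RuntimeError if every candidate fails.
    """
    errors = []
    for namenode in namenodes:
        try:
            return namenode, request(namenode)
        except Exception as exc:  # deliberately broad: this is only a sketch
            errors.append((namenode, exc))
    raise RuntimeError(f"no NameNode accepted the request: {errors}")


if __name__ == "__main__":
    # Hypothetical host names; the fake request pretends nn1 is not active.
    def fake_request(namenode):
        if namenode == "nn1.example.com":
            raise ConnectionError("not the active NameNode")
        return "ok"

    print(call_with_failover(["nn1.example.com", "nn2.example.com"], fake_request))
```

A real client would likely remember which NameNode answered last and try it first next time, so that most calls need only one attempt.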

Thanks for sharing your thoughts and the direction of evolution. I agree that WebHDFS performs much better than HttpFS.