
WebHDFS Performance

New Contributor

Could anyone share the performance differences between WebHDFS and native Java clients? We are creating a web service endpoint to ingest attachments into HDFS. The files are typically under 10 MB.

I found these two posts:

http://randomlydistributed.blogspot.com/2012/01/webhdfs-performance.html

http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html

My test results nearly match those in the second link. However, I wanted to see whether any benchmark studies exist.

1 ACCEPTED SOLUTION


Re: WebHDFS Performance

@Srinivasan Hariharan

As you are probably aware, WebHDFS is built on HTTP operations such as GET, PUT, POST, and DELETE, so it carries the performance overhead of the HTTP server, Jetty. The FileSystem shell, by contrast, is a Java application that uses the Java FileSystem class to perform file system operations, and it opens an RPC connection for those operations.
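
To make the HTTP side concrete, here is a rough Python sketch of a WebHDFS file upload. The WebHDFS REST API creates a file in two steps: a PUT with op=CREATE to the NameNode, which answers with a 307 redirect, and then a second PUT with the file body to the redirect location. The host, port, and function names below are my own placeholders, not something from this thread, and the sketch assumes a cluster without Kerberos.

```python
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form http://host:port/webhdfs/v1/<path>?op=..."""
    query = urllib.parse.urlencode({"op": op, **params})
    quoted_path = urllib.parse.quote(hdfs_path)  # keeps "/" and escapes spaces etc.
    return f"http://{host}:{port}/webhdfs/v1{quoted_path}?{query}"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface the 307 as an HTTPError instead of following it automatically."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def upload(host, port, hdfs_path, data):
    """Hypothetical helper: two-step WebHDFS CREATE. Returns the final HTTP status."""
    opener = urllib.request.build_opener(_NoRedirect)
    # Step 1: ask the NameNode where to write. No file data is sent here.
    first = urllib.request.Request(
        webhdfs_url(host, port, hdfs_path, "CREATE", overwrite="true"),
        method="PUT",
    )
    try:
        opener.open(first)
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        location = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")
    # Step 2: send the bytes to the location named in the redirect.
    second = urllib.request.Request(location, data=data, method="PUT")
    with urllib.request.urlopen(second) as resp:
        return resp.status
```

Each upload therefore costs at least two HTTP round trips, which is part of why small files and large files can behave differently. A call would look like `upload("namenode.example.com", 9870, "/tmp/attachment.bin", payload)`, where the host and port are assumptions you would replace with your own.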

Here are some numbers, though this is not a serious benchmarking study. The results for files under 10 MB do not surprise me; they are what I would expect. You can run the test yourself. If that is the size of your files, WebHDFS should be fine. In my experience, performance was a concern for large files, becoming visible at 1 GB and above.
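
If you do want to run the test yourself, a small timing harness along these lines may help. It is only a sketch: `time_uploads` and `summarize` are names I made up, and `upload` stands for whatever callable wraps your WebHDFS or native client write. It measures wall-clock time per call and nothing else.

```python
import statistics
import time


def time_uploads(upload, payload, runs=20):
    """Call upload(payload) `runs` times; return the wall-clock time of each call in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        upload(payload)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce raw timings to a few numbers that are easy to compare across clients."""
    ordered = sorted(timings)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Run it once per client with the same payload sizes (for example 1 MB, 10 MB, and 1 GB) and compare the medians. Discarding the first run or two is probably sensible, since connection setup and JVM warm-up can skew them.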

http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html

I'm checking the Hortonworks docs for something newer and will post the link if I find one.

If this is a reasonable response, please vote for it or accept it as the best answer.

Re: WebHDFS Performance

New Contributor

I agree with your answer, and you are spot on about the HTTP server performance implications. WebHDFS is really tempting given that we can expose HDFS in a browser with minimal coding and can integrate with non-Java clients as well. Do share if you get your hands on some benchmark performance numbers. Thanks!

Re: WebHDFS Performance

Explorer

I will give you a qualitative answer: Ambari (UI) uses WebHDFS, which is designed for scale and performance (vs. HttpFS). In the future, we will also look into enabling WebHDFS to handle NameNode failover seamlessly, so that apps dependent on WebHDFS do not have to keep track of it.
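
Until then, "keeping track" on the application side could look roughly like the sketch below: try each known NameNode in turn and use the first one that accepts the request. This is my own illustration, not how Ambari or WebHDFS does it. The `first_active` name and the `attempt` callable are hypothetical, and treating any `OSError` (which includes urllib's HTTP and connection errors) as "try the next one" is an assumption.

```python
def first_active(namenodes, attempt):
    """Try attempt(namenode) on each candidate; return (namenode, result) for the first success."""
    errors = []
    for namenode in namenodes:
        try:
            return namenode, attempt(namenode)
        except OSError as err:
            # Assumption: an unreachable or standby NameNode shows up as an OSError subclass.
            errors.append((namenode, err))
    raise ConnectionError(f"no NameNode accepted the request: {errors}")
```

Here `attempt` would wrap a single WebHDFS call against the given host, and the caller can remember the returned NameNode for later requests.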

Re: WebHDFS Performance

New Contributor

Thanks for sharing your thoughts and the direction of evolution. Agreed, WebHDFS performs far better than HttpFS.