Created 07-20-2016 05:18 PM
Could anyone share the performance differences between WebHDFS and the native Java client? We are creating a web service endpoint to ingest attachments into HDFS. The files are typically under 10 MB.
I found these two posts:
http://randomlydistributed.blogspot.com/2012/01/webhdfs-performance.html
http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html
My own test results nearly match those in the second link, but I wanted to see whether any benchmark studies exist.
Created 07-20-2016 07:01 PM
As you are probably aware, WebHDFS is based on HTTP operations such as GET, PUT, POST and DELETE, so it carries the performance overhead of the HTTP server, Jetty. The FileSystem shell is a Java application that uses the Java FileSystem class to perform file system operations, and it opens an RPC connection for those operations.
Here are some numbers, although this is not a serious benchmarking study. The results for files under 10 MB do not surprise me; they are what I would expect to see. You can run the test yourself. If your files are under 10 MB, WebHDFS should be fine. In my experience, performance became a concern for large files, noticeably from about 1 GB upward.
http://wittykeegan.blogspot.com/2013/10/webhdfs-vs-native-performance.html
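If you want to repeat that kind of comparison on your own cluster, a small timing harness is enough. The sketch below is only an illustration: time_uploads and summarize are hypothetical helper names, and upload_fn stands for whatever client call you want to measure (a WebHDFS upload, a wrapper around the native client, and so on).

```python
import statistics
import time


def time_uploads(upload_fn, payload, repeats=5):
    """Call upload_fn(payload) `repeats` times and return the seconds per call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        upload_fn(payload)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min, median and max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

Run it once per client with the same payload sizes (for example 1 MB, 10 MB and 1 GB) and compare the medians rather than single runs, since the first call often includes connection setup.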
I'm checking the Hortonworks docs for anything newer and will post the link if I find one.
If this is a reasonable response, please vote for it or accept it as the best answer.
Created 07-21-2016 03:38 AM
I agree with your answer, and you are spot on about the HTTP server performance implications. WebHDFS is really tempting, given that we can expose HDFS in a browser with minimal coding and integrate with non-Java clients as well. Do share if you get your hands on some benchmark performance numbers. Thanks!
Created 07-20-2016 09:25 PM
I will give you a qualitative answer: Ambari (UI) uses WebHDFS, and it is designed for scale and performance (vs. HttpFS). In the future, we will also look into enabling WebHDFS to handle NameNode failover scenarios seamlessly, so that apps that depend on WebHDFS do not have to keep track of the active NameNode themselves.
Created 07-21-2016 03:41 AM
Thanks for sharing your thoughts and the direction of evolution. Agreed, WebHDFS performs much better than HttpFS.