There are two ways to access HDFS over HTTP.

Using WebHDFS

http://<active-namenode-server>:<namenode-port>/webhdfs/v1/<file-path>?op=OPEN

Using HttpFs

http://<hadoop-httpfs-server>:<httpfs-port>/webhdfs/v1/<file-path>?op=OPEN
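
Both URL forms share the /webhdfs/v1 prefix, so one helper can build either. The following is a minimal Python sketch, not a complete client. The host names and file path are hypothetical, and the ports are only the common defaults (50070 for the NameNode HTTP port on Hadoop 2.x, 9870 on Hadoop 3.x, 14000 for HttpFs); check your own cluster configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op="OPEN", **params):
    """Build a /webhdfs/v1 REST URL for either a NameNode or an HttpFs server."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # Extra keyword arguments become additional query parameters.
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact and escapes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical hosts, for illustration only.
print(webhdfs_url("nn1.example.com", 50070, "/user/alice/data.csv"))
print(webhdfs_url("httpfs.example.com", 14000, "/user/alice/data.csv"))
```

The only difference between the two calls is which server and port the request is sent to.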

WebHDFS:

Pros:

  • Built into the default Hadoop installation
  • Efficient, because data is streamed directly from the data nodes that hold it

Cons:

  • Not HA-aware when called by URL: if high availability is enabled on the cluster, requests must be sent to the currently active namenode, so the client has to know which one that is
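
One way a client might cope with this is to ask each namenode for its HA state before issuing the WebHDFS call. The sketch below is an assumption-laden illustration, not a recommended implementation: it assumes the NameNode's JMX servlet is reachable over plain HTTP and exposes a NameNodeStatus bean with a State field ("active" or "standby"). The host names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed JMX query for the NameNode status bean.
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def parse_state(jmx_body):
    """Extract the HA state from a JMX JSON response body, or None if absent."""
    beans = json.loads(jmx_body).get("beans", [])
    return beans[0].get("State") if beans else None


def fetch_state(host, port, timeout=5):
    """Query one namenode over HTTP and return its reported HA state."""
    with urlopen(f"http://{host}:{port}{JMX_QUERY}", timeout=timeout) as resp:
        return parse_state(resp.read().decode("utf-8"))


def find_active(namenodes, get_state=fetch_state):
    """Return the first (host, port) that reports 'active', or None."""
    for host, port in namenodes:
        try:
            if get_state(host, port) == "active":
                return host, port
        except OSError:
            # Unreachable namenode (URLError is an OSError): try the next one.
            continue
    return None
```

A caller would pass the list of configured namenodes, for example find_active([("nn1.example.com", 50070), ("nn2.example.com", 50070)]), and build the WebHDFS URL from the result. The get_state parameter exists only so the lookup can be replaced in tests.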

HttpFs:

Pros:

  • Works with HA-enabled clusters

Cons:

  • Must be installed and run as an additional service
  • Lower throughput, because all data is streamed through a single node
  • Creates a single point of failure

Additional performance implications of WebHDFS vs HttpFs:

https://www.linkedin.com/today/post/article/20140717115238-176301000-accessing-hdfs-using-the-webhdf...

WebHDFS vs HttpFs

The major difference between WebHDFS and HttpFs: WebHDFS needs access to all nodes of the cluster, and when data is read it is transmitted directly from the node that holds it. With HttpFs, a single node acts as a "gateway" and is the single point of data transfer to the client.

As a result, HttpFs can become a bottleneck during a large file transfer, but it minimizes the footprint required to access HDFS, since the client only needs to reach one host.
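
The difference shows up in the first HTTP response to an OPEN request. With WebHDFS the namenode answers with a 307 redirect whose Location header points at a data node, and the client then reads from that node. With HttpFs the gateway returns the data itself. The Python sketch below makes the first request without following redirects and reports which host would stream the bytes. The function names and hosts are hypothetical, and authentication and error handling are left out.

```python
import http.client
from urllib.parse import urlsplit


def open_first_hop(url, timeout=10):
    """Send the OPEN request once, without following any redirect."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", f"{parts.path}?{parts.query}")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def data_source(request_url, status, location=None):
    """Return the host that will actually stream the file contents."""
    if status == 307 and location:
        # WebHDFS: the namenode redirected the client to a data node.
        return urlsplit(location).hostname
    # HttpFs: the gateway that received the request serves the data.
    return urlsplit(request_url).hostname
```

Against a live cluster, status, location = open_first_hop(url) followed by data_source(url, status, location) would name a data node for a WebHDFS URL and the gateway itself for an HttpFs URL, which is why the client needs network access to every data node in the first case and to only one host in the second.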
