
How to Install Hortonworks entire ecosystem without HDFS?


New Contributor

I am planning to install a Hortonworks cluster with YARN, Tez, Hive, MapReduce, Pig, ZooKeeper, Spark, etc. without HDFS, using NFS storage instead. Any pointers would be great. Can this be done using Ambari?

1 ACCEPTED SOLUTION

Re: How to Install Hortonworks entire ecosystem without HDFS?

New Contributor

That should be doable. I have been running some tests with high-performance enterprise NFSv3 storage and Spark, and it worked like a charm. I still kept an HDFS filesystem for logs and historical data (as a kind of tier-2) and used the high-performance NFS storage for the tier-1 datasets that needed more performance and lower response times. Ironically, I found that this NFS storage solution performed similarly to, or slightly better than, HDFS on massive reads, but clearly outperformed HDFS on writes, especially when the jobs had a lot of shuffle and spill to disk.

The key thing when using external, high-performance NFS storage is to make sure that all the nodes in the cluster have a persistent mount to the NFS filesystem and that all of them use the same mountpoint. When you submit your Spark jobs, you then use "file:///" paths instead of HDFS paths, for example: "file:///mnt_bigdata/datasets/x".
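To make the "same mountpoint on every node" requirement concrete, here is a minimal sketch. The NFS server name, export path, and mountpoint are hypothetical, and the mount options are generic NFSv3 choices, not a tested recommendation:

```
# /etc/fstab entry, identical on every cluster node
# (hypothetical server "nfs-server.example.com" exporting "/bigdata")
nfs-server.example.com:/bigdata  /mnt_bigdata  nfs  vers=3,proto=tcp,hard,noatime  0 0

# Mount it on each node (or reboot)
mount /mnt_bigdata

# Submit the Spark job with a file:// URI instead of an hdfs:// one
spark-submit --master yarn --deploy-mode cluster \
  my_job.py "file:///mnt_bigdata/datasets/x"
```

Because every executor resolves "file:///mnt_bigdata/..." through its local mount table, the path only works if the mount is present and identical everywhere, which is exactly the point above.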

The big open questions here are:

(1) Does Hortonworks support this?

(2) Is there any kind of generic NFS integration/deployment/best-practices guide?

(3) Is there a procedure to move all of the cluster services' and resources' file dependencies out of HDFS to NFS?

7 REPLIES

Re: How to Install Hortonworks entire ecosystem without HDFS?

Cloudera Employee
@Krishna S

Yes, this can be done. You can install all the required components using Ambari with HDFS. The default storage (default file system) can later be changed to NFS. This is a doable configuration, but it can't be posted here. You can email me with your specific requirements.
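For context, "changing the default file system" generally comes down to repointing fs.defaultFS in core-site.xml. This is only a hedged sketch of the idea, not the specific procedure referred to above, and the mountpoint is hypothetical; individual services typically need further reconfiguration as well:

```xml
<!-- core-site.xml: point the default filesystem at the shared NFS mount.
     file:/// resolves through the local filesystem, so every node must
     see the same NFS mount at the same path (hypothetical: /mnt_bigdata). -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>
```

With this in place, unqualified paths resolve against the local (NFS-backed) filesystem instead of HDFS, which is why the mount must be identical on all nodes.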

Re: How to Install Hortonworks entire ecosystem without HDFS?

New Contributor

@nkumar I'm interested in changing the default filesystem for the entire HDP to NFS, can you please share this?

Re: How to Install Hortonworks entire ecosystem without HDFS?

New Contributor

@nkumar I'm also interested in HDP on NFS instead of HDFS. Could you please share what needs to be done?

Re: How to Install Hortonworks entire ecosystem without HDFS?

New Contributor

@nkumar That would be great. Thank you very much! How can I reach you?

Re: How to Install Hortonworks entire ecosystem without HDFS?

Cloudera Employee

Hi @Krishna S

Let me know your email ID and I can email you there directly. Or, if you are from Hortonworks, you can certainly find me via HipChat.

Re: How to Install Hortonworks entire ecosystem without HDFS?

Super Guru

@Krishna S

To use these components without HDFS, you need a file system that implements the Hadoop FileSystem API. Some such systems are Amazon S3, WASB, EMC Isilon, and a few others (these systems might not implement 100 percent of the Hadoop API, so please verify). You can also install Hadoop in standalone mode, which does not use HDFS.
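The same Hadoop tooling works against any FileSystem implementation, selected by the URI scheme. As a rough illustration (the host, bucket, and account names are hypothetical, and each scheme requires its connector to be installed and configured):

```
hadoop fs -ls hdfs://namenode:8020/data    # HDFS
hadoop fs -ls s3a://my-bucket/data         # Amazon S3 via the s3a connector
hadoop fs -ls wasb://container@myaccount.blob.core.windows.net/data  # Azure WASB
hadoop fs -ls file:///mnt_bigdata/data     # local filesystem (e.g. an NFS mount)
```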

I am not sure NFS on its own supports the Hadoop API, but using the Hadoop NFS gateway, you can mount HDFS as a client's local file system. Here is a link on using this feature.

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
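As a rough sketch of what the gateway approach looks like (the gateway host name and mountpoint are hypothetical, and the gateway must first be configured in hdfs-site.xml / core-site.xml; see the linked guide for the authoritative steps):

```
# On the gateway host: start the portmap and nfs3 services in the foreground
# (the linked guide also describes running them as daemons)
hdfs portmap
hdfs nfs3

# On a client: mount the HDFS namespace over NFSv3
mount -t nfs -o vers=3,proto=tcp,nolock,sync nfsgw.example.com:/ /hdfs_mount

# HDFS files now appear under the local mountpoint
ls /hdfs_mount
```

Note this is the inverse of the original question: it exposes HDFS over NFS rather than replacing HDFS with NFS storage.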
