Member since: 08-18-2017 | Posts: 6 | Kudos received: 1 | Solutions: 1
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3296 | 08-18-2017 06:24 PM |
11-29-2017
06:19 PM
@nkumar I'm interested in changing the default filesystem for the entire HDP cluster to NFS. Could you please share this?
09-10-2017
09:23 PM
@Geoffrey Shelton Okot you're absolutely right! Thanks a lot. The first time I used "/fast_nfs/yarn/local" I was losing containers (I was getting "file not found" errors) and couldn't work out why. Then tiredness made me think of using hdfs:/// instead, and I ended up posting this dumb question.

My initial problem was that I was using the same NFS filesystem and the same path for all the nodes. After mounting /fast_nfs/nodeXXX/ on each node (where nodeXXX contains the yarn/local subdirectory) and adjusting permissions, it worked perfectly. Node1 gets /fast_nfs/node001 mounted as /scratch, node2 gets /fast_nfs/node002 mounted as /scratch, and so on. So I'm using /fast_nfs as the NFS filesystem, mounting a separate subdirectory on each node, and I set yarn.nodemanager.local-dirs to "/scratch/yarn/local".
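The per-node layout described above can be sketched as follows. This is a minimal illustration, not the poster's actual tooling: it assumes the naming scheme from the post (node N gets /fast_nfs/nodeNNN, zero-padded to three digits, mounted locally at /scratch), and the helper names `node_nfs_subdir`, `mount_plan` and `local_dirs_property` are hypothetical. The mounts themselves still have to be configured by the administrator; the script only shows which directory each node should mount and the single yarn-site.xml value that every node can share.

```python
import xml.etree.ElementTree as ET

# Layout taken from the post: node N mounts /fast_nfs/nodeNNN (zero-padded to
# three digits) at /scratch. All helper names here are hypothetical.
NFS_ROOT = "/fast_nfs"
LOCAL_MOUNT = "/scratch"
LOCAL_DIRS = LOCAL_MOUNT + "/yarn/local"


def node_nfs_subdir(node_index: int) -> str:
    """Per-node NFS subdirectory, e.g. 1 -> /fast_nfs/node001."""
    return f"{NFS_ROOT}/node{node_index:03d}"


def mount_plan(node_indices) -> dict:
    """Map each node index to the NFS subdirectory it should mount at /scratch,
    so that no two NodeManagers write into the same directory."""
    return {i: node_nfs_subdir(i) for i in node_indices}


def local_dirs_property(value: str = LOCAL_DIRS) -> str:
    """Build the yarn-site.xml <property> entry. The value is identical on every
    node because /scratch already points at a different subdirectory on each."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.local-dirs"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    for node, subdir in mount_plan(range(1, 4)).items():
        print(f"node{node}: mount {subdir} at {LOCAL_MOUNT}")
    print(local_dirs_property())
```

The design point is that the per-node difference lives in the mount, not in the YARN configuration. This lets one yarn-site.xml be pushed to the whole cluster without the nodes colliding on shared files.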
09-10-2017
01:24 AM
Hi, I'm trying to change "yarn.nodemanager.local-dirs" to point to "file:///fast_nfs/yarn/local", which is a high-performance NFS mount point that every node in my cluster has. Ambari won't let me change it and displays the message "Must be a slash or drive at the start, and must not contain white spaces". If I manually edit /etc/hadoop/conf/yarn-site.xml on all the nodes, the "file:///" prefix is removed from that option after YARN restarts. I want all the shuffle to happen on my high-performance NFS array instead of in HDFS. How can I change this behaviour in HDP?
Labels:
- Apache Spark
- Apache YARN
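The Ambari message quoted in the question can be approximated with a small check. This is a sketch of the rule as worded in the message, not Ambari's actual validation code: it assumes each comma-separated entry of yarn.nodemanager.local-dirs must start with "/" or a drive letter and must contain no whitespace. The function name `invalid_local_dirs` is hypothetical. Under that reading, a value with a "file:///" prefix is rejected simply because it starts with "f", while a plain absolute path on the NFS mount passes.

```python
import re

# Approximation of the rule quoted by Ambari: "Must be a slash or drive at the
# start, and must not contain white spaces". Not Ambari's real implementation.
_VALID_START = re.compile(r"^(/|[A-Za-z]:)")
_WHITESPACE = re.compile(r"\s")


def invalid_local_dirs(value: str) -> list:
    """Return the comma-separated entries of a local-dirs value that break the rule."""
    bad = []
    for entry in value.split(","):
        if not _VALID_START.match(entry) or _WHITESPACE.search(entry):
            bad.append(entry)
    return bad


if __name__ == "__main__":
    print(invalid_local_dirs("file:///fast_nfs/yarn/local"))  # rejected: URI scheme prefix
    print(invalid_local_dirs("/fast_nfs/yarn/local"))         # accepted: empty list
```

Note that a space after a comma (for example "/a, /b") also fails, because the second entry then begins with whitespace.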
08-18-2017
06:24 PM
1 Kudo
That should be doable. I have been running some tests with Spark on high-performance enterprise NFSv3 storage, and it worked like a charm. I still kept an HDFS filesystem for logs and historical data (as a kind of tier 2) and used the high-performance NFS storage for the tier-1 datasets that needed more performance and lower response times. Ironically, I found that this NFS storage performed similarly to or slightly better than HDFS for massive reads, but clearly outperformed HDFS on writes, especially when the jobs had a lot of shuffle and spill to disk.

The key to using external high-performance NFS storage is to make sure every node in the cluster has a persistent mount of the NFS filesystem and that all of them use the same mount point. When you submit your Spark jobs, you then simply use "file:///" paths instead of "hdfs:///" ones, for example "file:///mnt_bigdata/datasets/x".

The big questions here are:
(1) Does Hortonworks support this?
(2) Is there any kind of generic NFS integration/deployment/best-practice guide?
(3) Is there a procedure to completely move all cluster services, along with their resource and file dependencies, off HDFS and onto NFS?
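The two requirements above (the same persistent NFS mount point on every node, and "file:///" paths in the job) can be checked with a short script. This is a minimal sketch under stated assumptions: it assumes you have already collected the text of /proc/mounts from each node by some means not shown here, and the helper names `nfs_mountpoints`, `nodes_missing_mount` and `spark_input_uri`, as well as the server name "filer" in the demo, are hypothetical. The URI it builds is the kind of path you would hand to a Spark reader.

```python
from urllib.parse import quote

# fstype values Linux reports in /proc/mounts for NFSv3 ("nfs") and NFSv4 ("nfs4").
NFS_TYPES = ("nfs", "nfs4")


def nfs_mountpoints(proc_mounts_text: str) -> set:
    """Mountpoints of NFS filesystems in text formatted like /proc/mounts
    (device, mountpoint, fstype, options, dump, pass)."""
    points = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.add(fields[1])
    return points


def nodes_missing_mount(per_node_mounts: dict, mountpoint: str) -> list:
    """Given {node name: that node's /proc/mounts text}, return the nodes that
    do not have the shared NFS filesystem mounted at `mountpoint`."""
    return sorted(node for node, text in per_node_mounts.items()
                  if mountpoint not in nfs_mountpoints(text))


def spark_input_uri(path: str) -> str:
    """Turn an absolute path on the shared mount into a file:/// URI."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path on the shared mount")
    return "file://" + quote(path)


if __name__ == "__main__":
    # Hypothetical /proc/mounts excerpts; "filer" is a made-up NFS server name.
    shared = "filer:/export/bigdata /mnt_bigdata nfs rw,vers=3 0 0\n/dev/sda1 / ext4 rw 0 0\n"
    local_only = "/dev/sda1 / ext4 rw 0 0\n"
    print(nodes_missing_mount({"node1": shared, "node2": local_only}, "/mnt_bigdata"))
    print(spark_input_uri("/mnt_bigdata/datasets/x"))
```

Any node reported by `nodes_missing_mount` would see a different (local) directory behind the same "file:///" path, which is exactly the inconsistency the shared-mount rule is meant to prevent.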