What type of disk (RAID 1, RAID 0, etc.) should be used for the following YARN directories:

Explorer
 
1 ACCEPTED SOLUTION

Expert Contributor

In short: mirror them with RAID 1 or RAID 1/0. Some more details below.

Hey Martin, the YARN Timeline Service (YTS) falls into the category of management services, which reside on a management node. It would probably live on a partition of the OS drives, sharing the same spindles, and should be set up as a mirrored pair (RAID 1 or RAID 1/0). The YARN Timeline Server is an important component because it keeps current and historical information about applications run on the cluster, so the underlying LevelDB storage location is just as important. NodeManager logs should be on a partition of the OS drives, which would also be mirrored. Mirroring the OS drives on both data nodes and management nodes is pretty common. It is not mandatory on data nodes, but the cost delta for mirroring just the OS drives there is not huge, so you might as well mirror them.

5 REPLIES 5

Explorer

forgot to add the directories:

  1. yarn.nodemanager.log-dirs
  2. yarn.timeline-service.leveldb-timeline-store.path
  3. yarn.timeline-service.leveldb-state-store.path
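
For reference, these three properties are set in yarn-site.xml. Below is a minimal sketch of how they might look if the directories are placed on a mirrored (RAID 1) OS-drive partition, as the accepted answer suggests. The paths are hypothetical examples, not defaults or recommendations from this thread, and the state-store property is written as it appears in yarn-default.xml (`yarn.timeline-service.leveldb-state-store.path`).

```xml
<!-- yarn-site.xml fragment. All paths are hypothetical and assume that
     /var/log and /var/lib sit on a mirrored (RAID 1) OS partition. -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <!-- Accepts a comma-separated list of local directories. -->
  <value>/var/log/hadoop-yarn/nodemanager</value>
</property>
<property>
  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
  <!-- LevelDB store holding current and historical application data. -->
  <value>/var/lib/hadoop-yarn/timeline</value>
</property>
<property>
  <name>yarn.timeline-service.leveldb-state-store.path</name>
  <!-- LevelDB store holding Timeline Server state. -->
  <value>/var/lib/hadoop-yarn/timeline</value>
</property>
```

If an SSD or a separate mirrored volume is available on the management node, the two timeline-service paths could point at a mount on that volume instead.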


@mcarillo yarn.nodemanager.log-dirs is on the same mounts as your Hadoop data directories. See:

https://community.hortonworks.com/articles/1888/apache-tez-tuning-tips-solving-the-could-not-find.ht...


If you have an SSD in the node (or, as is often the case, a RAID 1 mirror) and it's large enough, the YTS database would be a good candidate to put on it.

Explorer

Great stuff Dan! Thanks for your help here.