Support Questions

Find answers, ask questions, and share your expertise

Why is the Node Manager crashing when its data directory is mounted as a FUSE filesystem on Ubuntu?

New Contributor

Hello Everyone,

I have my application configured to use the YARN Resource Manager, the Hadoop Name Node, and Samza containers. All the data are written under /var/lib (/var is a mounted drive). I want to validate my application against disk-latency faults. To do that, I want to mount /var/lib as a FUSE filesystem, which will let me inject delays.

Here is what I did to mount the FUSE filesystem (the mount itself caused no issues):

  • Stopped all my application services.
  • Stopped the YARN Resource Manager, Node Manager, Kafka, and Samza job processes (using commands such as sudo service hadoop-yarn-resource-manager stop; we have the respective conf files in /etc/init).
  • Copied the contents of /var/lib to /var/myapp/lib (using cp -rp to preserve file permissions and ownership).
  • Mounted the FUSE filesystem with /var/lib as the mount point and /var/myapp/lib/ as the data directory.
  • Started all the services.
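The steps above can be sketched as a shell function. This is only a sketch: bindfs is assumed here as a stand-in for whatever latency-injecting FUSE filesystem you actually use, and the hadoop-yarn-nodemanager service name is an assumption (the post only names the Resource Manager service).

```shell
# Sketch of the remount procedure described above.
# Assumptions: bindfs stands in for the latency-injecting FUSE
# filesystem; hadoop-yarn-nodemanager is a guessed service name.
remount_var_lib() {
  ${RUN:-} sudo service hadoop-yarn-resource-manager stop
  ${RUN:-} sudo service hadoop-yarn-nodemanager stop
  ${RUN:-} sudo cp -rp /var/lib/. /var/myapp/lib/   # preserve ownership/permissions
  ${RUN:-} sudo bindfs /var/myapp/lib /var/lib      # FUSE mount over /var/lib
  ${RUN:-} sudo service hadoop-yarn-resource-manager start
  ${RUN:-} sudo service hadoop-yarn-nodemanager start
}

# Preview the commands without executing anything:
RUN=echo remount_var_lib
```

Setting RUN=echo prints each command instead of running it, which is a convenient way to review the sequence before touching a live node.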

After the above steps, all the applications started successfully except the Node Manager and the Samza jobs. My observation is that the Node Manager is crashing (it starts and stops immediately). The Samza jobs are expected to fail because of the Node Manager failure.


What's the issue here?

Following is the message from the logs:


# Problematic frame:

# C[]


+ date +%h %d %Y %H:%M:%S-%3N

+ echo [Jan 27 2019 06:53:18-872] Exiting yarn node manager...

INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]

INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService: Using state database at /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery

INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService$LeveldbLogger: Recovering log #36038



@Dinesh B

Did you update the entries in /etc/fstab?

New Contributor

Thanks @Geoffrey Shelton Okot for the response,

I did NOT update /etc/fstab, as I didn't want to retain the mount point across reboots. Is there any other reason why I should update /etc/fstab?




@Dinesh B

Then you need to mount it manually for the current session ONLY. For reference, a FUSE mount (sshfs, for example) looks like:

sshfs username@hostname:/remote/directory/path /local/mount/point

If you don't mount it, how will the OS know about it?
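For completeness, if the mount ever did need to persist across reboots, a FUSE entry in /etc/fstab would look roughly like this (bindfs is assumed here as the FUSE filesystem; omit the entry entirely for a session-only mount):

```text
# Hypothetical /etc/fstab entry (bindfs assumed); not needed for a
# session-only mount.
# <source>      <mount point>  <type>       <options>             <dump> <pass>
/var/myapp/lib  /var/lib       fuse.bindfs  defaults,allow_other  0      0
```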

New Contributor

Hi @Geoffrey Shelton Okot,

As I pointed out in my first post, I have already mounted /var/lib as a FUSE mount. I can see /var/lib listed as a FUSE mount in the output of df -lh, and ls /var/lib succeeds after mounting.
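Beyond df -lh, the kernel's own mount table can confirm that a path is really a FUSE mount. A small sketch (the helper name is mine):

```shell
# Hypothetical helper: report whether a path appears as a FUSE
# filesystem in a mount table in /proc/mounts format.
# Usage: is_fuse_mount <mount point> [mount table file]
is_fuse_mount() {
  awk -v mp="$1" '$2 == mp && $3 ~ /^fuse/ { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}

# Example: is_fuse_mount /var/lib && echo "FUSE mount is active"
```

The third field of /proc/mounts is the filesystem type, which for FUSE mounts is `fuse` or `fuse.<subtype>` (e.g. `fuse.sshfs`).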

I don't think the mount itself is the issue here, as I can see the YARN Resource Manager service running successfully after the mount.

Please note: /var/lib is the parent directory where the YARN RM, NM, Samza, and my applications write. Since I have mounted /var/lib as the FUSE mount, all the other applications run with no issues.

The procedure I followed before mounting /var/lib as the FUSE mount point: I created a directory /var/myapp/lib/, stopped all the processes (including the YARN RM, Samza jobs, and YARN NM), copied all the files from /var/lib to /var/myapp/lib/ (cp -rp to retain file ownership and permissions as-is), and then mounted /var/lib pointing to the data directory /var/myapp/lib/.

After the above operation, I could see all the processes starting with no issues except the YARN NodeManager and the Samza job processes.

The errors observed in the logs are quoted in my 1st post in this question.

I want to understand whether the procedure I'm following is correct, as the YARN NodeManager process is crashing with the message "recovering". I'm wondering if I'm missing any steps.
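One more thing worth checking: the "# Problematic frame:" banner in the quoted log is the JVM's fatal-error report, which is written to an hs_err_pid&lt;pid&gt;.log file; the full file may name the native library that crashed during the LevelDB recovery. A sketch to locate the newest such file (the helper name and search root are my assumptions):

```shell
# Hypothetical helper: find the most recently modified JVM
# fatal-error log (hs_err_pid*.log) under a directory tree.
newest_hs_err() {
  find "$1" -type f -name 'hs_err_pid*.log' -printf '%T@ %p\n' 2>/dev/null \
    | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Example: newest_hs_err /var/log/hadoop-yarn
```

By default the JVM writes hs_err files to the process working directory, so the NodeManager's launch directory or log directory are the places to look.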
