02-03-2016 03:22 PM - edited 02-03-2016 03:24 PM
I'm getting this error when Spark, running on YARN in one cluster, tries to access HBase data in a different, external cluster. When I run Spark in local mode, it works fine. We are on CDH 5.4.8. I read that YARN can only access HBase on the same cluster because YARN needs to access the underlying HFiles stored in HDFS. Is this true?
03-10-2016 03:52 PM
Found the root cause of why YARN was hanging and why HDFS was being bombarded with data, which drove file descriptor usage up. It was Log Aggregation.
Log Aggregation is enabled by default. This means that all the Node Managers write their task logs to a central location in HDFS, which is a good thing for debugging YARN applications. But all this time, the aggregation directory was misconfigured and pointed to a directory that does not exist. As a result, all the nodes just kept spitting out error messages while still continuing to process tasks. Once we noticed these errors, we set the HDFS log directory back to the default, and the errors went away. In return, we got something even worse: a flood of log entries began overwhelming HDFS, causing a major slowdown, and sometimes a complete halt, of data storage attempts. File descriptors maxed out. And YARN applications, since they could not write logs, were stuck in a Pending state. The only course of action was to turn off Log Aggregation so that each Node Manager stores its own logs locally.
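For anyone checking their own setup, here is a minimal sketch of how the relevant settings could be read out of a yarn-site.xml. The three property names are the standard YARN log aggregation properties. The sample XML, its values, and the helper names `read_log_aggregation_settings` and `find_problems` are hypothetical and only for illustration, not what our cluster actually used.

```python
import xml.etree.ElementTree as ET

# Standard YARN property names that control log aggregation.
LOG_AGGREGATION_KEYS = (
    "yarn.log-aggregation-enable",
    "yarn.nodemanager.remote-app-log-dir",
    "yarn.log-aggregation.retain-seconds",
)

# Hypothetical sample configuration; the values are assumptions.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>"""


def read_log_aggregation_settings(xml_text):
    """Return the log aggregation properties that are explicitly set."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in LOG_AGGREGATION_KEYS:
            settings[name] = prop.findtext("value", default="").strip()
    return settings


def find_problems(settings):
    """Flag settings worth a second look before enabling aggregation."""
    problems = []
    if settings.get("yarn.log-aggregation-enable", "").lower() != "true":
        return problems  # aggregation is off, nothing else to check
    if "yarn.nodemanager.remote-app-log-dir" not in settings:
        problems.append("remote-app-log-dir is not set explicitly")
    retain = settings.get("yarn.log-aggregation.retain-seconds")
    if retain is None or int(retain) < 0:
        problems.append("no positive retention period, logs may pile up in HDFS")
    return problems


if __name__ == "__main__":
    current = read_log_aggregation_settings(SAMPLE_YARN_SITE)
    for key, value in current.items():
        print(key, "=", value)
    for problem in find_problems(current):
        print("WARNING:", problem)
```

This only inspects the configuration file. It does not verify that the remote log directory actually exists in HDFS, which was the misconfiguration in our case, so that still has to be checked separately.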
I would like to use this feature, but I don't know how to enable it without causing this problem again.