Member since 06-22-2015 | 3 Posts | 0 Kudos Received | 0 Solutions
06-29-2015 08:56 AM
Thanks for the explanation. And thanks for tolerating me extending the original question. This issue can be closed.
06-23-2015 08:34 AM
Thank you for the excellent clarification. In our most typical use case, we submit a YARN application from a machine outside the cluster, so the Client process runs on an external machine. Once the submitted YARN app completes, our Client attempts to fetch the aggregated logs. Before fetching them, I had added a check for whether log aggregation was enabled, simply to save time. I now believe I should remove that check, since "yarn.log-aggregation-enable" is not a client-side property. So the explanation answers my initial question.

A related question: is there a way for a Client process running on an external machine to check whether log aggregation has completed? Thanks
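One possible approach, sketched in Python under stated assumptions: poll the ResourceManager REST API (GET /ws/v1/cluster/apps/{appid}) from the external machine and inspect a logAggregationStatus field in the returned "app" object. This assumes the RM web port (8088 by default) is reachable from the client and that the Hadoop release in use reports that field; not every version does, so the sketch treats a missing field as "unknown" rather than as "disabled". The status names mirror YARN's LogAggregationStatus enum, and the helper names (fetch_app_json, aggregation_status, classify, wait_for_logs) are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Status names follow YARN's LogAggregationStatus enum (assumption: the
# ResourceManager version in use reports them in its REST response).
PENDING = {"NOT_START", "RUNNING", "RUNNING_WITH_FAILURE"}
UNAVAILABLE = {"DISABLED", "FAILED", "TIME_OUT"}


def fetch_app_json(rm_address: str, app_id: str, timeout: float = 10.0) -> str:
    """GET the application resource from the RM REST API, e.g. rm_address='rmhost:8088'."""
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def aggregation_status(app_json: str) -> str:
    """Extract logAggregationStatus; RMs that omit it are reported as UNKNOWN."""
    app = json.loads(app_json).get("app", {})
    return app.get("logAggregationStatus", "UNKNOWN")


def classify(status: str) -> str:
    """Map a raw status to 'ready', 'pending', 'unavailable' or 'unknown'."""
    if status == "SUCCEEDED":
        return "ready"
    if status in PENDING:
        return "pending"
    if status in UNAVAILABLE:
        return "unavailable"
    return "unknown"


def wait_for_logs(fetch: Callable[[], str], poll_seconds: float = 5.0,
                  max_polls: int = 60,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until aggregation leaves the pending states or max_polls is reached."""
    for _ in range(max_polls):
        state = classify(aggregation_status(fetch()))
        if state != "pending":
            return state
        sleep(poll_seconds)
    return "pending"
```

A call might look like `wait_for_logs(lambda: fetch_app_json("rmhost:8088", app_id))`, where "rmhost" is a placeholder. The fetcher is passed in as a function so the polling logic can be exercised without a cluster. Once the result is "ready", the logs can be retrieved as before, for example with `yarn logs -applicationId <application ID>`.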
06-22-2015 03:28 PM
Using Cloudera Manager, I can set the property "yarn.log-aggregation-enable" to "true" and then run "Deploy Client Configuration". However, if I then run "hadoop classpath" or "yarn classpath", the Hadoop configuration directory, which is typically the first classpath entry ("/etc/hadoop/conf"), does not contain an updated yarn-site.xml with "yarn.log-aggregation-enable" set to "true". Instead, it still has the original yarn-site.xml, which has no "yarn.log-aggregation-enable" property at all.

In contrast, if I run a YARN application that starts a Java task and print the system property "java.class.path" from that task, the first entry is a directory that does contain an updated yarn-site.xml with the property set to "true". For example, instead of "/etc/hadoop/conf", the first entry in one task is "/var/run/cloudera-scm-agent/process/840-yarn-NODEMANAGER". In that task there is also an environment variable, HADOOP_CONF_DIR, that points to this correct configuration directory. But this directory under /var/run/cloudera-scm-agent is not included in the output of "hadoop classpath" or "yarn classpath".

In our application, we need to find the correct Hadoop configuration directory without running a YARN task. Even if I write a small Java program that prints the environment variable HADOOP_CONF_DIR or the system property "java.class.path" and run it with "hadoop jar", I do not get the correct results.

How do I get the correct Hadoop configuration directory without running a YARN job? Thanks
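To see what a given configuration directory actually contains, independent of what the classpath reports, a small Python sketch can read yarn-site.xml directly. It assumes the layout described above: a yarn-site.xml inside the directory named by HADOOP_CONF_DIR, falling back to /etc/hadoop/conf when that variable is unset. It does not explain or fix the Cloudera Manager deployment behavior; it only confirms whether a directory's file carries the property. The helper names (resolve_conf_dir, find_property, read_site_property) are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from typing import Mapping, Optional

# Fallback used when HADOOP_CONF_DIR is unset; /etc/hadoop/conf is the
# directory reported above as the first classpath entry.
DEFAULT_CONF_DIR = "/etc/hadoop/conf"


def resolve_conf_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer HADOOP_CONF_DIR from the environment, else the default directory."""
    env = os.environ if env is None else env
    return env.get("HADOOP_CONF_DIR") or DEFAULT_CONF_DIR


def find_property(xml_data, name: str) -> Optional[str]:
    """Return the <value> of property `name` in *-site.xml content, or None."""
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def read_site_property(conf_dir: str, site_file: str, name: str) -> Optional[str]:
    """Read one property from conf_dir/site_file; None if the file or property is absent."""
    path = os.path.join(conf_dir, site_file)
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return find_property(f.read(), name)


if __name__ == "__main__":
    conf_dir = resolve_conf_dir()
    value = read_site_property(conf_dir, "yarn-site.xml", "yarn.log-aggregation-enable")
    # None means "not set in this directory", not necessarily "disabled".
    print(f"{conf_dir}: yarn.log-aggregation-enable = {value}")
```

A None result only means the property is absent from that file; Hadoop then falls back to yarn-default.xml, where "yarn.log-aggregation-enable" defaults to false.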
Labels:
- Apache Hadoop
- Apache YARN
- Cloudera Manager