
How to locate logs for local jobs

Explorer

I was able to successfully set up Hive 1.2.1 with Hadoop 2.7.1, using YARN as the MapReduce framework. To debug certain queries (I'm trying to attach a debugger and step through the operators), I'm now trying to use local mode, i.e. mapreduce.framework.name=local.

After switching to local mode, I can see queries running, but I can't locate the logs for those jobs. I checked the log folder but can't find any job-specific folder (I was looking for a syslog inside a folder specific to a particular job) or a log file starting with the job ID. How can I get those logs?

These are all the properties I have in mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>

1 ACCEPTED SOLUTION

Guru

The Apache Hive Wiki (https://cwiki.apache.org/confluence/display/Hive/GettingStarted) has details on logging, including local mode and version differences.

I've pasted the key info below, but see the link above for more.

Hive Logging

Hive uses log4j for logging. By default logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0. Starting with Hive 0.13.0, the default logging level is INFO.

The logs are stored in the directory /tmp/<user.name>:

  • /tmp/<user.name>/hive.log
    Note: In local mode, prior to Hive 0.13.0 the log file name was ".log" instead of "hive.log". This bug was fixed in release 0.13.0 (see HIVE-5528 and HIVE-5676).

To configure a different log location, set hive.log.dir in $HIVE_HOME/conf/hive-log4j.properties. Make sure the directory has the sticky bit set (chmod 1777 <dir>).

  • hive.log.dir=<other_location>
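As a quick sketch of the directory-permissions step above (the path /tmp/hive-logs-demo is purely illustrative; substitute whatever you set hive.log.dir to):

```shell
# Create a candidate log directory and set the sticky bit (mode 1777),
# so any user can write log files but only the file's owner can delete them.
LOG_DIR=/tmp/hive-logs-demo   # illustrative path; use your hive.log.dir value
mkdir -p "$LOG_DIR"
chmod 1777 "$LOG_DIR"
# The trailing 't' in the mode string confirms the sticky bit is set.
ls -ld "$LOG_DIR"
```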

If the user wishes, the logs can be emitted to the console by adding the arguments shown below:

  • bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated)
  • bin/hiveserver2 --hiveconf hive.root.logger=INFO,console

Alternatively, the user can change only the logging level by using:

  • bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated)
  • bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA

Another option for logging is TimeBasedRollingPolicy (applicable to Hive 0.15.0 and above, HIVE-9001), enabled by providing the DAILY option as shown below:

  • bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated)
  • bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY

Note that setting hive.root.logger via the 'set' command does not change logging properties since they are determined at initialization time.

Hive also stores query logs on a per Hive session basis in /tmp/<user.name>/, but can be configured in hive-site.xml with the hive.querylog.location property.
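For example, a hive-site.xml entry overriding the query log location might look like this (the path shown is illustrative):

```xml
<property>
  <name>hive.querylog.location</name>
  <!-- illustrative path; any directory writable by the Hive user works -->
  <value>/var/log/hive/querylogs</value>
</property>
```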

Logging during Hive execution on a Hadoop cluster is controlled by Hadoop configuration. Usually Hadoop will produce one log file per map and reduce task stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI.

When using local mode (mapreduce.framework.name=local), Hadoop/Hive execution logs are produced on the client machine itself. Starting with release 0.6, Hive uses hive-exec-log4j.properties (falling back to hive-log4j.properties only if it is missing) to determine where these logs are delivered by default. The default configuration produces one log file per query executed in local mode and stores it under /tmp/<user.name>. The intent of providing a separate configuration file is to let administrators centralize execution log capture if desired (on an NFS file server, for example). Execution logs are invaluable for debugging run-time errors.
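Putting this together for the original question: with the default configuration, the local-mode execution logs should be on the client machine under /tmp/<user.name>, one file per query, so a listing along these lines (a sketch assuming the defaults) should turn them up:

```shell
# Default Hive log location in local mode is /tmp/<user.name>.
LOG_DIR="/tmp/$(whoami)"
mkdir -p "$LOG_DIR"           # ensure it exists so the listing below succeeds
# Newest files first; in local mode each query produces its own log file here.
ls -lt "$LOG_DIR" | head
# Tail the main Hive log if present (named hive.log on 0.13.0 and later).
tail -n 20 "$LOG_DIR/hive.log" 2>/dev/null || true
```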

View solution in original post

2 REPLIES


Explorer

Thanks! That was exactly what I was looking for.