Created 09-01-2016 07:46 AM
I was able to successfully set up Hive 1.2.1 with Hadoop 2.7.1, using YARN as the MapReduce framework. For debugging certain queries (I'm trying to attach a debugger and step through the operators), I'm now trying to use local mode.
After switching to local mode, I can see queries running, but I can't locate the logs for those jobs. I checked the log folder but can't find any job-specific folder (I was looking for a syslog inside a folder specific to a particular job) or a log file starting with the job ID. How can I get those logs?
These are all the properties I have in mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
Created 09-01-2016 09:02 PM
The Apache Hive Wiki https://cwiki.apache.org/confluence/display/Hive/GettingStarted has details on logging, including local mode and version differences.
I have pasted the key info below but see link above for more.
Hive uses log4j for logging. By default, logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0; starting with Hive 0.13.0, the default logging level is INFO.
The logs are stored in the directory /tmp/<user.name>:
/tmp/<user.name>/hive.log
Note: In local mode, prior to Hive 0.13.0 the log file name was ".log" instead of "hive.log". This bug was fixed in release 0.13.0 (see HIVE-5528 and HIVE-5676).
To configure a different log location, set hive.log.dir in $HIVE_HOME/conf/hive-log4j.properties. Make sure the directory has the sticky bit set (chmod 1777 <dir>):
hive.log.dir=<other_location>
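As a minimal sketch of that sticky-bit step (the directory path below is a placeholder for illustration, not a Hive default):

```shell
# Placeholder path: substitute whatever directory you point hive.log.dir at.
LOG_DIR=/tmp/hive-logs-example
mkdir -p "$LOG_DIR"
# Mode 1777 = world-writable with the sticky bit set, as recommended above.
chmod 1777 "$LOG_DIR"
# Verify the permissions (GNU coreutils / Linux).
stat -c '%a' "$LOG_DIR"
```

The sticky bit matters because multiple users may write logs into the same directory; it prevents one user from deleting another user's log files.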
If the user wishes, the logs can be emitted to the console by adding the arguments shown below:
bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,console
Alternatively, the user can change the logging level only by using:
bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA
Another option for logging is TimeBasedRollingPolicy (applicable for Hive 0.15.0 and above, HIVE-9001) by providing the DAILY option as shown below:
bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated)
bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY
Note that setting hive.root.logger via the 'set' command does not change logging properties, since they are determined at initialization time.
Hive also stores query logs on a per-Hive-session basis in /tmp/<user.name>/, but this can be configured in hive-site.xml with the hive.querylog.location property.
Logging during Hive execution on a Hadoop cluster is controlled by Hadoop configuration. Usually Hadoop will produce one log file per map and reduce task stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI.
When using local mode (using mapreduce.framework.name=local), Hadoop/Hive execution logs are produced on the client machine itself. Starting with release 0.6, Hive uses hive-exec-log4j.properties (falling back to hive-log4j.properties only if it's missing) to determine where these logs are delivered by default. The default configuration file produces one log file per query executed in local mode and stores it under /tmp/<user.name>. The intent of providing a separate configuration file is to enable administrators to centralize execution log capture if desired (on an NFS file server, for example). Execution logs are invaluable for debugging run-time errors.
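Since local mode writes one log file per query, a quick way to pull up the most recent execution log is something like the following sketch (it assumes the default /tmp/<user.name> location and a .log suffix; adjust LOG_DIR if you've pointed hive-exec-log4j.properties elsewhere):

```shell
# Find the most recently written log file in the local-mode log directory.
# Default is /tmp/<user.name>; override LOG_DIR if you've changed it.
LOG_DIR="${LOG_DIR:-/tmp/$USER}"
ls -1t "$LOG_DIR"/*.log 2>/dev/null | head -n 1
```

You can then tail that file while re-running the query to watch the operator-level output as it happens.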
Created 09-04-2016 11:43 AM
Thanks! That was what I was looking for.