Support Questions

margusja · ‎11-03-2015

Hi

I set up CDH 5.4.8 to AWS.

hive> set hive.auto.convert.join;
hive.auto.convert.join=true

Executing query:

select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;

Total jobs = 1
Execution log at: /tmp/demo/demo_20151103083131_9c7f8815-a9fd-42d2-b7a3-9028fcdd81fb.log
2015-11-03 08:31:49 Starting to launch local task to process map join; maximum memory = 1029701632
2015-11-03 08:31:51 Processing rows: 200000 Hashtable size: 199999 Memory usage: 118909728 percentage: 0.115
2015-11-03 08:31:51 Processing rows: 300000 Hashtable size: 299999 Memory usage: 160068072 percentage: 0.155
2015-11-03 08:31:51 Dump the side-table for tag: 1 with group count: 300000 into file: file:/tmp/demo/facf066b-778f-447e-86b5-a9152a74c41b/hive_2015-11-03_08-31-44_617_3475565247267897601-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile21--.hashtable
2015-11-03 08:31:51 Uploaded 1 File to: file:/tmp/demo/facf066b-778f-447e-86b5-a9152a74c41b/hive_2015-11-03_08-31-44_617_3475565247267897601-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile21--.hashtable (20655402 bytes)
2015-11-03 08:31:51 End of local task; Time Taken: 2.361 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1446191535269_0136, Tracking URL = http://[server]:8088/proxy/application_1446191535269_0136/
Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0136
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2015-11-03 08:32:00,237 Stage-2 map = 0%, reduce = 0%
2015-11-03 08:32:11,603 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 10.73 sec
2015-11-03 08:32:18,828 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 12.44 sec
MapReduce Total cumulative CPU time: 12 seconds 440 msec
Ended Job = job_1446191535269_0136
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 12.44 sec HDFS Read: 64673839 HDFS Write: 84 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 440 msec
OK
9.22561984510033
6.97536844275076
3.4043091344593
8.97108984313809
9.47274416580304
Time taken: 36.344 seconds, Fetched: 5 row(s)

When I execute the same query via beeline:

0: jdbc:hive2://localhost:10000> select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;
ERROR : Execution failed with exit status: 3
ERROR : Obtaining error information
ERROR :
Task failed!
Task ID:
Stage-5

Logs:

ERROR : /var/log/hive/hadoop-cmf-hive-HIVESERVER2-[server].log.out
Getting log thread is interrupted, since query is done!
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=3)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:279)
at org.apache.hive.beeline.Commands.execute(Commands.java:814)
at org.apache.hive.beeline.Commands.sql(Commands.java:670)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost:10000>

But when I turn hive.auto.convert.join to false it works:

0: jdbc:hive2://localhost:10000> set hive.auto.convert.join=false;

0: jdbc:hive2://localhost:10000> select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;
INFO : Number of reduce tasks not specified. Estimated from input data size: 2
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:2
INFO : Submitting tokens for job: job_1446191535269_0137
INFO : The url to track the job: http://[ip]:8088/proxy/application_1446191535269_0137/
INFO : Starting Job = job_1446191535269_0137, Tracking URL = http://[ip]:8088/proxy/application_1446191535269_0137/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0137
INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
INFO : 2015-11-03 08:37:40,662 Stage-1 map = 0%, reduce = 0%
INFO : 2015-11-03 08:37:48,910 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 4.66 sec
INFO : 2015-11-03 08:37:49,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.75 sec
INFO : 2015-11-03 08:37:58,192 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.76 sec
INFO : MapReduce Total cumulative CPU time: 16 seconds 760 msec
INFO : Ended Job = job_1446191535269_0137
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:1
INFO : Submitting tokens for job: job_1446191535269_0138
INFO : The url to track the job: http://[ip]:8088/proxy/application_1446191535269_0138/
INFO : Starting Job = job_1446191535269_0138, Tracking URL = http://[ip]:8088/proxy/application_1446191535269_0138/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0138
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2015-11-03 08:38:05,525 Stage-2 map = 0%, reduce = 0%
INFO : 2015-11-03 08:38:12,738 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
INFO : 2015-11-03 08:38:18,932 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.87 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 870 msec
INFO : Ended Job = job_1446191535269_0138
Getting log thread is interrupted, since query is done!
+--------------------------------+--+
| avg_commute_time |
+--------------------------------+--+
| 9.22561984510033 |
| 6.97536844275076 |
| 3.4043091344593 |
| 8.97108984313809 |
| 9.47274416580304 |
+--------------------------------+--+
5 rows selected (46.722 seconds)
0: jdbc:hive2://localhost:10000>

How to explain that? I'd like to set hive.auto.convert.join=true because users who make queries via HUE do not know what is map side joins.

Best regards, Margus (margusja) Roo

Cloudera Community

Support Questions

Who agreed with this topic

hive.auto.convert.join Execution Error, return code 3 from MapredLocalTask (state=08S01,code=3)