Created on 11-03-2015 12:43 AM - edited 09-16-2022 02:47 AM
Hi,
I have set up CDH 5.4.8 on AWS.
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
Executing query:
select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;
Total jobs = 1
Execution log at: /tmp/demo/demo_20151103083131_9c7f8815-a9fd-42d2-b7a3-9028fcdd81fb.log
2015-11-03 08:31:49 Starting to launch local task to process map join; maximum memory = 1029701632
2015-11-03 08:31:51 Processing rows: 200000 Hashtable size: 199999 Memory usage: 118909728 percentage: 0.115
2015-11-03 08:31:51 Processing rows: 300000 Hashtable size: 299999 Memory usage: 160068072 percentage: 0.155
2015-11-03 08:31:51 Dump the side-table for tag: 1 with group count: 300000 into file: file:/tmp/demo/facf066b-778f-447e-86b5-a9152a74c41b/hive_2015-11-03_08-31-44_617_3475565247267897601-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile21--.hashtable
2015-11-03 08:31:51 Uploaded 1 File to: file:/tmp/demo/facf066b-778f-447e-86b5-a9152a74c41b/hive_2015-11-03_08-31-44_617_3475565247267897601-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile21--.hashtable (20655402 bytes)
2015-11-03 08:31:51 End of local task; Time Taken: 2.361 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1446191535269_0136, Tracking URL = http://[server]:8088/proxy/application_1446191535269_0136/
Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0136
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2015-11-03 08:32:00,237 Stage-2 map = 0%, reduce = 0%
2015-11-03 08:32:11,603 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 10.73 sec
2015-11-03 08:32:18,828 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 12.44 sec
MapReduce Total cumulative CPU time: 12 seconds 440 msec
Ended Job = job_1446191535269_0136
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 12.44 sec HDFS Read: 64673839 HDFS Write: 84 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 440 msec
OK
9.22561984510033
6.97536844275076
3.4043091344593
8.97108984313809
9.47274416580304
Time taken: 36.344 seconds, Fetched: 5 row(s)
When I execute the same query via Beeline:
0: jdbc:hive2://localhost:10000> select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;
ERROR : Execution failed with exit status: 3
ERROR : Obtaining error information
ERROR :
Task failed!
Task ID:
Stage-5
Logs:
ERROR : /var/log/hive/hadoop-cmf-hive-HIVESERVER2-[server].log.out
Getting log thread is interrupted, since query is done!
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=3)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:279)
at org.apache.hive.beeline.Commands.execute(Commands.java:814)
at org.apache.hive.beeline.Commands.sql(Commands.java:670)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
0: jdbc:hive2://localhost:10000>
But when I set hive.auto.convert.join to false, the query works:
0: jdbc:hive2://localhost:10000> set hive.auto.convert.join=false;
0: jdbc:hive2://localhost:10000> select AVG(commute_time) AS avg_commute_time from transportation JOIN person p ON persons = id WHERE commute_time < 10 GROUP BY nationality;
INFO : Number of reduce tasks not specified. Estimated from input data size: 2
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:2
INFO : Submitting tokens for job: job_1446191535269_0137
INFO : The url to track the job: http://[ip]:8088/proxy/application_1446191535269_0137/
INFO : Starting Job = job_1446191535269_0137, Tracking URL = http://[ip]:8088/proxy/application_1446191535269_0137/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0137
INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
INFO : 2015-11-03 08:37:40,662 Stage-1 map = 0%, reduce = 0%
INFO : 2015-11-03 08:37:48,910 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 4.66 sec
INFO : 2015-11-03 08:37:49,941 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.75 sec
INFO : 2015-11-03 08:37:58,192 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.76 sec
INFO : MapReduce Total cumulative CPU time: 16 seconds 760 msec
INFO : Ended Job = job_1446191535269_0137
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:1
INFO : Submitting tokens for job: job_1446191535269_0138
INFO : The url to track the job: http://[ip]:8088/proxy/application_1446191535269_0138/
INFO : Starting Job = job_1446191535269_0138, Tracking URL = http://[ip]:8088/proxy/application_1446191535269_0138/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/lib/hadoop/bin/hadoop job -kill job_1446191535269_0138
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2015-11-03 08:38:05,525 Stage-2 map = 0%, reduce = 0%
INFO : 2015-11-03 08:38:12,738 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
INFO : 2015-11-03 08:38:18,932 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.87 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 870 msec
INFO : Ended Job = job_1446191535269_0138
Getting log thread is interrupted, since query is done!
+--------------------------------+--+
| avg_commute_time |
+--------------------------------+--+
| 9.22561984510033 |
| 6.97536844275076 |
| 3.4043091344593 |
| 8.97108984313809 |
| 9.47274416580304 |
+--------------------------------+--+
5 rows selected (46.722 seconds)
0: jdbc:hive2://localhost:10000>
How can this be explained? I'd like to keep hive.auto.convert.join=true, because users who run queries via HUE do not know what map-side joins are.
Best regards, Margus (margusja) Roo
Created on 11-04-2015 06:15 AM - edited 11-04-2015 07:21 AM
Hi,
Errors involving "MapredLocalTask" together with "hive.auto.convert.join" usually come down to the JVM heap size: the local task loads the small side of the join into an in-memory hashtable (you can see this happening in your first log excerpt), and it fails when the heap is too small to hold it.
If you run the query from the Hive CLI, try increasing the JVM heap size of the Hive Gateway.
If you run the query from HUE (or over a JDBC connection), try increasing the JVM heap size of HiveServer2, since that is the process that runs the query.
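For example, a minimal sketch assuming a Cloudera Manager deployment (exact menu and property names vary by CM version, and the values below are illustrative, not tuned recommendations):

# Cloudera Manager: Hive service > Configuration
#   - "Java Heap Size of HiveServer2 in Bytes"  -> affects Beeline/HUE/JDBC queries
#   - the Gateway role's client Java heap size  -> affects the hive CLI
# On a non-CM install, the client/HS2 heap can be raised in hive-env.sh, e.g.:
export HADOOP_HEAPSIZE=4096   # MB; illustrative value

If raising the heap is not an option, you can instead lower the small-table threshold so Hive only converts a join to a map join when the small side should comfortably fit in memory:

-- bytes; the default is 25000000 (25 MB)
set hive.mapjoin.smalltable.filesize=10000000;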
Good luck.
Created 05-18-2016 11:18 AM
Greetings,
I am having the same problem. I increased the heap size as suggested, but my Hive/MR job still fails:
hive> set mapreduce.map.memory.mb=2048;
hive> set mapreduce.reduce.memory.mb=4096;
hive> select count(distinct warctype) from commoncrawl18 where warctype='warcinfo';
Query ID = jmill383_20160518141345_91d2a202-049e-4546-a9f7-e7183f2ff4bf
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1463594979064_0001, Tracking URL = http://starchild:8088/proxy/application_1463594979064_0001/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1463594979064_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2016-05-18 14:13:52,640 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1463594979064_0001 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://starchild:8088/cluster/app/application_1463594979064_0001
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Is there an alternative remedy for this? Please advise.
John M
Created 05-19-2016 12:41 AM
This does not look like the same issue: your job fails with return code 2 from MapRedTask, not return code 3 from the MapredLocalTask, and no mappers or reducers were ever launched.
Check the YARN application logs; they will contain information about the actual failure.
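For example, using the application ID from the Job Tracking URL above:

yarn logs -applicationId application_1463594979064_0001

This prints the aggregated container logs for the failed job, which usually include the underlying exception (for instance an OutOfMemoryError or a container-killed message).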