11-18-2016 12:33 PM
I dug through the Hive source code to find where the logging messages I was seeing on the console (e.g. "Job hasn't been submitted after 61s") came from. In the code I found the property "hive.spark.job.monitor.timeout", which defaults to 60s. That is almost exactly when my job timed out, so I figured it must be the right property. I ran my job again, increasing "hive.spark.job.monitor.timeout" each time, and after raising it to "180s" my job finally executed before timing out. Problem solved. I have no idea why my job should take up to 3 minutes to actually execute, which seems like an extremely long delay, but I'll leave that research for another time. This was my final code, which worked:
set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
set hive.spark.job.monitor.timeout=180s;
DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
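
If you want to confirm which timeout is actually in effect before rerunning the job, a minimal sketch follows. It assumes you are in the same Hive CLI or Beeline session as the script above, and it relies on the fact that `set` with a property name and no value prints the current setting. The value 180s is simply the one that worked for me; yours may differ.

```sql
-- Print the value currently in effect for this session (shown as name=value).
set hive.spark.job.monitor.timeout;

-- Override it for this session only; 180s is the value that worked above.
set hive.spark.job.monitor.timeout=180s;

-- Print it again to confirm the override took effect before running the query.
set hive.spark.job.monitor.timeout;
```

Because this is a session-level override, it has to be set again in every new session or script that needs the longer timeout.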