About jolsen

jolsen · ‎11-18-2016

I dug through the Hive source code to find where the logging messages I was seeing on the console (e.g. "Job hasn't been submitted after 61s") come from. From the code, I located the property "hive.spark.job.monitor.timeout", which defaults to 60s. That is almost exactly when my job timed out, so I figured it must be the right property. I ran my job again several times, increasing "hive.spark.job.monitor.timeout" each time, and after raising it to "180s" my job finally executed before timing out. Problem solved.

I have no idea why my job should take up to 3 minutes to actually start executing, which seems like an extremely long delay, but I'll leave that research for another time.

This was my final code, which worked:

set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
set hive.spark.job.monitor.timeout=180s;

DROP TABLE IF EXISTS testhiveonspark.temptable2;

CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
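For anyone trying the same fix, here is a minimal sketch of how the timeout can be checked and raised for a single session. It assumes a Hive CLI or Beeline session with Hive on Spark already configured. The 180s value is simply what worked on my cluster, not a recommended setting.

```sql
-- Print the current value of the property (a SET with no "=value" only displays it).
set hive.spark.job.monitor.timeout;

-- Raise the timeout for this session only; 180s is the value that worked in my case.
set hive.spark.job.monitor.timeout=180s;

-- Confirm the new value before re-running the failing statement.
set hive.spark.job.monitor.timeout;
```

Setting the property this way only affects the current session, so it is an easy way to experiment with different values before changing anything cluster-wide.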

Last Visited	‎11-18-2016 02:09 PM

Member Since	‎11-16-2016 04:36 AM
Posts	3

Cloudera Community

Re: Hive on Spark CTAS Fails with Straight SELECT ...
