Member since: 11-16-2016
11-18-2016 12:33 PM
I dug through the Hive source code to locate the source of the logging messages I was seeing on the console (e.g. "Job hasn't been submitted after 61s"). From the code, I found a property, "hive.spark.job.monitor.timeout", which defaults to 60s, just about exactly when my job was timing out, so I figured it must be the right one. I retried my job, increasing "hive.spark.job.monitor.timeout" each time, and after raising it to "180s" my job finally executed before timing out. Problem solved. I have no idea why my job should take up to 3 minutes to actually start executing, which seems like an extremely long delay, but I'll leave that research for another time. This was my final code, which worked:

set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
set hive.spark.job.monitor.timeout=180s;
DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
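In case it helps anyone comparing the timeout setting against their own observed delay: Hive accepts these time-valued properties with a unit suffix (e.g. "60s", "180s"). A minimal Python sketch of how such a value maps to seconds (the parse_timeout helper is illustrative, not part of Hive):

```python
# Sketch: convert a Hive-style duration string ("60s", "180s", "3m")
# into a number of seconds, so it can be compared against how long a
# job actually sat waiting. parse_timeout is a hypothetical helper.

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_timeout(value: str) -> int:
    """Return the number of seconds for a duration like '180s' or '3m'."""
    value = value.strip()
    unit = value[-1]
    if unit.isdigit():  # bare number: treat as seconds
        return int(value)
    return int(value[:-1]) * UNIT_SECONDS[unit]

# The monitor aborted after ~61s under the default "60s";
# "180s" gives the job three minutes to get submitted.
print(parse_timeout("60s"), parse_timeout("180s"))
```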
11-18-2016 08:22 AM
I ran both my working and non-working tests in debug mode, like "hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql", and compared the outputs. From the delta, I selected what I thought were the most relevant parts of the log, pasted below. I'm still struggling to understand the "Job hasn't been submitted after 61s" message and the source of this problem. Any help? Thank you.

... 16/11/12 11:42:56 [main]: INFO client.SparkClientImpl: Running client driver with argv: /usr/lib/spark/bin/spark-submit --properties-file /tmp/spark-submit.5047982490856668419.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar --remote-host HDFSNode --remote-port 42944 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Class path contains multiple SLF4J bindings. 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:57 INFO RemoteDriver: Connecting to: HDFSNode:42944 16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 WARN Rpc: Invalid log level null, reverting to default. 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.ResourceLeakDetector: -Dio.netty.leakDetectionLevel: simple 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.Recycler: -Dio.netty.recycler.maxCapacity.default: 262144 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (41 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG internal.Cleaner0: java.nio.ByteBuffer.cleaner(): available 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message... 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response... 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (98 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (275 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message... 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response... 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (45 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth. 16/11/12 11:42:58 [main]: DEBUG session.SparkSessionManagerImpl: New session (23ef000b-8281-4c2d-a1db-077a3d40366f) is created. 16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitJob from=org.apache.hadoop.hive.ql.exec.spark.SparkTask> 16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 INFO SparkContext: Running Spark version 1.6.0 16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 0 (org.apache.hive.spark.client.BaseProtocol$SyncJobRequest). 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$SyncJobRequest (168 bytes) 16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1: masked=rwx------ 16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #22 16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #22 16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 9ms 16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #23 16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #23 16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms 16/11/12 11:42:58 [main]: INFO ql.Context: New scratch dir is hdfs://HDFSNode:8020/tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1 16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1/-mr-10003: masked=rwxr-xr-x 16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #24 16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG 
ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #24 16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 5ms 16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 1 (org.apache.hive.spark.client.BaseProtocol$JobRequest). 16/11/12 11:42:58 [main]: DEBUG client.SparkClientImpl: Send JobRequest[3e50626e-3a1e-47d0-a18c-76869e1277ae]. 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes) 16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$JobRequest (174708 bytes) 16/11/12 11:42:58 [main]: INFO log.PerfLogger: </PERFLOG method=SparkSubmitJob start=1478968978795 end=1478968978987 duration=192 from=org.apache.hadoop.hive.ql.exec.spark.SparkTask> Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae 16/11/12 11:42:58 [main]: INFO exec.Task: Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae 16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkRunJob from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> 16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitToRunning from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> state = SENT 16/11/12 11:42:58 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing view acls to: jolsen 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing modify acls to: jolsen 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen) 16/11/12 11:42:59 
[stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriver' on port 43254. 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Slf4jLogger: Slf4jLogger started 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Starting remoting 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.99:53727] 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.1.99:53727] 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53727. 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering MapOutputTracker 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering BlockManagerMaster 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1353a7a8-14f0-4b9d-b369-1224fbc9b331 16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO MemoryStore: MemoryStore started with capacity 530.3 MB state = SENT 16/11/12 11:42:59 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkEnv: Registering OutputCommitCoordinator 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkUI: Started SparkUI at http://192.168.1.99:4040 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkContext: Added JAR file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar at spark://192.168.1.99:43254/jars/hive-exec-1.1.0-cdh5.9.0.jar with timestamp 1478968980504 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO RMProxy: Connecting to ResourceManager at HDFSNode/192.168.1.99:8032 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Requesting a new application from cluster with 10 NodeManagers state = SENT 16/11/12 11:43:00 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container) 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Setting up container launch context for our AM 16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Setting up the launch environment for our AM container 16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Preparing resources for our AM container 16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar state = SENT 16/11/12 11:43:01 [main]: INFO 
status.SparkJobMonitor: state = SENT 16/11/12 11:43:02 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:02 INFO Client: Uploading resource file:/tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6/__spark_conf__2574250824979235523.zip -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/__spark_conf__2574250824979235523.zip state = SENT 16/11/12 11:43:02 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing view acls to: jolsen 16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing modify acls to: jolsen 16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen) 16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO Client: Submitting application 32 to ResourceManager 16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO YarnClientImpl: Submitted application application_1478903884683_0032 state = SENT 16/11/12 11:43:04 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:05 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:06 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:07 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:08 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:08 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms 16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from 
jolsen: closed 16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0 state = SENT 16/11/12 11:43:09 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:10 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null) 16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> HDFSNode, PROXY_URI_BASES -> http://HDFSNode:8151/proxy/application_1478903884683_0032), /proxy/application_1478903884683_0032 16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter state = SENT 16/11/12 11:43:11 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:12 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:13 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:14 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:15 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:16 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:17 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:18 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:19 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:20 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:21 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 16/11/12 11:43:22 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:22 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms state = SENT 16/11/12 11:43:23 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:24 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:25 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:26 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:27 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:28 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:29 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:30 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:31 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:32 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:33 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:34 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:35 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:35 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms state = SENT 16/11/12 11:43:36 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:37 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:38 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:39 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:40 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:41 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:42 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 
16/11/12 11:43:43 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:44 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:45 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:46 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:47 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:48 [main]: INFO status.SparkJobMonitor: state = SENT 16/11/12 11:43:48 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms state = SENT 16/11/12 11:43:49 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:50 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:51 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:52 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:53 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:54 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:55 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:56 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:57 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:58 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:43:59 [main]: INFO status.SparkJobMonitor: state = SENT state = SENT 16/11/12 11:44:00 [main]: INFO status.SparkJobMonitor: state = SENT Job hasn't been submitted after 61s. Aborting it. Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. Please check YARN or Spark driver's logs for further information. 16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Job hasn't been submitted after 61s. Aborting it. 
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. Please check YARN or Spark driver's logs for further information. Status: SENT 16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Status: SENT 16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=SparkRunJob start=1478968978989 end=1478969040067 duration=61078 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> 16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 2 (org.apache.hive.spark.client.BaseProtocol$CancelJob). 16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes) 16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$CancelJob (91 bytes) 16/11/12 11:44:00 [main]: DEBUG ipc.Client: The ping interval is 60000 ms. 16/11/12 11:44:00 [main]: DEBUG ipc.Client: Connecting to HDFSNode/192.168.1.99:8020 16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: starting, having connections 1 16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #25 16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #25 16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 34ms 16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #26 16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG 
ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #26 16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 1ms FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask 16/11/12 11:44:00 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask 16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1478968975448 end=1478969040127 duration=64679 from=org.apache.hadoop.hive.ql.Driver> 16/11/12 11:44:00 [main]: INFO metadata.Hive: Dumping metastore api call timing information for : execution phase 16/11/12 11:44:00 [main]: DEBUG metadata.Hive: Total time spent in each metastore function (ms): {isCompatibleWith_(HiveConf, )=1} 16/11/12 11:44:00 [main]: INFO ql.Driver: Completed executing command(queryId=jolsen_20161112114242_6a1682d9-cc1a-4e6b-904e-03baf361de34); Time taken: 64.679 seconds 16/11/12 11:44:00 [main]: INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver> 16/11/12 11:44:00 [main]: DEBUG ZooKeeperHiveLockManager: About to release lock for testhiveonspark/temptable1 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 20,2 replyHeader:: 20,176093659822,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1/LOCK-SHARED-0000000000,-1 response:: null 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 2ms 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 21,12 replyHeader:: 21,176093659822,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1,F response:: 
v{},s{176093659820,176093659820,1478968975439,1478968975439,0,2,0,0,0,0,176093659822} 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 22,2 replyHeader:: 22,176093659823,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1,-1 response:: null 16/11/12 11:44:00 [main]: DEBUG ZooKeeperHiveLockManager: About to release lock for testhiveonspark 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 23,2 replyHeader:: 23,176093659824,0 request:: '/hive_zookeeper_namespace/testhiveonspark/LOCK-SHARED-0000000000,-1 response:: null 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 24,12 replyHeader:: 24,176093659824,0 request:: '/hive_zookeeper_namespace/testhiveonspark,F response:: v{},s{176093659818,176093659818,1478968975429,1478968975429,0,4,0,0,0,0,176093659824} 16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 25,2 replyHeader:: 25,176093659825,0 request:: '/hive_zookeeper_namespace/testhiveonspark,-1 response:: null 16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1478969040128 end=1478969040142 duration=14 from=org.apache.hadoop.hive.ql.Driver> 16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #27 16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to 
HDFSNode/192.168.1.99:8020 from jolsen got value #27
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 11ms
16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #28
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #28
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 11ms
16/11/12 11:44:00 [main]: DEBUG session.SparkSessionManagerImpl: Closing session (23ef000b-8281-4c2d-a1db-077a3d40366f).
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 3 (org.apache.hive.spark.client.BaseProtocol$EndSession).
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$EndSession (55 bytes)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Client: Application report for application_1478903884683_0032 (state: RUNNING)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Client:
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: client token: N/A
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: diagnostics: N/A
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: ApplicationMaster host: 192.168.1.151
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: ApplicationMaster RPC port: 0
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: queue: root.apps10
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: start time: 1478968983665
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: final status: UNDEFINED
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: tracking URL: http://HDFSNode:8151/proxy/application_1478903884683_0032/
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: user: jolsen
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO YarnClientSchedulerBackend: Application application_1478903884683_0032 has started running.
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36149.
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO NettyBlockTransferService: Server created on 36149
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManager: external shuffle service port = 7337
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMaster: Trying to register BlockManager
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.99:36149 with 530.3 MB RAM, BlockManagerId(driver, 192.168.1.99, 36149)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMaster: Registered BlockManager
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO EventLoggingListener: Logging events to hdfs://HDFSNode:8020/user/spark/applicationHistory/application_1478903884683_0032
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 ERROR SparkContext: Error adding jar (java.lang.IllegalArgumentException: requirement failed: JAR hive-exec-1.1.0-cdh5.9.0.jar already registered.), was the --addJars option used?
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 WARN RemoteDriver: Shutting down driver because RPC channel was closed.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteDriver: Shutting down remote driver.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.99:4040
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Shutting down all executors
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO MemoryStore: MemoryStore cleared
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO BlockManager: BlockManager stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO BlockManagerMaster: BlockManagerMaster stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO SparkContext: Successfully stopped SparkContext
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO ShutdownHookManager: Shutdown hook called
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #29
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #29
16/11/12 11:44:04 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 6ms
16/11/12 11:44:04 [main]: DEBUG metadata.Hive: Closing current thread's connection to Hive Metastore.
16/11/12 11:44:04 [main]: INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/11/12 11:44:04 [Thread-10]: INFO session.SparkSessionManagerImpl: Closing the session manager.
16/11/12 11:44:04 [Thread-7]: DEBUG imps.CuratorFrameworkImpl: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG curator.CuratorZookeeperClient: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG curator.ConnectionState: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ZooKeeper: Closing session: 0x5585583fa3e000e
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ClientCnxn: Closing client for session: 0x5585583fa3e000e
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #30
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #30
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #31
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #31
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:44:04 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 26,-11 replyHeader:: 26,176093659826,0 request:: null response:: null
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ClientCnxn: Disconnecting client for session: 0x5585583fa3e000e
16/11/12 11:44:04 [Thread-7]: INFO zookeeper.ZooKeeper: Session: 0x5585583fa3e000e closed
16/11/12 11:44:04 [main-EventThread]: INFO zookeeper.ClientCnxn: EventThread shut down
16/11/12 11:44:04 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x5585583fa3e000e : Unable to read additional data from server sessionid 0x5585583fa3e000e, likely server has closed socket
16/11/12 11:44:04 [Thread-7]: INFO CuratorFrameworkSingleton: Closing ZooKeeper client.
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: Stopping client
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0
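For anyone comparing the two debug runs, the entries that stand out in the log above are the duplicate-JAR error and the RPC channel being closed. A quick way to isolate them from a captured debug log is a grep filter; this is just a sketch, and the excerpt file below reproduces a few lines from the paste above purely for demonstration:

```shell
# Capture the debug run to a file, then filter for the suspicious events.
# (The hive command is shown commented for reference; run it on the cluster.)
# hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql 2>&1 | tee debug.log

# Demonstrate the filter on a small excerpt of the log above:
cat > debug_excerpt.log << 'EOF'
16/11/12 11:44:04 ERROR SparkContext: Error adding jar (java.lang.IllegalArgumentException: requirement failed: JAR hive-exec-1.1.0-cdh5.9.0.jar already registered.), was the --addJars option used?
16/11/12 11:44:04 INFO RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
16/11/12 11:44:04 INFO BlockManagerMaster: Registered BlockManager
EOF
# Count the lines matching either suspicious pattern (prints 2 for this excerpt).
grep -cE "ERROR|Closing channel" debug_excerpt.log
```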
11-16-2016
04:54 AM
I have used Hive on Spark successfully for several weeks to execute CTAS statements that included aggregated data. Recently, I needed to execute a CTAS with a straight SELECT (no GROUP BY) and was surprised that it failed. The same statement runs just fine with Hive on MR. I have created a test that isolates the issue.

Create the source table:

hive -e "
CREATE DATABASE IF NOT EXISTS testhiveonspark;
DROP TABLE IF EXISTS testhiveonspark.temptable1;
CREATE TABLE testhiveonspark.temptable1 (num1 INT, num2 INT)
STORED AS TEXTFILE;
INSERT into table testhiveonspark.temptable1 values (1, 1);
select * from testhiveonspark.temptable1;
" Execute a working CTAS file (using SELECT GROUP BY)...this completes just fine: cat > testhiveonspark_working.hql << EOF
set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://hdfsnode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1 GROUP BY num1, num2;
SELECT * FROM testhiveonspark.temptable2;
EOF
hive -f testhiveonspark_working.hql
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.931 seconds
Query ID = jolsen_20161112104040_4e398c2c-4691-47c8-ab79-dfd56195bf2a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 3f263511-afa9-48ba-b224-1496194d5d3e
Running with YARN Application = application_1478903884683_0019
Kill Command = /usr/lib/hadoop/bin/yarn application -kill application_1478903884683_0019
Query Hive on Spark job[0] stages:
0
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-11-12 10:41:15,453 Stage-0_0: 0/1 Stage-1_0: 0/1
2016-11-12 10:41:18,482 Stage-0_0: 0/1 Stage-1_0: 0/1
2016-11-12 10:41:21,504 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1
2016-11-12 10:41:24,533 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2016-11-12 10:41:25,542 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 18.15 seconds
Moving data to: hdfs://hdfsnode:8020/user/hive/warehouse/testhiveonspark.db/temptable2
Table testhiveonspark.temptable2 stats: [numFiles=1, numRows=1, totalSize=4, rawDataSize=3]
OK
Time taken: 83.03 seconds
OK
1 1
Time taken: 0.203 seconds, Fetched: 1 row(s)

Execute a non-working CTAS file (using only SELECT, no GROUP BY)...this does not complete:

cat > testhiveonspark_not_working.hql << EOF
set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
EOF
hive -f testhiveonspark_not_working.hql
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.041 seconds
Query ID = jolsen_20161112111313_018a2914-6a3b-428c-af7e-81e0a99a302d
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = c2affd61-1a39-4765-b3ac-e27bc79562ab
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

I was running Hive 1.1.0-cdh5.8.0 and Spark 1.6.0, and thought maybe this was related to a bug or to default settings I wasn't aware of. The release notes for CDH 5.8.2 referenced a ticket that seemed to relate to my issue (https://issues.apache.org/jira/browse/HIVE-12650), so I decided to upgrade my cluster to the latest release, CDH 5.9.0. Unfortunately, I still have the same problem. I "think" the logs indicate that Hive is timing out before getting a response from the ApplicationMaster, but I'm honestly not sure; I'm still investigating. I have also run Hive with debug logging, but I'm not seeing anything that helps...the logs just seem to confirm what I'm already observing from the error output:

hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql

I have also attempted the same steps on a Cloudera CDH Quickstart VM updated to cdh5.8.0 and received the same results. What could be the cause of the failure? The two CTAS statements are not that different. Why would one work and the other fail? Thank you.
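Following the accepted solution at the top of this thread, the abort traces to hive.spark.job.monitor.timeout, which defaults to 60s and matches the "Job hasn't been submitted after 61s" message. A sketch of a retry script with the timeout raised to 180s (the file name here is made up; the settings mirror the failing script above):

```shell
# Sketch: regenerate the failing CTAS script with the Hive-on-Spark job
# monitor timeout raised from its 60s default to 180s.
cat > testhiveonspark_retry.hql << EOF
set mapred.job.queue.name=root.apps10;
set hive.execution.engine=spark;
set hive.spark.job.monitor.timeout=180s;
DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
EOF

# hive -f testhiveonspark_retry.hql   # run this on the cluster
# Sanity-check that the timeout override made it into the script (prints 1).
grep -c "hive.spark.job.monitor.timeout" testhiveonspark_retry.hql
```

The timeout only masks the symptom; a three-minute submission delay usually points at YARN queue contention or slow driver startup, which is worth checking separately in the ResourceManager UI.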
Labels:
- Apache Hive
- Apache Spark