
Hive on Spark CTAS Fails with Straight SELECT but Succeeds with SELECT GROUP BY

New Contributor

I have used Hive on Spark successfully for several weeks to execute CTAS statements that included aggregated data. Recently, I needed to execute a CTAS with a straight SELECT statement (no GROUP BY) and was surprised that it failed. I found I could execute the same statement with Hive on MR just fine. I have created a test case that isolates my issue.
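
For reference, the same CTAS completes just fine if I force the MR engine first. A minimal sketch, using the test tables defined below:

    set hive.execution.engine=mr;
    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1;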

Create the source table:

hive -e "

    CREATE DATABASE IF NOT EXISTS testhiveonspark;
    DROP TABLE IF EXISTS testhiveonspark.temptable1;
    CREATE TABLE testhiveonspark.temptable1 (num1 INT, num2 INT)
    STORED AS TEXTFILE;
    INSERT INTO TABLE testhiveonspark.temptable1 VALUES (1, 1);
    SELECT * FROM testhiveonspark.temptable1;

"

Execute a working CTAS file (using SELECT with GROUP BY)...this completes just fine:

cat > testhiveonspark_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://hdfsnode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1 GROUP BY num1, num2;
    SELECT * FROM testhiveonspark.temptable2;

EOF
hive -f testhiveonspark_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.931 seconds
Query ID = jolsen_20161112104040_4e398c2c-4691-47c8-ab79-dfd56195bf2a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 3f263511-afa9-48ba-b224-1496194d5d3e
Running with YARN Application = application_1478903884683_0019
Kill Command = /usr/lib/hadoop/bin/yarn application -kill application_1478903884683_0019

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-11-12 10:41:15,453 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:18,482 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:21,504 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1
2016-11-12 10:41:24,533 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2016-11-12 10:41:25,542 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 18.15 seconds
Moving data to: hdfs://hdfsnode:8020/user/hive/warehouse/testhiveonspark.db/temptable2
Table testhiveonspark.temptable2 stats: [numFiles=1, numRows=1, totalSize=4, rawDataSize=3]
OK
Time taken: 83.03 seconds
OK
1       1
Time taken: 0.203 seconds, Fetched: 1 row(s)

Execute a non-working CTAS file (a plain SELECT, no GROUP BY)...this does not complete:

cat > testhiveonspark_not_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1;

EOF
hive -f testhiveonspark_not_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.041 seconds
Query ID = jolsen_20161112111313_018a2914-6a3b-428c-af7e-81e0a99a302d
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = c2affd61-1a39-4765-b3ac-e27bc79562ab
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

I was running Hive 1.1.0-cdh5.8.0 and Spark 1.6.0, and thought this might be related to a bug or to default settings I wasn't aware of. The release notes for CDH 5.8.2 mentioned a ticket that seemed to relate to my issue (https://issues.apache.org/jira/browse/HIVE-12650), so I decided to upgrade my cluster to the latest release, CDH 5.9.0. Unfortunately, I still have the same problem.

I "think" the logs are indicating that Hive is timing out before getting a response from Application Master, but I'm honestly not sure. I'm still investigating. I'm also running hive with debug, but not seeing anything that helps...the logs just seem to confirm what I'm already observing from error out: "hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql"

 

I have also attempted the same steps on a Cloudera CDH Quickstart VM updated to cdh5.8.0 and received the same results.

What could be the cause of the failure? The two CTAS statements are not that much different. Why would one work and the other fail?

Thank you.

1 ACCEPTED SOLUTION

New Contributor

I dug through the Hive source code to locate the source of the logging messages I was seeing on the console (e.g. "Job hasn't been submitted after 61s"). From the code, I was able to locate a property, "hive.spark.job.monitor.timeout", which defaults to 60s, almost exactly the time at which my job timed out, so I figured it must be the right property. I tried my job again, increasing "hive.spark.job.monitor.timeout" each time, and after increasing it to "180s" my job finally executed before timing out. Problem solved.

I have no idea why my job should take up to 3 minutes to actually start executing, which seems like an extremely long delay, but I'll leave that research for another time.
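
If you want to confirm the effective value on your own cluster, something like this should print the default (assuming nothing has overridden it):

    # expected to print: hive.spark.job.monitor.timeout=60s
    hive -e "set hive.spark.job.monitor.timeout;"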

This was my final code, which worked:

set mapred.job.queue.name=root.apps10;
set spark.master=yarn-client;
set hive.server2.enable.doAs=false;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
set spark.shuffle.blockTransferService=nio;
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
set spark.shuffle.service.enabled=true;
set spark.dynamicAllocation.enabled=true;
set hive.spark.job.monitor.timeout=180s;

DROP TABLE IF EXISTS testhiveonspark.temptable2;
CREATE TABLE testhiveonspark.temptable2
STORED AS TEXTFILE
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
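
For what it's worth, the same override can also be passed on the command line instead of being baked into the script, e.g. (illustrative):

    hive --hiveconf hive.spark.job.monitor.timeout=180s -f testhiveonspark_not_working.hql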


2 REPLIES

New Contributor

I ran both my working and non-working tests in debug mode, like "hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql", and compared the outputs. From the delta, I tried to select what I thought were the most relevant parts of the log, which I pasted below. I'm still struggling to understand why "Job hasn't been submitted after 61s", and the source of this problem. Any help? Thank you.
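
For anyone reproducing the comparison, it was essentially the following (the log file names are just illustrative):

    # capture both debug runs, then diff them
    hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql > working.log 2>&1
    hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql > not_working.log 2>&1
    diff working.log not_working.log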

 

 ...

16/11/12 11:42:56 [main]: INFO client.SparkClientImpl: Running client driver with argv: /usr/lib/spark/bin/spark-submit --properties-file /tmp/spark-submit.5047982490856668419.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar --remote-host HDFSNode --remote-port 42944 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Class path contains multiple SLF4J bindings.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:57 INFO RemoteDriver: Connecting to: HDFSNode:42944
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 WARN Rpc: Invalid log level null, reverting to default.
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.ResourceLeakDetector: -Dio.netty.leakDetectionLevel: simple
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.Recycler: -Dio.netty.recycler.maxCapacity.default: 262144
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (41 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG internal.Cleaner0: java.nio.ByteBuffer.cleaner(): available
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (98 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (275 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (45 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth.
16/11/12 11:42:58 [main]: DEBUG session.SparkSessionManagerImpl: New session (23ef000b-8281-4c2d-a1db-077a3d40366f) is created.
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitJob from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 INFO SparkContext: Running Spark version 1.6.0
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 0 (org.apache.hive.spark.client.BaseProtocol$SyncJobRequest).
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$SyncJobRequest (168 bytes)
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1: masked=rwx------
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #22
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #22
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 9ms
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #23
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #23
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:42:58 [main]: INFO ql.Context: New scratch dir is hdfs://HDFSNode:8020/tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1/-mr-10003: masked=rwxr-xr-x
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #24
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #24
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 1 (org.apache.hive.spark.client.BaseProtocol$JobRequest).
16/11/12 11:42:58 [main]: DEBUG client.SparkClientImpl: Send JobRequest[3e50626e-3a1e-47d0-a18c-76869e1277ae].
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$JobRequest (174708 bytes)
16/11/12 11:42:58 [main]: INFO log.PerfLogger: </PERFLOG method=SparkSubmitJob start=1478968978795 end=1478968978987 duration=192 from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO exec.Task: Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkRunJob from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitToRunning from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
state = SENT
16/11/12 11:42:58 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriver' on port 43254.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Slf4jLogger: Slf4jLogger started
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Starting remoting
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53727.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering MapOutputTracker
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering BlockManagerMaster
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1353a7a8-14f0-4b9d-b369-1224fbc9b331
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
state = SENT
16/11/12 11:42:59 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkEnv: Registering OutputCommitCoordinator
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkUI: Started SparkUI at http://192.168.1.99:4040
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkContext: Added JAR file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar at spark://192.168.1.99:43254/jars/hive-exec-1.1.0-cdh5.9.0.jar with timestamp 1478968980504
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO RMProxy: Connecting to ResourceManager at HDFSNode/192.168.1.99:8032
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Requesting a new application from cluster with 10 NodeManagers
state = SENT
16/11/12 11:43:00 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container)
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Setting up container launch context for our AM
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Setting up the launch environment for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Preparing resources for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar
state = SENT
16/11/12 11:43:01 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:02 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:02 INFO Client: Uploading resource file:/tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6/__spark_conf__2574250824979235523.zip -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/__spark_conf__2574250824979235523.zip
state = SENT
16/11/12 11:43:02 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO Client: Submitting application 32 to ResourceManager
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO YarnClientImpl: Submitted application application_1478903884683_0032
state = SENT
16/11/12 11:43:04 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:05 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:06 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:07 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:08 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:08 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0
state = SENT
16/11/12 11:43:09 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:10 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> HDFSNode, PROXY_URI_BASES -> http://HDFSNode:8151/proxy/application_1478903884683_0032), /proxy/application_1478903884683_0032
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
state = SENT
16/11/12 11:43:11 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:12 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:13 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:14 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:15 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:16 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:17 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:18 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:19 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:20 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:21 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:22 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:22 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:23 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:24 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:25 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:26 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:27 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:28 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:29 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:30 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:31 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:32 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:33 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:34 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:35 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:35 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:36 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:37 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:38 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:39 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:40 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:41 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:42 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:43 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:44 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:45 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:46 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:47 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:48 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:48 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:49 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:50 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:51 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:52 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:53 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:54 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:55 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:56 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:57 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:58 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:59 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:44:00 [main]: INFO status.SparkJobMonitor: state = SENT
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Status: SENT
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=SparkRunJob start=1478968978989 end=1478969040067 duration=61078 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 2 (org.apache.hive.spark.client.BaseProtocol$CancelJob).
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$CancelJob (91 bytes)
16/11/12 11:44:00 [main]: DEBUG ipc.Client: The ping interval is 60000 ms.
16/11/12 11:44:00 [main]: DEBUG ipc.Client: Connecting to HDFSNode/192.168.1.99:8020
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: starting, having connections 1
16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #25
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #25
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 34ms
16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #26
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #26
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 1ms
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
16/11/12 11:44:00 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1478968975448 end=1478969040127 duration=64679 from=org.apache.hadoop.hive.ql.Driver>
16/11/12 11:44:00 [main]: INFO metadata.Hive: Dumping metastore api call timing information for : execution phase
16/11/12 11:44:00 [main]: DEBUG metadata.Hive: Total time spent in each metastore function (ms): {isCompatibleWith_(HiveConf, )=1}
16/11/12 11:44:00 [main]: INFO ql.Driver: Completed executing command(queryId=jolsen_20161112114242_6a1682d9-cc1a-4e6b-904e-03baf361de34); Time taken: 64.679 seconds
16/11/12 11:44:00 [main]: INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
16/11/12 11:44:00 [main]: DEBUG ZooKeeperHiveLockManager: About to release lock for testhiveonspark/temptable1
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 20,2 replyHeader:: 20,176093659822,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1/LOCK-SHARED-0000000000,-1 response:: null
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 2ms
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 21,12 replyHeader:: 21,176093659822,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1,F response:: v{},s{176093659820,176093659820,1478968975439,1478968975439,0,2,0,0,0,0,176093659822}
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 22,2 replyHeader:: 22,176093659823,0 request:: '/hive_zookeeper_namespace/testhiveonspark/temptable1,-1 response:: null
16/11/12 11:44:00 [main]: DEBUG ZooKeeperHiveLockManager: About to release lock for testhiveonspark
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 23,2 replyHeader:: 23,176093659824,0 request:: '/hive_zookeeper_namespace/testhiveonspark/LOCK-SHARED-0000000000,-1 response:: null
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 24,12 replyHeader:: 24,176093659824,0 request:: '/hive_zookeeper_namespace/testhiveonspark,F response:: v{},s{176093659818,176093659818,1478968975429,1478968975429,0,4,0,0,0,0,176093659824}
16/11/12 11:44:00 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 25,2 replyHeader:: 25,176093659825,0 request:: '/hive_zookeeper_namespace/testhiveonspark,-1 response:: null
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1478969040128 end=1478969040142 duration=14 from=org.apache.hadoop.hive.ql.Driver>
16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #27
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #27
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 11ms
16/11/12 11:44:00 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #28
16/11/12 11:44:00 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #28
16/11/12 11:44:00 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 11ms
16/11/12 11:44:00 [main]: DEBUG session.SparkSessionManagerImpl: Closing session (23ef000b-8281-4c2d-a1db-077a3d40366f).
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 3 (org.apache.hive.spark.client.BaseProtocol$EndSession).
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:44:00 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$EndSession (55 bytes)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Client: Application report for application_1478903884683_0032 (state: RUNNING)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Client:
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: client token: N/A
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: diagnostics: N/A
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: ApplicationMaster host: 192.168.1.151
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: ApplicationMaster RPC port: 0
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: queue: root.apps10
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: start time: 1478968983665
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: final status: UNDEFINED
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: tracking URL: http://HDFSNode:8151/proxy/application_1478903884683_0032/
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: user: jolsen
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO YarnClientSchedulerBackend: Application application_1478903884683_0032 has started running.
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36149.
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO NettyBlockTransferService: Server created on 36149
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManager: external shuffle service port = 7337
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMaster: Trying to register BlockManager
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.99:36149 with 530.3 MB RAM, BlockManagerId(driver, 192.168.1.99, 36149)
16/11/12 11:44:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:03 INFO BlockManagerMaster: Registered BlockManager
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO EventLoggingListener: Logging events to hdfs://HDFSNode:8020/user/spark/applicationHistory/application_1478903884683_0032
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 ERROR SparkContext: Error adding jar (java.lang.IllegalArgumentException: requirement failed: JAR hive-exec-1.1.0-cdh5.9.0.jar already registered.), was the --addJars option used?
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 WARN RemoteDriver: Shutting down driver because RPC channel was closed.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteDriver: Shutting down remote driver.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.99:4040
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Shutting down all executors
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO YarnClientSchedulerBackend: Stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO MemoryStore: MemoryStore cleared
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO BlockManager: BlockManager stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO BlockManagerMaster: BlockManagerMaster stopped
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO SparkContext: Successfully stopped SparkContext
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO ShutdownHookManager: Shutdown hook called
16/11/12 11:44:04 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:44:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #29
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #29
16/11/12 11:44:04 [main]: DEBUG ipc.ProtobufRpcEngine: Call: delete took 6ms
16/11/12 11:44:04 [main]: DEBUG metadata.Hive: Closing current thread's connection to Hive Metastore.
16/11/12 11:44:04 [main]: INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/11/12 11:44:04 [Thread-10]: INFO session.SparkSessionManagerImpl: Closing the session manager.
16/11/12 11:44:04 [Thread-7]: DEBUG imps.CuratorFrameworkImpl: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG curator.CuratorZookeeperClient: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG curator.ConnectionState: Closing
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ZooKeeper: Closing session: 0x5585583fa3e000e
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ClientCnxn: Closing client for session: 0x5585583fa3e000e
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #30
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #30
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:44:04 [IPC Parameter Sending Thread #1]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #31
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #31
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:44:04 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x5585583fa3e000e, packet:: clientPath:null serverPath:null finished:false header:: 26,-11 replyHeader:: 26,176093659826,0 request:: null response:: null
16/11/12 11:44:04 [Thread-7]: DEBUG zookeeper.ClientCnxn: Disconnecting client for session: 0x5585583fa3e000e
16/11/12 11:44:04 [Thread-7]: INFO zookeeper.ZooKeeper: Session: 0x5585583fa3e000e closed
16/11/12 11:44:04 [main-EventThread]: INFO zookeeper.ClientCnxn: EventThread shut down
16/11/12 11:44:04 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x5585583fa3e000e : Unable to read additional data from server sessionid 0x5585583fa3e000e, likely server has closed socket
16/11/12 11:44:04 [Thread-7]: INFO CuratorFrameworkSingleton: Closing ZooKeeper client.
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@513c46e
16/11/12 11:44:04 [Thread-0]: DEBUG ipc.Client: Stopping client
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed
16/11/12 11:44:04 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0
