
Job hangs when inserting data into a table through the Spark Thrift Server

New Contributor

Hi Everyone,

I am facing a problem when I try to insert data into HiveServer2 through the Spark Thrift Server (I connect with beeline): the insert job gets stuck.

I have checked the Spark Master application UI page, and it shows the following (see the screenshot below).

[Screenshot of the Spark Master application UI: allen_chu_1-1729064115670.png]

The Spark Thrift Server log shows the following:

24/10/16 15:21:39 INFO SparkExecuteStatementOperation: Submitting query 'insert into test_database.test_table (a,b) values (2,33)' with a75190ac-d536-4ee1-a1ff-da42a195a40b
24/10/16 15:21:39 INFO SparkExecuteStatementOperation: Running query with a75190ac-d536-4ee1-a1ff-da42a195a40b
24/10/16 15:21:40 INFO FileUtils: Creating directory if it doesn't exist: hdfs://ha/warehouse/tablespace/managed/hive/test_database.db/test_table/.hive-staging_hive_2024-10-16_15-21-40_061_8849017887411502804-3
24/10/16 15:21:40 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
24/10/16 15:21:40 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
24/10/16 15:21:40 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
24/10/16 15:21:40 INFO SparkContext: Starting job: run at AccessController.java:0
24/10/16 15:21:40 INFO DAGScheduler: Got job 2 (run at AccessController.java:0) with 1 output partitions
24/10/16 15:21:40 INFO DAGScheduler: Final stage: ResultStage 2 (run at AccessController.java:0)
24/10/16 15:21:40 INFO DAGScheduler: Parents of final stage: List()
24/10/16 15:21:40 INFO DAGScheduler: Missing parents: List()
24/10/16 15:21:40 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0), which has no missing parents
24/10/16 15:21:40 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 421.2 KiB, free 910.8 MiB)
24/10/16 15:21:40 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 147.0 KiB, free 910.6 MiB)
24/10/16 15:21:40 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on persp-6.persp.net:45131 (size: 147.0 KiB, free: 911.9 MiB)
24/10/16 15:21:40 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1535
24/10/16 15:21:40 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0))
24/10/16 15:21:40 INFO YarnScheduler: Adding task set 2.0 with 1 tasks resource profile 0
24/10/16 15:21:40 INFO FairSchedulableBuilder: Added task set TaskSet_2.0 tasks to pool default
24/10/16 15:21:50 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:05 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:20 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:35 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:50 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:23:05 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Please help me figure out what is happening. Thanks a lot.

1 ACCEPTED SOLUTION

New Contributor

Hi everyone,

Thank you all for your responses. I am using Spark 3, and I have discovered that the issue was caused by an improper spark_shuffle configuration in the yarn-site.xml file.

Thanks again!
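For reference, the standard way to register the Spark shuffle auxiliary service in yarn-site.xml is roughly the following (the exact service name, class, and shuffle-jar placement can vary between Spark/CDP versions, so treat this only as a sketch, not necessarily the exact change made here):

<!-- yarn-site.xml on each NodeManager: register spark_shuffle alongside the default mapreduce_shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<!-- Class implementing the Spark external shuffle service -->
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

NodeManagers typically need a restart after an auxiliary-service change for it to take effect.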


3 REPLIES

Expert Contributor

@allen_chu

This looks like a YARN resource issue. I would recommend opening a case in the Cloudera Support Portal under the YARN component to get further assistance with this.

Contributor

Hi @allen_chu, let me know if my understanding is correct:
You are trying to insert data into Hive using the Spark Thrift Server and it is getting stuck.
However, when you insert the data using beeline, the insert succeeds.
Which CDP version are you using?
Do you see any YARN application getting created?
