
Job hang when Insert data into table in Spark Thrift Server

New Contributor

Hi Everyone,

I am trying to insert data into HiveServer2 through the Spark Thrift Server (connecting with beeline), but the insert job gets stuck.

I have checked the Spark Master/Application UI page, which shows the state in the figure below.

(screenshot: allen_chu_1-1729064115670.png)

The Spark Thrift Server log is as follows:

24/10/16 15:21:39 INFO SparkExecuteStatementOperation: Submitting query 'insert into test_database.test_table (a,b) values (2,33)' with a75190ac-d536-4ee1-a1ff-da42a195a40b
24/10/16 15:21:39 INFO SparkExecuteStatementOperation: Running query with a75190ac-d536-4ee1-a1ff-da42a195a40b
24/10/16 15:21:40 INFO FileUtils: Creating directory if it doesn't exist: hdfs://ha/warehouse/tablespace/managed/hive/test_database.db/test_table/.hive-staging_hive_2024-10-16_15-21-40_061_8849017887411502804-3
24/10/16 15:21:40 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
24/10/16 15:21:40 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
24/10/16 15:21:40 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
24/10/16 15:21:40 INFO SparkContext: Starting job: run at AccessController.java:0
24/10/16 15:21:40 INFO DAGScheduler: Got job 2 (run at AccessController.java:0) with 1 output partitions
24/10/16 15:21:40 INFO DAGScheduler: Final stage: ResultStage 2 (run at AccessController.java:0)
24/10/16 15:21:40 INFO DAGScheduler: Parents of final stage: List()
24/10/16 15:21:40 INFO DAGScheduler: Missing parents: List()
24/10/16 15:21:40 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0), which has no missing parents
24/10/16 15:21:40 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 421.2 KiB, free 910.8 MiB)
24/10/16 15:21:40 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 147.0 KiB, free 910.6 MiB)
24/10/16 15:21:40 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on persp-6.persp.net:45131 (size: 147.0 KiB, free: 911.9 MiB)
24/10/16 15:21:40 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1535
24/10/16 15:21:40 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0))
24/10/16 15:21:40 INFO YarnScheduler: Adding task set 2.0 with 1 tasks resource profile 0
24/10/16 15:21:40 INFO FairSchedulableBuilder: Added task set TaskSet_2.0 tasks to pool default
24/10/16 15:21:50 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:05 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:20 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:35 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:22:50 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/16 15:23:05 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Please help me figure out what is happening. Thanks a lot.

2 Replies

Contributor

@allen_chu

This looks like a YARN resource issue. I would recommend opening a case in the Cloudera Support Portal under the YARN component to get further assistance with this.
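Before opening a case, one common cause of the repeated "Initial job has not accepted any resources" warning is that the Thrift Server's requested executor resources exceed what its YARN queue can currently grant. A minimal sketch of the relevant knobs (the property names are standard Spark-on-YARN settings; the values and the queue name `default` are placeholders, not recommendations):

```shell
# Hypothetical sketch: restart the Spark Thrift Server with a smaller executor
# footprint so YARN can actually allocate containers. Size these values to fit
# the free capacity of the queue the Thrift Server submits to.
# (start-thriftserver.sh ships with Spark; the path may differ per install.)
./sbin/start-thriftserver.sh \
  --master yarn \
  --conf spark.yarn.queue=default \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=2g \
  --conf spark.executor.cores=1
```

If the warning disappears with smaller requests, the original sizing was larger than the queue's available capacity.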

Contributor

Hi @allen_chu, let me know if my understanding is correct:
You are trying to insert data into Hive using the Spark Thrift Server, and the job is getting stuck.
However, when you insert the data using beeline, the insert succeeds.
Which CDP version are you using?
Do you see any YARN application being created?
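If a YARN application was created but is sitting in the ACCEPTED state instead of RUNNING, that would match the "has not accepted any resources" warnings in your log. A generic way to check, using the standard Hadoop YARN CLI (the application ID and queue name below are placeholders):

```shell
# List YARN applications and their states; the Thrift Server's
# application should be in state RUNNING, not ACCEPTED.
yarn application -list -appStates ALL

# Inspect one application in detail (ID is a placeholder):
yarn application -status application_1729000000000_0001

# Check capacity and usage of the queue the Thrift Server submits to:
yarn queue -status default
```

The `-status` output includes the diagnostics field, which usually states why a container request is pending (e.g. queue over capacity, no node with enough memory/vcores).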