
Oozie stuck in RUNNING state

New Contributor

I am running a Spark job through Oozie. The job processes some data on S3 and then loads it into a Snowflake data warehouse.

[screenshot attached: MrBeasr_1-1710842321349.png]

At the end of the code I call spark.stop().

import org.apache.log4j.LogManager
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import scala.util.control.NonFatal

object SnowflakeDriver {
  @transient lazy val logger = LogManager.getLogger(SnowflakeDriver.getClass)

  // starting point of the application
  def main(args: Array[String]): Unit = {

    val sparkConf = new SparkConf()
    val runtimeEnvironment = sparkConf.get("spark.eigi.dap.runtime.environment")
    val spark = SparkSession.builder().config(sparkConf).appName(s"${
      JavaUtils.getConfigProps(runtimeEnvironment).getProperty("appName")
    }-SnowflakeSitelockDriver").enableHiveSupport().getOrCreate()
    
    // JavaUtils is a project-internal helper (not shown here)
    val jdbcConnection = JavaUtils.getSFJDBCConnection(runtimeEnvironment, args(0), 0)
    
    try {
      // here is the code to process data and put this data into snowflake 
    } catch {
      case NonFatal(ex) => {
        jdbcConnection.rollback()
        logger.error(s"Failed to run SF Operation due to ${ex.getMessage}", ex)
        JavaUtils.awsSNSOut(runtimeEnvironment,
          JavaUtils.getConfigProps(runtimeEnvironment).getProperty("aws.sns.fatal.topic"),
          s" ${JavaUtils.getConfigProps(runtimeEnvironment).getProperty("appName")} - on $runtimeEnvironment, Failed to SF Operation due to ${ex.getMessage}")
        throw ex
      }
    } finally {
      jdbcConnection.close()
      logger.info("Stopping Spark...")
      spark.stop()
    }
  }

  private def printUsage: Unit = {
    System.err.println(s"Usage: ${getClass.getSimpleName} sfPassword [sfVDWSize]")
    System.exit(-1)
  }
}

Here are the logs:

2024-03-19 07:06:31,968 [main] INFO  com.cc.bigdata.dailyreports.sitelock.SnowflakeDriver$  - Stopping Spark...
2024-03-19 07:06:31,984 [main] INFO  org.sparkproject.jetty.server.AbstractConnector  - Stopped Spark@23d5d9fc{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2024-03-19 07:06:31,986 [main] INFO  org.apache.spark.ui.SparkUI  - Stopped Spark web UI at http://ip-10-13-25-16.ec2.internal:4040
2024-03-19 07:06:31,991 [YARN application state monitor] INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - Interrupting monitor thread
2024-03-19 07:06:32,016 [main] INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - Shutting down all executors
2024-03-19 07:06:32,016 [dispatcher-CoarseGrainedScheduler] INFO  org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint  - Asking each executor to shut down
2024-03-19 07:06:32,022 [main] INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend  - YARN client scheduler backend Stopped
2024-03-19 07:06:32,043 [dispatcher-event-loop-9] INFO  org.apache.spark.MapOutputTrackerMasterEndpoint  - MapOutputTrackerMasterEndpoint stopped!
2024-03-19 07:06:32,063 [main] INFO  org.apache.spark.storage.memory.MemoryStore  - MemoryStore cleared
2024-03-19 07:06:32,064 [main] INFO  org.apache.spark.storage.BlockManager  - BlockManager stopped
2024-03-19 07:06:32,077 [main] INFO  org.apache.spark.storage.BlockManagerMaster  - BlockManagerMaster stopped
2024-03-19 07:06:32,083 [dispatcher-event-loop-15] INFO  org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint  - OutputCommitCoordinator stopped!
2024-03-19 07:06:32,093 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped SparkContext

<<< Invocation of Spark command completed <<<

Hadoop Job IDs executed by Spark: job_1710272981670_0506


<<< Invocation of Main class completed <<<

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-<ip>:8020/user/hadoop/oozie-oozi/0000204-240312195311040-oozie-oozi-W/SnowflakeIntegration--spark/action-data.seq
2024-03-19 07:06:32,152 [main] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.deflate]
Stopping AM
2024-03-19 07:06:32,188 [main] INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Waiting for application to be successfully unregistered.
Callback notification attempts left 0
Callback notification trying http://ip-<ip>.ec2.internal:11000/oozie/callback?id=0000204-240312195311040-oozie-oozi-W@SnowflakeIntegration&status=SUCCEEDED
Callback notification to http://ip-<ip>.ec2.internal:11000/oozie/callback?id=0000204-240312195311040-oozie-oozi-W@SnowflakeIntegration&status=SUCCEEDED succeeded
Callback notification succeeded
2024-03-19 07:06:32,972 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Shutdown hook called
2024-03-19 07:06:32,973 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1710272981670_0505/spark-0072b5a6-b8f5-4ed2-9fbf-b295bd878711
2024-03-19 07:06:32,977 [shutdown-hook-0] INFO  org.apache.spark.util.ShutdownHookManager  - Deleting directory /mnt/tmp/spark-a12c27e8-ec49-4e75-a8f9-2355693611a2

End of LogType:stdout
***********************************************************************

Container: container_1710272981670_0505_01_000001 on ip-10-13-25-16.ec2.internal_8041
LogAggregationType: AGGREGATED
=====================================================================================
LogType:syslog
LogLastModifiedTime:Tue Mar 19 07:06:33 +0000 2024
LogLength:700
LogContents:
2024-03-19 06:51:08,166 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-19 06:51:08,487 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-<ip>.ec2.internal/<ip>:8030
2024-03-19 06:51:08,734 INFO [main] org.apache.hadoop.conf.Configuration: resource-types.xml not found
2024-03-19 06:51:08,735 INFO [main] org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-19 06:51:09,578 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-<ip>/10.13.25.58:8032

End of LogType:syslog
***********************************************************************

You can see in the logs that "Stopping Spark..." was printed, which means the job executed all the way to the last step. The logs also show that Spark was shut down, and they end there. But the Oozie workflow is still in the RUNNING state.

Why is this happening? How can I fix this?


Master Collaborator

Hi @MrBeasr 

Review the Oozie logs for this workflow for anything suspicious, and paste them here:

oozie job -oozie http://<oozie-server-host>:11000/oozie -log <workflow-id>
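One other thing worth checking: Oozie only marks the action finished once the launcher JVM actually exits, and a non-daemon thread left behind by a library (JDBC drivers and AWS SDK clients, for example, sometimes start background keep-alive threads) can keep the JVM alive even after `SparkContext` has stopped. A minimal sketch, using only the standard `java.lang.Thread` API (the `ThreadCheck` object is a hypothetical helper, not part of your code), to list such threads just before exiting:

```scala
object ThreadCheck {
  // Collect the names of live non-daemon threads. Any entry other than
  // "main" can keep the launcher JVM (and thus the Oozie action) alive.
  def nonDaemonThreads(): List[String] = {
    val it = Thread.getAllStackTraces.keySet.iterator()
    var names = List.empty[String]
    while (it.hasNext) {
      val t = it.next()
      if (t.isAlive && !t.isDaemon) names ::= t.getName
    }
    names
  }

  def main(args: Array[String]): Unit =
    nonDaemonThreads().foreach(n => println(s"non-daemon thread still alive: $n"))
}
```

If a stray thread shows up, closing the owning client explicitly before returning from main (or, as a last resort, calling `sys.exit(0)` after `spark.stop()`) usually lets the launcher exit and the action complete.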


Regards,

Chethan YM