03-19-2024
03:04 AM
I am running a Spark job on Oozie. This Spark job processes some data on S3 and then loads the data into the Snowflake DWH. At the end of the code I call spark.stop():

    import org.apache.log4j.LogManager
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    import scala.util.control.NonFatal

    object SnowflakeDriver {

      @transient lazy val logger = LogManager.getLogger(SnowflakeDriver.getClass)

      // Entry point of the application
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()
        val runtimeEnvironment = sparkConf.get("spark.eigi.dap.runtime.environment")
        val spark = SparkSession.builder().config(sparkConf).appName(s"${
          JavaUtils.getConfigProps(runtimeEnvironment).getProperty("appName")
        }-SnowflakeSitelockDriver").enableHiveSupport().getOrCreate()
        val jdbcConnection = JavaUtils.getSFJDBCConnection(runtimeEnvironment, args(0), 0)
        try {
          // here is the code to process data and put this data into Snowflake
        } catch {
          case NonFatal(ex) =>
            jdbcConnection.rollback()
            logger.error(s"Failed to run SF Operation due to ${ex.getMessage}", ex)
            JavaUtils.awsSNSOut(runtimeEnvironment,
              JavaUtils.getConfigProps(runtimeEnvironment).getProperty("aws.sns.fatal.topic"),
              s"${JavaUtils.getConfigProps(runtimeEnvironment).getProperty("appName")} - on $runtimeEnvironment, failed to run SF operation due to ${ex.getMessage}")
            throw ex
        } finally {
          jdbcConnection.close()
          logger.info("Stopping Spark...")
          spark.stop()
        }
      }

      private def printUsage(): Unit = {
        System.err.println(s"Usage: ${getClass.getSimpleName} sfPassword [sfVDWSize]")
        System.exit(-1)
      }
    }

Here are the logs:

2024-03-19 07:06:31,968 [main] INFO com.cc.bigdata.dailyreports.sitelock.SnowflakeDriver$ - Stopping Spark...
2024-03-19 07:06:31,984 [main] INFO org.sparkproject.jetty.server.AbstractConnector - Stopped Spark@23d5d9fc{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2024-03-19 07:06:31,986 [main] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://ip-10-13-25-16.ec2.internal:4040
2024-03-19 07:06:31,991 [YARN application state monitor] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Interrupting monitor thread
2024-03-19 07:06:32,016 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Shutting down all executors
2024-03-19 07:06:32,016 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint - Asking each executor to shut down
2024-03-19 07:06:32,022 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - YARN client scheduler backend Stopped
2024-03-19 07:06:32,043 [dispatcher-event-loop-9] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
2024-03-19 07:06:32,063 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
2024-03-19 07:06:32,064 [main] INFO org.apache.spark.storage.BlockManager - BlockManager stopped
2024-03-19 07:06:32,077 [main] INFO org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
2024-03-19 07:06:32,083 [dispatcher-event-loop-15] INFO org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
2024-03-19 07:06:32,093 [main] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
<<< Invocation of Spark command completed <<<
Hadoop Job IDs executed by Spark: job_1710272981670_0506
<<< Invocation of Main class completed <<<
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-<ip>:8020/user/hadoop/oozie-oozi/0000204-240312195311040-oozie-oozi-W/SnowflakeIntegration--spark/action-data.seq
2024-03-19 07:06:32,152 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Stopping AM
2024-03-19 07:06:32,188 [main] INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
Callback notification attempts left 0
Callback notification trying http://ip-<ip>.ec2.internal:11000/oozie/callback?id=0000204-240312195311040-oozie-oozi-W@SnowflakeIntegration&status=SUCCEEDED
Callback notification to http://ip-<ip>.ec2.internal:11000/oozie/callback?id=0000204-240312195311040-oozie-oozi-W@SnowflakeIntegration&status=SUCCEEDED succeeded
Callback notification succeeded
2024-03-19 07:06:32,972 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Shutdown hook called
2024-03-19 07:06:32,973 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1710272981670_0505/spark-0072b5a6-b8f5-4ed2-9fbf-b295bd878711
2024-03-19 07:06:32,977 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /mnt/tmp/spark-a12c27e8-ec49-4e75-a8f9-2355693611a2
End of LogType:stdout
***********************************************************************
Container: container_1710272981670_0505_01_000001 on ip-10-13-25-16.ec2.internal_8041
LogAggregationType: AGGREGATED
=====================================================================================
LogType:syslog
LogLastModifiedTime:Tue Mar 19 07:06:33 +0000 2024
LogLength:700
LogContents:
2024-03-19 06:51:08,166 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-03-19 06:51:08,487 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-<ip>.ec2.internal/<ip>:8030
2024-03-19 06:51:08,734 INFO [main] org.apache.hadoop.conf.Configuration: resource-types.xml not found
2024-03-19 06:51:08,735 INFO [main] org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-19 06:51:09,578 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-<ip>/10.13.25.58:8032
End of LogType:syslog
*********************************************************************** You can see in the logs that "Stopping Spark..." was printed, which means that the job has executed until the last step. The logs also mentions that spark was shutdown. The logs end here. But the oozie workflow is still in RUNNING state. Why is this happening? How can I fix this?
Labels:
- Apache Oozie
- Apache Spark
03-13-2024
05:08 AM
I am trying to kill an existing coordinator job and re-deploy it on Oozie. I am running the command:

    oozie job -run --config siftcuration.properties

Here is my Oozie properties file:

    nameNode=hdfs://ip-<ip>:8020
    jobTracker=ip-<ip>:8032
    master=yarn
    mode=client
    queueName=default
    #oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.use.system.libpath=true
    oozie.wf.rerun.failnodes=true
    runtimeEnvironment=aws_prod
    appName=SiftCuration
    appBaseDir=${nameNode}/user/${user.name}/oozieJobs/${appName}
    oozie.coord.application.path=${appBaseDir}/workflow/coordinator.xml
    etl.OverwriteOrAppend=Overwrite
    spark.driver.memory=4g
    spark.executor.memory=4g
    spark.num.executors=3
    spark.executor.cores=1
    etl.default.date.format=yyyy-MM-dd
    etl.run.date=yesterday
    coordinatorKickDate=2024-03-14

When I run this command, it gives me a coordinator job ID. But when I check the coordinator status, I see that it is still in the PREP state, and has been for over 4 hours:

    0000002-240312195311040-oozie-oozi-C  SiftCuration  PREP  1 DAY  2024-03-14 03:00 GMT

I checked the Oozie logs and I am seeing this issue:

    2024-03-13 11:59:34,030 WARN KillXCommand:523 - SERVER[ip-<ip>.ec2.internal] USER[hadoop] GROUP[-] TOKEN[] APP[SiftCuration] JOB[0043858-220322202819429-oozie-oozi-W] ACTION[] E0725: Workflow instance can not be killed, 0043858-220322202819429-oozie-oozi-W, Error Code: E0725

This workflow ID belongs to the previous coordinator job, which was already killed.
On checking the actions of this workflow, I see this:

    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                                                     Status  Ext ID                            Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0043858-220322202819429-oozie-oozi-W@:start:                           OK      -                                 OK          -
    0043858-220322202819429-oozie-oozi-W@SnowflakeJobStartStoredProcedure  OK      application_1647980654756_172599  SUCCEEDED   -
    0043858-220322202819429-oozie-oozi-W@ParameterGenerator                OK      application_1647980654756_172603  SUCCEEDED   -
    0043858-220322202819429-oozie-oozi-W@SiftCurationSparkETL              KILLED  application_1647980654756_172607  RUNNING     JA009
    ------------------------------------------------------------------------------------------------------------------------------------

It says application_1647980654756_172607 is RUNNING, but I cannot find this application when I run yarn logs -applicationId application_1647980654756_172607:

    Could not locate application logs for application_1647980654756_172607

How do I fix this issue?
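For reference, the kill-and-redeploy flow described above can be sketched with the standard Oozie CLI. The server URL and the coordinator ID below are placeholders, not values from this cluster:

```shell
# Oozie server endpoint (placeholder); can also be exported as OOZIE_URL
OOZIE="http://oozie-host:11000/oozie"

# Kill the old coordinator; this should cascade to its workflow instances
oozie job -oozie "$OOZIE" -kill 0000001-240312195311040-oozie-oozi-C

# Verify nothing from the old deployment is still registered as PREP or RUNNING
oozie jobs -oozie "$OOZIE" -jobtype coordinator -filter "status=PREP;status=RUNNING"

# Re-submit the coordinator with the properties file
oozie job -oozie "$OOZIE" -run -config siftcuration.properties

# Inspect the new coordinator's state and its materialized actions
oozie job -oozie "$OOZIE" -info 0000002-240312195311040-oozie-oozi-C
```

If the kill keeps failing with E0725 because the referenced workflow no longer exists, the coordinator record is out of sync with its workflow and typically needs attention on the Oozie server side rather than another CLI retry.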
Labels:
- Apache Oozie