
Can backups (fetchImage + mysqldump) crash/kill Spark processes?


Last night we had a weird situation.

 

One of our Spark processes ended 3 minutes after the backup job started.

 

The backup job just runs a simple mysqldump to get all the metadata, followed by a fetchImage from HDFS.
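
To be concrete, the backup essentially boils down to the two commands in the sketch below. This is only a minimal illustration of the idea; the backup directory and the mysqldump options are placeholders, not our real values.

#!/usr/bin/env python3
"""Rough sketch of the nightly backup: a mysqldump of the metadata,
then a download of the latest fsimage via `hdfs dfsadmin -fetchImage`.
Paths and options below are placeholders."""
import os
import subprocess
from datetime import date

BACKUP_DIR = f"/backups/{date.today():%Y-%m-%d}"  # placeholder target dir
os.makedirs(BACKUP_DIR, exist_ok=True)

# 1. Dump the metadata databases from MySQL into a single SQL file.
with open(os.path.join(BACKUP_DIR, "metadata.sql"), "w") as out:
    subprocess.run(
        ["mysqldump", "--all-databases", "--single-transaction"],
        stdout=out,
        check=True,
    )

# 2. Download the most recent fsimage from the NameNode into the local dir.
subprocess.run(["hdfs", "dfsadmin", "-fetchImage", BACKUP_DIR], check=True)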

 

My question is: is it possible that a specific Spark job, which had been running correctly for a few hours, was killed because the backup process started?

 

This Spark job only accesses HDFS (according to the development team), so could it be that the fetchImage is killing something, or signaling something to stop reading from HDFS?

 

I'm kind of confused at this point... which is why I'm asking the question here.

 

Our cluster has been super stable up to now; this has never happened before. The only weird thing is that the time and day of the backup match the time and day of the crashing Spark job. Like... 1+1 = 2...

 

Could it be something else?

 

 

2019-07-21 21:46:26 INFO  ContainerManagementProtocolProxy:260 - Opening proxy : "NODE1 :)":8041
2019-07-22 00:33:35 INFO  YarnAllocator:54 - Completed container container_e16_1562587047011_1317_01_000013 on host: "NODE 5 =)" (state: COMPLETE, exit status: 1)
2019-07-22 00:33:35 WARN  YarnAllocator:66 - Container marked as failed: container_e16_1562587047011_1317_01_000013 on host: "NODE 5 =)". Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e16_1562587047011_1317_01_000013
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
	at org.apache.hadoop.util.Shell.run(Shell.java:507)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:399)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Shell output: main : command provided 1
main : run as user is XXXXX
main : requested yarn user is XXXXX
Writing to tmp file /u11/hadoop/yarn/nm/nmPrivate/application_1562587047011_1317/container_e16_1562587047011_1317_01_000013/container_e16_1562587047011_1317_01_000013.pid.tmp
Writing to cgroup task files...

 

 

 

Thank you.