New Contributor
Posts: 4
Registered: ‎09-25-2015

Spark: File not found error ... works fine in local mode but fails in cluster mode

Hi everyone,

I have a simple Spark application with a few Spring context and rule XML files. All of these files are part of the project, located under the resources folder (resources/db/rule/rule2.xml), and everything works fine in Spark local mode. When I run the same application in yarn-cluster mode, it complains that the file rule2.xml is not found, even though it is part of the Maven-built jar. Do I need to make any changes for the application to work in cluster mode? Any help would be appreciated.

 

Here is the code in which I am reading the xml file

 

JaxbUtils.unmarshalRule(
    ByteStreams.toByteArray(
        Resources.getResource(String.format("db/rule/rule2.xml", id)).openStream()));

 

Here is the error log

 

15/09/24 15:57:07 INFO storage.BlockManager: Registering executor with local external shuffle service.
15/09/24 15:57:07 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@bdaolc011node08.sabre.com:40589/user/HeartbeatReceiver
15/09/24 15:57:09 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
15/09/24 15:57:09 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/09/24 15:57:09 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
15/09/24 15:57:09 INFO storage.MemoryStore: ensureFreeSpace(3132) called with curMem=0, maxMem=555755765
15/09/24 15:57:09 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.1 KB, free 530.0 MB)
15/09/24 15:57:09 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/09/24 15:57:09 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 134 ms
15/09/24 15:57:09 INFO storage.MemoryStore: ensureFreeSpace(6144) called with curMem=3132, maxMem=555755765
15/09/24 15:57:09 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.0 KB, free 530.0 MB)
15/09/24 15:57:12 INFO support.ClassPathXmlApplicationContext: Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@3c6db742: startup date [Thu Sep 24 15:57:12 CDT 2015]; root of context hierarchy
15/09/24 15:57:12 INFO xml.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [spring/rules-engine-spring.xml]
15/09/24 15:57:13 INFO xml.XmlBeanDefinitionReader: Loading XML bean definitions from class path resource [spring/ere-spring.xml]
15/09/24 15:57:13 INFO support.DefaultListableBeanFactory: Overriding bean definition for bean 'nativeRuleBuilder': replacing [Generic bean: class [com.sabre.sp.ere.core.loader.DroolsNativeRuleBuilder]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in class path resource [spring/ere-spring.xml]] with [Generic bean: class [com.sabre.sp.ere.core.loader.DroolsNativeRuleBuilder]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in class path resource [spring/rules-engine-spring.xml]]
15/09/24 15:57:13 INFO support.DefaultListableBeanFactory: Overriding bean definition for bean 'rulesExecutor': replacing [Generic bean: class [com.sabre.sp.ere.core.executor.DroolsRulesExecutor]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in class path resource [spring/ere-spring.xml]] with [Generic bean: class [com.sabre.sp.ere.core.executor.DroolsRulesExecutor]; scope=; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in class path resource [spring/rules-engine-spring.xml]]
15/09/24 15:57:13 INFO support.PropertySourcesPlaceholderConfigurer: Loading properties file from class path resource [spring/ere-test.properties]
15/09/24 15:57:13 WARN support.PropertySourcesPlaceholderConfigurer: Could not load properties from class path resource [spring/ere-test.properties]: class path resource [spring/ere-test.properties] cannot be opened because it does not exist
15/09/24 15:57:13 INFO support.PropertySourcesPlaceholderConfigurer: Loading properties file from class path resource [spring/ere-spring.properties]
15/09/24 15:57:13 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
15/09/24 15:57:13 INFO jdbc.JDBCRDD: closed connection
java.lang.IllegalArgumentException: resource spring/rule2.xml not found.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at com.google.common.io.Resources.getResource(Resources.java:152)
at com.sabre.rules.AppRuleExecutor.rule(AppRuleExecutor.java:50)
at com.sabre.rules.AppRuleExecutor.executeRules(AppRuleExecutor.java:39)
at com.sabre.rules.RuleComponent.executeRules(RuleComponent.java:43)
at com.sabre.rules.SMAAlertImpl$1.call(SMAAlertImpl.java:60)
at com.sabre.rules.SMAAlertImpl$1.call(SMAAlertImpl.java:37)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

The relationship between jars and classloaders may not be the same as in local mode, so a classpath resource lookup that works locally may fail on the cluster. Instead of depending on this, consider either distributing your file via HDFS, or using the --files option of spark-submit to distribute files to the executors' local disk:
http://spark.apache.org/docs/latest/running-on-yarn.html
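To illustrate the classpath side of this, here is a minimal, self-contained sketch of how classloader resource lookup behaves. The resource name java/lang/String.class is just a stand-in that exists on every JVM; the point is that ClassLoader-based lookups (including Guava's Resources.getResource) expect a relative name with no leading slash:

```java
public class ResourceLookupDemo {
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();

        // A relative name (no leading slash) is resolved against the classpath:
        System.out.println(cl.getResource("java/lang/String.class") != null);  // true

        // A leading slash is NOT stripped by ClassLoader.getResource,
        // so the lookup fails:
        System.out.println(cl.getResource("/java/lang/String.class") == null); // true
    }
}
```

This is why a path like /db/rule/rule2.xml can fail even when the file really is inside the jar: the name passed to the classloader must be db/rule/rule2.xml.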

New Contributor
Posts: 4
Registered: ‎09-25-2015

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

Thank you for your response. I used --files and am still getting the same error. I copied rule2.xml to the folder from which I run the Maven-built jar, but my application code still points to /db/rule/rule2.xml. I ran the Spark application as below:

 

spark-submit --jars vertica-jdbc-7.1.1-3.jar --files rule2.xml#rule2.xml --class "com.sabre.rules.SMAAlertImpl" --master yarn-cluster simple-project-1.0-shaded.jar

 

I am running the Maven-built jar from /home/../SMA/, and this folder contains the following files:

 

simple-project-1.0-shaded.jar

rule2.xml

vertica-jdbc-7.1.1-3.jar

 

Here is my application code

 

JaxbUtils.unmarshalRule(
    ByteStreams.toByteArray(
        Resources.getResource(String.format("/db/rule/rule%d.xml", id)).openStream()));

 

Do I need to change my application code? Why am I still getting the error? Let me know if I missed anything.

 

 

New Contributor
Posts: 4
Registered: ‎09-25-2015

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

Any update on my issue? I am kind of stuck. It might be a simple fix, but since I am very new to Spark I don't know how to resolve it. Please let me know.

 

I even copied the file to the HDFS location /db/rule/rule2.xml and tried the variants below, but they didn't work. rule2.xml is part of the project and is available in the Maven-built jar. To make this work, do I need different application code for local mode and cluster mode?

 

JaxbUtils.unmarshalRule(
    ByteStreams.toByteArray(
        Resources.getResource(String.format("/db/rule/rule%d.xml", id)).openStream()));

JaxbUtils.unmarshalRule(
    ByteStreams.toByteArray(
        Resources.getResource(String.format("file:///db/rule/rule%d.xml", id)).openStream()));

 

JaxbUtils.unmarshalRule(
    ByteStreams.toByteArray(
        Resources.getResource(String.format("hdfs:///db/rule/rule%d.xml", id)).openStream()));

 

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

You won't be able to read a local file with this code: you are still trying to read from the classpath. The code itself would also have to change in order to read a file from local disk.

New Contributor
Posts: 4
Registered: ‎09-25-2015

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

Would you mind telling me what needs to change in my code? As I mentioned, local mode works fine when I read the file like this:

file:///db/rule/rule2.xml

What am I supposed to change in the above code to make it work in cluster mode?

 

Thanks in advance.

Cloudera Employee
Posts: 322
Registered: ‎01-16-2014

Re: Spark: File not found error ... works fine in local mode but fails in cluster mode

With the --files option, the file is placed in the working directory of each executor.

You are trying to point to the file using an absolute path, which is not what the --files option gives you. Use just the name "rule2.xml", not a path.

When you read the documentation for --files, see the important note at the bottom of the "Running on YARN" page.

Also, do not use Resources.getResource(); open the file with a plain Java construct such as new FileInputStream("rule2.xml") or something like it.

 

Wilfred
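The suggestion above can be sketched as follows. This is a minimal, self-contained example, assuming --files has staged rule2.xml into the executor's working directory (the main method simulates that by writing the file itself). JaxbUtils is the original poster's own helper and is only referenced in a comment; readAll stands in for Guava's ByteStreams.toByteArray so the sketch needs no external dependencies:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RuleFileLoader {

    // Read the whole stream into a byte array (equivalent of Guava's
    // ByteStreams.toByteArray, kept local to avoid any dependency).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // With --files rule2.xml, YARN puts the file in the executor's working
    // directory, so a bare relative file name is enough here.
    static byte[] loadRule(String fileName) throws IOException {
        try (InputStream in = new FileInputStream(fileName)) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate what --files does: drop rule2.xml into the working dir.
        Path rule = Paths.get("rule2.xml");
        Files.write(rule, "<rule id=\"2\"/>".getBytes(StandardCharsets.UTF_8));

        byte[] bytes = loadRule("rule2.xml");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        // In the real job this byte array would then be passed on, e.g.:
        // JaxbUtils.unmarshalRule(bytes);

        Files.deleteIfExists(rule);
    }
}
```

The key difference from the original code is that nothing here touches the classpath: the file is opened from local disk by name, which matches where --files places it on every executor.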