Champion Alumni
Posts: 196
Registered: ‎11-18-2014
Accepted Solution

Problems with cloudera quickstart/spark/reading

Hello,

 

I'm using the Cloudera QuickStart VM and I'm having trouble reading files (my JavaRDD is empty).

I'm getting this error when I try to save or print the JavaRDD:

 

ERROR JobScheduler: Error running job streaming job 1416325882000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage 41.0 (TID 167, quickstart.cloudera): java.io.IOException: unexpected exception type
        java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:606)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)

 

I defined my context as:

    SparkConf sparkConf = new SparkConf()
            .setMaster(MASTER)
            .setAppName("BigData")
            .setSparkHome(SPARK_HOME)
            .setJars(new String[]{JARS});
    sc = new JavaSparkContext(sparkConf);

Then I read the JavaRDDs:

    File folder = new File(inputFile);
    File[] listOfFiles = folder.listFiles();
    Queue<JavaRDD<String>> inputRDDQueue = new LinkedList<JavaRDD<String>>();
    if (listOfFiles != null) {
        for (File file : listOfFiles) {
            if (file.isFile()) {
                System.out.println(file.getName());
                inputRDDQueue.add(
                        MyJavaSparkContext.sc.textFile(inputFile + file.getName()));
            }
        }
    }

and then I create a stream from this queue and try to print it (this is where I get the error):

    System.out.println(inputRDDQueue.toString());
    JavaDStream<String> input = MyJavaStreamingContex.ssc.queueStream(inputRDDQueue);
    input.dstream().persist().print();

and then I start the Spark context:

MyJavaSparkContext.sc.startTime();

 

Could you help me?

 

Thank you!

Alina GHERMAN

GHERMAN Alina
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Problems with cloudera quickstart/spark/reading

How are you executing this? It sounds like you may not be using spark-submit, or you are accidentally bundling Spark (perhaps a slightly different version) into your app. Spark dependencies should be 'provided' in your build, and you'll want to use spark-submit to submit. Don't set the master in your SparkConf in code.
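For illustration, a minimal driver setup along those lines might look roughly like this (class and path names are only placeholders; the master URL and application jar are supplied on the spark-submit command line rather than in code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Sketch only: no setMaster/setJars here; spark-submit supplies them,
    // e.g.  spark-submit --class com.example.App --master <master-url> app.jar
    SparkConf sparkConf = new SparkConf().setAppName("BigData");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);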

Champion Alumni
Posts: 196
Registered: ‎11-18-2014

Re: Problems with cloudera quickstart/spark/reading

Hello,

 

I created a Maven project and I'm deploying it with Eclipse.
In my pom I put:

		<dependency><!-- spark -->
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.10</artifactId>
			<version>1.1.0</version>
		</dependency>

		<dependency><!-- spark -->
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming_2.10</artifactId>
			<version>1.1.0</version>
		</dependency>

 

In the Cloudera QuickStart VM the version is 3.6, if I understood correctly.

 

I will try with spark-submit right now!

Thank you!

 

Alina

GHERMAN Alina
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Problems with cloudera quickstart/spark/reading

You need <scope>provided</scope> as well.
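For example, each of the Spark dependencies above would then look something like this:

    <dependency><!-- spark -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0</version>
        <scope>provided</scope>
    </dependency>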

Champion Alumni
Posts: 196
Registered: ‎11-18-2014

Re: Problems with cloudera quickstart/spark/reading

Thank you!

 

I think I'm also having some other side problems, because when I export the jar from Eclipse and run it with java -jar myjar.jar I get a Spark error:


Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function2
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
	at java.lang.Class.getMethod0(Class.java:2813)
	at java.lang.Class.getMethod(Class.java:1663)
	at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.Function2
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

and when I try to run it with spark-submit:

 

sudo spark-submit /home/cloudera/Desktop/test.jar --class com.seb.standard_self.App --verbose
Error: Cannot load main class from JAR: file:/home/cloudera/Desktop/test.jar
Run with --help for usage help or --verbose for debug output

 

The jar contains a manifest with these two lines:

Manifest-Version: 1.0
Main-Class: com.seb.standard_self.App

and it also contains my main class.

 

This is strange, because when I run it from Eclipse I don't get any error like this.


Note: I added the provided scope to the Spark and Hadoop artifacts.

Thank you!

GHERMAN Alina
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Problems with cloudera quickstart/spark/reading

Yes, you shouldn't be able to run this as a stand-alone app.

Hm, try putting the jar file last? That is how the script says to do it.
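For example, reordering the earlier command so that the options come before the application jar:

    spark-submit --class com.seb.standard_self.App --verbose /home/cloudera/Desktop/test.jar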

Champion Alumni
Posts: 196
Registered: ‎11-18-2014

Re: Problems with cloudera quickstart/spark/reading

In fact the generated jar wasn't OK (I fixed this in my pom.xml):

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.seb.standard_self.App</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>

When I run my jar with spark-submit I get another error (access rights), still not the error that I get in Eclipse:

INFO Utils: Successfully started service 'HTTP file server' on port 41178.
14/11/18 13:02:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/11/18 13:02:43 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
14/11/18 13:02:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=EXECUTE, inode="/user/spark":spark:spark:drwxr-x---
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:255)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:236)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:178)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)

 Thank you!

 

Alina

GHERMAN Alina
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Problems with cloudera quickstart/spark/reading

It means basically what it says: you're running a program that accesses /user/spark, but you're not running as spark, the user that can access that directory.

Champion Alumni
Posts: 196
Registered: ‎11-18-2014

Re: Problems with cloudera quickstart/spark/reading

I tried to change the user to spark but I don't know the password. I tried cloudera and spark but neither worked.
Then I changed to the superuser, and as the superuser I have another error:

 ./spark-submit --class com.seb.standard_self.App --master "spark://quickstart.cloudera:7077" /home/cloudera/workspace/standard-to-self-explicit/target/standard-self-0.0.1-SNAPSHOT.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.spark.deploy.SparkSubmit
   at gnu.java.lang.MainThread.run(libgcj.so.10)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.SparkSubmit not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./,file:/usr/lib/spark/conf/,file:/etc/hadoop/conf/,file:/etc/hadoop/conf/,file:/usr/lib/hadoop/../hadoop-hdfs/./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
   at java.net.URLClassLoader.findClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at gnu.java.lang.MainThread.run(libgcj.so.10)

...

 

Thank you!

 

GHERMAN Alina
Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Problems with cloudera quickstart/spark/reading

I'm not suggesting you log in as spark or a superuser. You shouldn't do this. Instead, change your app to not access directories you don't have access to as your user.
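For example, if the data your job touches can simply live under your own HDFS home directory instead of /user/spark (the paths below are only illustrative), you could stage it there and point the application at that path:

    hdfs dfs -mkdir -p /user/cloudera/input
    hdfs dfs -put localfile.txt /user/cloudera/input/
    hdfs dfs -ls /user/cloudera/input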
