
Problems with cloudera quickstart/spark/reading

Champion Alumni

Hello,

 

I'm using the Cloudera QuickStart VM and I'm having trouble reading files (my JavaRDD is empty).

I get this error if I try to save or print the JavaRDD:

 

ERROR JobScheduler: Error running job streaming job 1416325882000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage 41.0 (TID 167, quickstart.cloudera): java.io.IOException: unexpected exception type
        java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:606)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)

 

I defined my context as:

    SparkConf sparkConf = new SparkConf()
        .setMaster(MASTER)
        .setAppName("BigData")
        .setSparkHome(SPARK_HOME)
        .setJars(new String[]{JARS});
    sc = new JavaSparkContext(sparkConf);

Then I read the JavaRDDs:

    File folder = new File(inputFile);
    File[] listOfFiles = folder.listFiles();
    Queue<JavaRDD<String>> inputRDDQueue = new LinkedList<JavaRDD<String>>();
    if (listOfFiles != null) {
        for (File file : listOfFiles) {
            if (file.isFile()) {
                System.out.println(file.getName());
                inputRDDQueue.add(
                        MyJavaSparkContext.sc.textFile(inputFile + file.getName()));
            }
        }
    }
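As a side note, `inputFile + file.getName()` only produces a valid path if `inputFile` ends with a path separator. A small self-contained sketch of building the per-file paths with `java.io.File` instead, which inserts the separator itself (the directory and file names in the demo are made up):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.LinkedList;
import java.util.Queue;

public class ListInputFiles {
    // Builds the list of per-file paths that would be fed to sc.textFile(...).
    // new File(folder, name) inserts the path separator itself, so this works
    // whether or not the input directory string ends with "/".
    static Queue<String> collectFilePaths(String inputDir) {
        Queue<String> paths = new LinkedList<String>();
        File folder = new File(inputDir);
        File[] listOfFiles = folder.listFiles();
        if (listOfFiles != null) {
            for (File file : listOfFiles) {
                if (file.isFile()) {
                    paths.add(new File(folder, file.getName()).getPath());
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary directory; note the path passed in has
        // no trailing slash, yet the resulting file path is still valid.
        File dir = Files.createTempDirectory("rdd-input").toFile();
        new File(dir, "part-00000").createNewFile();
        Queue<String> paths = collectFilePaths(dir.getPath());
        System.out.println(paths.size());
        System.out.println(paths.peek().endsWith("part-00000"));
    }
}
```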

Then I put everything in a queue stream and print it (this is where I get the error):

    System.out.println(inputRDDQueue.toString());
    JavaDStream<String> input = MyJavaStreamingContex.ssc.queueStream(inputRDDQueue);
    input.dstream().persist().print();

Then I start the Spark context:

MyJavaSparkContext.sc.startTime();

 

Could you help me?

 

Thank you!

Alina GHERMAN

 

 

GHERMAN Alina
1 ACCEPTED SOLUTION

Master Collaborator

I'm not suggesting you log in as spark or a superuser. You shouldn't do this. Instead, change your app to not access directories you don't have access to as your user.


9 REPLIES

avatar
Master Collaborator

How are you executing this? It sounds like you may not be using spark-submit, or you are accidentally bundling Spark (perhaps a slightly different version) into your app. Spark dependencies should be 'provided' in your build, and you'll want to use spark-submit to submit your app. You don't set the master in your SparkConf in code.

Champion Alumni

Hello,

 

I created a Maven project and I'm deploying it with Eclipse.
In my pom I put:
    <dependency><!-- spark -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0</version>
    </dependency>

    <dependency><!-- spark -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.1.0</version>
    </dependency>

 

In the Cloudera QuickStart VM the version is 3.6, if I understood correctly.

 

I will try with spark-submit right now!

Thank you!

 

Alina

 

 

GHERMAN Alina

Master Collaborator

You need <scope>provided</scope> as well.
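For example, the spark-core entry above would become (a sketch; the same scope applies to spark-streaming and any Hadoop artifacts):

```xml
<dependency><!-- spark, provided by the cluster at runtime -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.0</version>
    <scope>provided</scope>
</dependency>
```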

Champion Alumni

Thank you!

 

I think I'm also having some other side problems, because when I export the jar from Eclipse and run it with java -jar myjar.jar I get a Spark error:


Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function2
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
	at java.lang.Class.getMethod0(Class.java:2813)
	at java.lang.Class.getMethod(Class.java:1663)
	at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.Function2
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

And when I try to run it with spark-submit:

 

sudo spark-submit /home/cloudera/Desktop/test.jar --class com.seb.standard_self.App --verbose
Error: Cannot load main class from JAR: file:/home/cloudera/Desktop/test.jar
Run with --help for usage help or --verbose for debug output

 

The jar contains a manifest with two lines:

    Manifest-Version: 1.0
    Main-Class: com.seb.standard_self.App

and it also contains my main class..
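One classic gotcha when a manifest looks right but isn't picked up is a manifest file that doesn't end with a newline (the JAR tool documentation requires the text to end with a newline or carriage return). A small stand-alone sketch that parses manifest text with java.util.jar.Manifest to check what actually gets read; the attribute values are copied from the post above:

```java
import java.io.ByteArrayInputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class CheckManifest {
    public static void main(String[] args) throws Exception {
        // Per the JAR spec, the manifest text must end with a newline.
        String mf = "Manifest-Version: 1.0\n"
                  + "Main-Class: com.seb.standard_self.App\n";
        Manifest manifest = new Manifest(new ByteArrayInputStream(mf.getBytes("UTF-8")));
        Attributes attrs = manifest.getMainAttributes();
        // Prints the main class exactly as the JVM launcher would resolve it.
        System.out.println(attrs.getValue("Main-Class"));
    }
}
```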

 

This is strange, because when I run it from Eclipse I don't get any error like this.


Note: I added the provided scope to the Spark and Hadoop artifacts.

Thank you!

GHERMAN Alina

Master Collaborator

Yes, you shouldn't be able to run this as a stand-alone app.

Hm, try putting the jar file last (e.g. spark-submit --class com.seb.standard_self.App --verbose /home/cloudera/Desktop/test.jar)? That is how the script says to do it.

Champion Alumni

In fact the generated jar wasn't OK (I fixed this in my pom.xml):

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <version>2.4</version>
          <configuration>
            <archive>
              <manifest>
                <mainClass>com.seb.standard_self.App</mainClass>
              </manifest>
            </archive>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </build>

When I run my jar with spark-submit I get another error (access rights; still not the error that I get in Eclipse):

INFO Utils: Successfully started service 'HTTP file server' on port 41178.
14/11/18 13:02:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/11/18 13:02:43 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
14/11/18 13:02:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=EXECUTE, inode="/user/spark":spark:spark:drwxr-x---
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:255)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:236)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:178)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)

 Thank you!

 

Alina

GHERMAN Alina

Master Collaborator

It means basically what it says: you're writing a program that accesses /user/spark, but you're not running as spark, the user that can access that directory.
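One way to follow that advice is to derive the working path from the user actually running the job instead of hard-coding /user/spark. A tiny sketch; the /user/<username> layout is the usual HDFS home-directory convention (e.g. /user/cloudera in the QuickStart VM), not something the error message itself guarantees:

```java
public class UserHomePath {
    public static void main(String[] args) {
        // HDFS home directories conventionally live under /user/<os username>.
        String user = System.getProperty("user.name");
        String home = "/user/" + user;
        // The derived path is per-user, so the app no longer needs
        // access to the spark user's directory.
        System.out.println(home.startsWith("/user/") && !home.equals("/user/"));
    }
}
```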

Champion Alumni

I tried to change the user to spark, but I don't know the password. I tried 'cloudera' and 'spark' but neither worked.
Then I switched to the superuser, and as superuser I get another error:

 ./spark-submit --class com.seb.standard_self.App --master "spark://quickstart.cloudera:7077" /home/cloudera/workspace/standard-to-self-explicit/target/standard-self-0.0.1-SNAPSHOT.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.spark.deploy.SparkSubmit
   at gnu.java.lang.MainThread.run(libgcj.so.10)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.SparkSubmit not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./,file:/usr/lib/spark/conf/,file:/etc/hadoop/conf/,file:/etc/hadoop/conf/,file:/usr/lib/hadoop/../hadoop-hdfs/./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
   at java.net.URLClassLoader.findClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at gnu.java.lang.MainThread.run(libgcj.so.10)

...

 

Thank you!

 

GHERMAN Alina

Master Collaborator

I'm not suggesting you log in as spark or a superuser. You shouldn't do this. Instead, change your app to not access directories you don't have access to as your user.