Created on 11-18-2014 08:06 AM - edited 09-16-2022 02:13 AM
Hello,
I'm using the Cloudera QuickStart VM and I'm having trouble reading files (my JavaRDD is empty).
I get the following error when I try to save or print the JavaRDD:
ERROR JobScheduler: Error running job streaming job 1416325882000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 41.0 failed 4 times, most recent failure: Lost task 0.3 in stage 41.0 (TID 167, quickstart.cloudera): java.io.IOException: unexpected exception type
    at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
I defined my context as:
SparkConf sparkConf = new SparkConf()
    .setMaster(MASTER)
    .setAppName("BigData")
    .setSparkHome(SPARK_HOME)
    .setJars(new String[]{JARS});
sc = new JavaSparkContext(sparkConf);
Then I read the files into a queue of JavaRDDs:
File folder = new File(inputFile);
File[] listOfFiles = folder.listFiles();
Queue<JavaRDD<String>> inputRDDQueue = new LinkedList<JavaRDD<String>>();
if (listOfFiles != null) {
    for (File file : listOfFiles) {
        if (file.isFile()) {
            System.out.println(file.getName());
            inputRDDQueue.add(MyJavaSparkContext.sc.textFile(inputFile + file.getName()));
        }
    }
}
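As an aside, here is a minimal, JDK-only sketch of that listing step (the class and method names are hypothetical, for illustration). One thing worth checking: inputFile + file.getName() silently builds a wrong path if inputFile lacks a trailing separator, which can leave the resulting RDDs empty; new File(parent, name) always inserts the separator.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the loop in the post: collect the full
// paths of the plain files in a directory, skipping subdirectories.
public class InputFiles {
    public static List<String> listFilePaths(File folder) {
        List<String> paths = new ArrayList<>();
        File[] entries = folder.listFiles();   // null if folder doesn't exist
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile()) {
                    // new File(parent, name) inserts the path separator,
                    // unlike string concatenation with inputFile
                    paths.add(new File(folder, f.getName()).getPath());
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("rdd-input").toFile();
        new File(dir, "part-0001.txt").createNewFile();
        new File(dir, "part-0002.txt").createNewFile();
        System.out.println(listFilePaths(dir).size());  // prints 2
    }
}
```

Each path returned this way can then be handed to sc.textFile(...) as in the snippet above.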
Then I wrap the queue in a DStream and try to print it (this is where I get the error):
System.out.println(inputRDDQueue.toString());
JavaDStream<String> input = MyJavaStreamingContex.ssc.queueStream(inputRDDQueue);
input.dstream().persist().print();
and then I start the streaming context:
MyJavaStreamingContex.ssc.start();
Could you help me?
Thank you!
Alina GHERMAN
Created 11-18-2014 08:42 AM
How are you executing this? It sounds like you may not be using spark-submit, or you are accidentally bundling Spark (perhaps a slightly different version) into your app. Spark dependencies should be 'provided' in your build, and you'll want to use spark-submit to submit. Don't set the master in your SparkConf in code.
Created 11-18-2014 09:19 AM
Hello,
I created a Maven project and I'm deploying it with Eclipse.
In my pom I put:
<dependency><!-- spark -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency><!-- spark -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.1.0</version>
</dependency>
In the Cloudera QuickStart VM the version is 3.6, if I understood well.
I will try with spark-submit right now!
Thank you!
Alina
Created 11-18-2014 09:22 AM
You need <scope>provided</scope> as well.
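For example, the streaming dependency with the scope added would look like this (the same pattern applies to spark-core and the Hadoop artifacts):

```xml
<dependency><!-- spark: present at compile time, supplied by the cluster at run time -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.1.0</version>
  <scope>provided</scope>
</dependency>
```

With 'provided', the classes are on the compile classpath but are not packaged into the jar, so the versions shipped with the cluster are the ones used at run time.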
Created 11-18-2014 10:12 AM
Thank you!
I think I'm also having some other side problems, because when I export the jar from Eclipse and run it with java -jar myjar.jar I get a Spark error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function2
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
    at java.lang.Class.getMethod0(Class.java:2813)
    at java.lang.Class.getMethod(Class.java:1663)
    at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.Function2
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
and when I try to run it with spark submit:
sudo spark-submit /home/cloudera/Desktop/test.jar --class com.seb.standard_self.App --verbose
Error: Cannot load main class from JAR: file:/home/cloudera/Desktop/test.jar
Run with --help for usage help or --verbose for debug output
The jar contains a manifest with two lines:
Manifest-Version: 1.0
Main-Class: com.seb.standard_self.App
and it also contains my main class...
This is strange, because when I run it from Eclipse I don't get any error like this...
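One way to double-check what Main-Class actually ended up in a jar, using only the JDK (a hedged sketch; the class name is hypothetical, and the test jar here is built in a temp file just for illustration):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Reads the Main-Class manifest attribute back out of a jar: the same
// lookup that fails when spark-submit reports "Cannot load main class".
public class ManifestCheck {
    public static String readMainClass(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) return null;  // jar has no META-INF/MANIFEST.MF
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway jar with the manifest from the post.
        File jar = Files.createTempFile("test", ".jar").toFile();
        Manifest mf = new Manifest();
        // Manifest-Version must be set or the attributes are not written.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "com.seb.standard_self.App");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            // no entries needed; the manifest is written by the constructor
        }
        System.out.println(readMainClass(jar));  // prints com.seb.standard_self.App
    }
}
```

Pointing readMainClass at the real test.jar would show whether the manifest survived the Eclipse export.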
Note: I added the scope to the Spark and Hadoop artifacts.
Thank you!
Created 11-18-2014 10:52 AM
Yes, you shouldn't be able to run this as a stand-alone app.
Hm, try putting the jar file last? That is how the script says to do it.
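That is, with the paths from earlier in this thread, the options go before the jar (sketch of the invocation only; paths are the ones from the post):

```shell
spark-submit \
  --class com.seb.standard_self.App \
  --verbose \
  /home/cloudera/Desktop/test.jar
```

Anything after the jar path is passed as arguments to the application itself, which is why --class was being ignored in the earlier command.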
Created 11-18-2014 01:05 PM
In fact, the generated jar wasn't OK (I fixed this in my pom.xml):
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4</version>
      <configuration>
        <archive>
          <manifest>
            <mainClass>com.seb.standard_self.App</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
</project>
When I run my jar with spark-submit I get another error (access rights), still not the error that I get in Eclipse:
INFO Utils: Successfully started service 'HTTP file server' on port 41178.
14/11/18 13:02:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/11/18 13:02:43 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
14/11/18 13:02:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=EXECUTE, inode="/user/spark":spark:spark:drwxr-x---
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:255)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:236)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:178)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
Thank you!
Alina
Created 11-18-2014 01:07 PM
It means basically what it says: you're running a program that accesses /user/spark, but you're not running as spark, the user that can access that directory.
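To see which HDFS paths your user can actually use, a couple of standard HDFS commands help (a sketch; the directory names mirror the QuickStart VM setup described above):

```shell
# List ownership and permissions of the home directories;
# the error above shows /user/spark is spark:spark drwxr-x---
hdfs dfs -ls /user

# Keep the app's input and output under your own home directory instead
hdfs dfs -mkdir -p /user/cloudera/input
```

Then point the application's paths at /user/cloudera rather than /user/spark.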
Created 11-18-2014 01:55 PM
I tried to change the user to spark, but I don't know the password. I tried 'cloudera' and 'spark', but neither worked.
Then I switched to the superuser, and as the superuser I get another error:
./spark-submit --class com.seb.standard_self.App --master "spark://quickstart.cloudera:7077" /home/cloudera/workspace/standard-to-self-explicit/target/standard-self-0.0.1-SNAPSHOT.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.spark.deploy.SparkSubmit
    at gnu.java.lang.MainThread.run(libgcj.so.10)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.SparkSubmit not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./,file:/usr/lib/spark/conf/,file:/etc/hadoop/conf/,file:/etc/hadoop/conf/,file:/usr/lib/hadoop/../hadoop-hdfs/./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
    at java.net.URLClassLoader.findClass(libgcj.so.10)
    at java.lang.ClassLoader.loadClass(libgcj.so.10)
    at java.lang.ClassLoader.loadClass(libgcj.so.10)
    at gnu.java.lang.MainThread.run(libgcj.so.10)
...
Thank you!
Created 11-19-2014 12:10 AM
I'm not suggesting you log in as spark or a superuser. You shouldn't do this. Instead, change your app to not access directories you don't have access to as your user.