Member since 10-29-2014 · 9 Posts · 0 Kudos Received · 0 Solutions
11-06-2014
02:02 PM
Including all the jars worked like a charm. Regarding the "64MB limit": I wasn't able to upload the uber-jar to HDFS via Hue (Error:Undefined), and from some searches I saw people claiming that they thought it was a size issue. Thanks!
11-05-2014
03:11 PM
I have checked out some message boards and found that we should add jars to the classpath if we get errors of this nature. I have attempted to do that, but I'm not having any luck. Below are my attempts to add the jar to the classpath, which I think were successful:

[hdfs@cloudera01 root]$ ls
zookeeper-3.4.5-cdh5.1.3.jar
[hdfs@cloudera01 root]$ pwd
/root
[hdfs@cloudera01 root]$ export SPARK_SUBMIT_CLASSPATH=./commons-codec-1.4.jar:$SPARK_HOME/assembly/lib/*:./root/zookeeper-3.4.5-cdh5.1.3.jar
[hdfs@cloudera01 root]$ echo $SPARK_SUBMIT_CLASSPATH
./commons-codec-1.4.jar:/assembly/lib/*:./root/zookeeper-3.4.5-cdh5.1.3.jar

Does anyone have any ideas, or notice if my commands are off? The links I used to try these operations are below; thanks for any thoughts/ideas.

http://apache-spark-user-list.1001560.n3.nabble.com/org-I0Itec-zkclient-serialize-ZkSerializer-ClassNotFound-td15919.html
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html

I believe the Cloudera article is saying to put on the classpath the app you are running, with the missing dependency bundled inside that jar, rather than just the missing jar itself. But I have been having a difficult time getting Maven to do exactly what I would like it to do. I can get it to include the whole dependency list of my app, but then Hue (or something with my setup) won't let me upload the jar, because I believe there is an issue with uploading files greater than 64MB, just going off Google search results. I don't think I would need to include my app with the dependency for it to work, though, if that was the root of the problem.
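For comparison, a sketch of the same export with absolute paths. Note in the echo output above that $SPARK_HOME expanded to nothing (it was unset), and ./root/... is a relative path. The SPARK_HOME value below is an assumption for a CDH parcel install, and the class/jar names in the alternative are hypothetical:

```shell
# Sketch only: the SPARK_HOME path is an assumption for a CDH parcel layout.
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
# Use absolute paths so the entries resolve regardless of the working directory.
export SPARK_SUBMIT_CLASSPATH=/root/commons-codec-1.4.jar:$SPARK_HOME/assembly/lib/*:/root/zookeeper-3.4.5-cdh5.1.3.jar
echo "$SPARK_SUBMIT_CLASSPATH"
# An alternative is to hand the extra jars to spark-submit directly (comma-separated):
# spark-submit --jars /root/commons-codec-1.4.jar,/root/zookeeper-3.4.5-cdh5.1.3.jar \
#   --class com.example.MyApp myapp.jar   # class and jar names are hypothetical
```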
Labels:
Apache Zookeeper
10-31-2014
11:29 AM
I have a job that calculates statistics over a short rolling window, and I would like to be able to dump all the data into HDFS. I have come to learn that HDFS does not support appends. Having my Spark app make a new directory and write a new file for every RDD is not viable. After searching around, I found the Avro DataFileWriter class, which looks like it would work, but according to the Spark user group message referenced below the object won't serialize, so it won't make it out to the worker nodes. I have read that Spark SQL can consume from Kafka and then write to a Parquet file, which seems like it would solve my problem, but Cloudera does not include Spark SQL. Would it be out of the question to try to get Spark SQL and have it write to my CDH HDFS? I don't think I would be able to hook those two up. Does anyone know of possible solutions to this problem? http://apache-spark-user-list.1001560.n3.nabble.com/Persisting-Avro-files-from-Spark-streaming-td1094.html
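One pattern that comes up for the "directory per RDD" small-files problem is to let the streaming job write its one-directory-per-batch output anyway, and compact the pieces periodically. A minimal local sketch of the compaction step, with made-up directory and file names (on a real cluster the merge would be done with `hdfs dfs -getmerge` rather than `cat`):

```shell
# Local stand-in for compacting per-batch streaming output.
# Directory and file names here are invented for illustration.
mkdir -p stream-out-1414700000000 stream-out-1414700010000
echo "record-a" > stream-out-1414700000000/part-00000
echo "record-b" > stream-out-1414700010000/part-00000
# On HDFS this step would be roughly:
#   hdfs dfs -getmerge stream-out-* merged.txt
cat stream-out-*/part-* > merged.txt
cat merged.txt
```

Because batch directory names embed the batch timestamp, a lexicographic glob keeps the records in batch order.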
10-29-2014
09:34 PM
I have been trying to find some Spark Streaming class or config that would let me present permission credentials, or something of that nature, in order to use saveAsTextFile(HDFSPath) to write to CDH HDFS. Is there some way to give my Spark Streaming app superuser permissions? Thanks for any advice!
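For reference, a sketch of one workaround, assuming the cluster uses simple (non-Kerberos) authentication: in that mode HDFS checks permissions against the client-supplied user name, so exporting HADOOP_USER_NAME before launching makes the app's writes run as that user. The launch command is hypothetical:

```shell
# Assumption: simple authentication (no Kerberos). HDFS trusts the
# client-side user name, so the streaming app's writes run as this user.
export HADOOP_USER_NAME=hdfs
echo "$HADOOP_USER_NAME"
# spark-submit --class com.example.StreamingApp myapp.jar   # hypothetical launch
```

On a Kerberized cluster this does nothing; the cleaner fix either way is to chown the target directory to the user the job actually runs as.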