Member since 10-29-2014 · 9 Posts · 0 Kudos Received · 0 Solutions
11-06-2014
02:02 PM
Including all the jars worked like a charm. Regarding the "64MB limit": I wasn't able to upload the uber-jar to HDFS via Hue (Error:Undefined), and from some searches I saw people claiming that they thought it was a size issue. Thanks!
11-05-2014
03:11 PM
I have checked out some message boards and found that we should add jars to the classpath if we get errors of this nature. I have attempted to do that, but I'm not having any luck. Below are my attempts to add the jar to the classpath, which I think were successful:

[hdfs@cloudera01 root]$ ls
zookeeper-3.4.5-cdh5.1.3.jar
[hdfs@cloudera01 root]$ pwd
/root
[hdfs@cloudera01 root]$ export SPARK_SUBMIT_CLASSPATH=./commons-codec-1.4.jar:$SPARK_HOME/assembly/lib/*:./root/zookeeper-3.4.5-cdh5.1.3.jar
[hdfs@cloudera01 root]$ echo $SPARK_SUBMIT_CLASSPATH
./commons-codec-1.4.jar:/assembly/lib/*:./root/zookeeper-3.4.5-cdh5.1.3.jar

Does anyone have any ideas, or notice if my commands are off? The links I used to try these operations are below; thanks for any thoughts/ideas.

http://apache-spark-user-list.1001560.n3.nabble.com/org-I0Itec-zkclient-serialize-ZkSerializer-ClassNotFound-td15919.html
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html

I believe the Cloudera article is saying to put on the classpath the app you are running, with the missing dependency bundled inside that jar, rather than just the missing jar itself. But I have been having a difficult time getting Maven to do exactly what I would like it to do. I can get it to include the whole dependency list of my app, but then Hue (or something with my setup) won't let me upload the jar, because I believe there is an issue with uploading files greater than 64MB, just going off Google search results. I don't think I would need to include my app with the dependency for it to work, though, if that was the root of the problem.
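For comparison, a sketch of the same export with absolute paths. Note in the echo output above that $SPARK_HOME expanded to nothing (it was unset), and ./root/... is a relative path. The SPARK_HOME value below is an assumption for a CDH parcel install, and the class/jar names in the alternative are hypothetical:

```shell
# Sketch only: the SPARK_HOME path is an assumption for a CDH parcel layout.
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
# Use absolute paths so the entries resolve regardless of the working directory.
export SPARK_SUBMIT_CLASSPATH=/root/commons-codec-1.4.jar:$SPARK_HOME/assembly/lib/*:/root/zookeeper-3.4.5-cdh5.1.3.jar
echo "$SPARK_SUBMIT_CLASSPATH"
# An alternative is to hand the extra jars to spark-submit directly (comma-separated):
# spark-submit --jars /root/commons-codec-1.4.jar,/root/zookeeper-3.4.5-cdh5.1.3.jar \
#   --class com.example.MyApp myapp.jar   # class and jar names are hypothetical
```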
Labels:
Apache Zookeeper
10-31-2014
11:29 AM
I have a job that calculates statistics over a short rolling window, and I would like to be able to dump all the data into HDFS. I have come to learn that HDFS does not support appends. Having my Spark app make a new directory and write a new file for every RDD is not viable. After searching around, I found the Avro DataFileWriter class, which looks like it would work, but according to the Spark user group message referenced below the object won't serialize, so it won't make it out to the worker nodes. I have read that Spark SQL can consume from Kafka and then write to a Parquet file, which seems like it would solve my problem, but Cloudera does not include Spark SQL. Would it be out of the question to try to get Spark SQL and have it write to my CDH HDFS? I don't think I would be able to hook those two up. Does anyone know of possible solutions to this problem? http://apache-spark-user-list.1001560.n3.nabble.com/Persisting-Avro-files-from-Spark-streaming-td1094.html
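One pattern that comes up for the "directory per RDD" small-files problem is to let the streaming job write its one-directory-per-batch output anyway, and compact the pieces periodically. A minimal local sketch of the compaction step, with made-up directory and file names (on a real cluster the merge would be done with `hdfs dfs -getmerge` rather than `cat`):

```shell
# Local stand-in for compacting per-batch streaming output.
# Directory and file names here are invented for illustration.
mkdir -p stream-out-1414700000000 stream-out-1414700010000
echo "record-a" > stream-out-1414700000000/part-00000
echo "record-b" > stream-out-1414700010000/part-00000
# On HDFS this step would be roughly:
#   hdfs dfs -getmerge stream-out-* merged.txt
cat stream-out-*/part-* > merged.txt
cat merged.txt
```

Because batch directory names embed the batch timestamp, a lexicographic glob keeps the records in batch order.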
10-29-2014
09:34 PM
I have been trying to find some Spark Streaming class or config that would let me present permission credentials, or something of that nature, in order to use saveAsTextFile(HDFSPath) to write to CDH HDFS. Is there some way to give my Spark Streaming app superuser permissions? Thanks for any advice!
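For reference, a sketch of one workaround, assuming the cluster uses simple (non-Kerberos) authentication: in that mode HDFS checks permissions against the client-supplied user name, so exporting HADOOP_USER_NAME before launching makes the app's writes run as that user. The launch command is hypothetical:

```shell
# Assumption: simple authentication (no Kerberos). HDFS trusts the
# client-side user name, so the streaming app's writes run as this user.
export HADOOP_USER_NAME=hdfs
echo "$HADOOP_USER_NAME"
# spark-submit --class com.example.StreamingApp myapp.jar   # hypothetical launch
```

On a Kerberized cluster this does nothing; the cleaner fix either way is to chown the target directory to the user the job actually runs as.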