Member since: 04-05-2016
Posts: 37
Kudos Received: 8
Solutions: 9
My Accepted Solutions
Views | Posted
---|---
2981 | 07-30-2019 11:52 PM
5344 | 06-07-2019 01:01 AM
9226 | 04-14-2017 08:31 PM
5655 | 08-03-2016 12:52 AM
2939 | 06-22-2016 02:10 AM
06-21-2016
11:35 PM
This looks strange. Your console output listed the lines below:

com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency

Can you try once with: --packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3

I suspect a version compatibility issue between avro-mapred and spark-avro.
06-15-2016
04:55 AM
Try starting spark-shell with the following packages: --packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
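Once the shell is up with those packages, a minimal read sketch could look like the one below; the file path is a hypothetical placeholder, and it assumes the spark-avro 2.x DataFrame API on Spark 1.x:

```scala
// Minimal sketch: reading an Avro file with spark-avro 2.x in spark-shell.
// spark-shell already provides sc and sqlContext; the path below is hypothetical.
val df = sqlContext.read
  .format("com.databricks.spark.avro")   // format registered by the spark-avro package
  .load("/tmp/episodes.avro")            // replace with your Avro file

df.printSchema()
df.show(5)
```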
06-10-2016
03:46 AM
What Python version are you using? You may want to refer to: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html
05-30-2016
03:48 AM
1 Kudo
See the Environment tab of the Job History UI and locate "spark.local.dir". Yes, that is the expected behaviour, as the JAR is required by the executors.
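If you want to control where that scratch space lives, spark.local.dir can be set when you build the context; a minimal sketch, where the directory path is an assumption (it must exist and be writable on every node). Note that on YARN the NodeManager's local directories usually take precedence, so the Environment tab is the place to check which value actually took effect.

```scala
// Minimal sketch: setting spark.local.dir explicitly (the path is an assumption).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("local-dir-demo")
  .set("spark.local.dir", "/data/spark-tmp")  // scratch space for shuffle files and fetched JARs
val sc = new SparkContext(conf)
```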
05-30-2016
01:05 AM
This looks weird. Can you confirm that http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar is still not present? Spark keeps all JARs specified by the --jars option in the job's temp directory on each executor node. There must be some OS setting that leads to the deletion of the existing Phoenix JAR from that temp directory; when the Spark context can no longer find it at its usual location, it tries to download it again from the given URL. However, this should not happen as long as the temp directory is actively accessed by the job or process. You can try bundling that JAR into your application JAR and then referencing it in spark-submit. I suspect you will need another 20-odd days to test this workaround 🙂
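For the bundling approach, a minimal sbt-assembly sketch is below; the project name, Scala/Spark versions, and the plugin version are all assumptions on my side, so adjust them to your build:

```scala
// project/plugins.sbt -- bring in sbt-assembly to build a fat JAR
// (the plugin version is an assumption; use one compatible with your sbt)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
```

```scala
// build.sbt -- Spark stays "provided"; phoenix-1.2.0-client.jar sits in lib/,
// where sbt picks it up automatically as an unmanaged dependency and
// sbt-assembly folds it into the fat JAR.
name := "my-spark-job"          // hypothetical project name
scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Running "sbt assembly" then gives you a single JAR to pass to spark-submit, so the executors never need to fetch the Phoenix JAR separately.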
05-11-2016
03:47 AM
1 Kudo
You are misusing the createPollingStream method. Give 198.168.1.31 as the sink address, as below, and it should work: FlumeUtils.createPollingStream(ssc, "198.168.1.31", 8020)
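For context, a fuller sketch of the pull-based receiver is below; the app name, batch interval, and the print() output are assumptions, and the port is kept from your snippet only for illustration:

```scala
// Minimal sketch of a pull-based Flume receiver.
// Assumes the Flume agent runs a Spark sink at 198.168.1.31:8020 (adjust as needed).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("flume-polling-demo")
val ssc = new StreamingContext(conf, Seconds(10))   // 10s batches, an arbitrary choice

val stream = FlumeUtils.createPollingStream(ssc, "198.168.1.31", 8020)
stream
  .map(sparkEvent => new String(sparkEvent.event.getBody.array()))  // decode the event body
  .print()

ssc.start()
ssc.awaitTermination()
```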
05-11-2016
01:52 AM
Add the dependency below as well:

groupId = org.apache.spark
artifactId = spark-streaming-flume_2.10
version = 1.6.1

See here for the pull-based configuration.
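If you build with sbt rather than Maven, the equivalent line would be something like the sketch below (assuming Scala 2.10 and Spark 1.6.1 as above):

```scala
// build.sbt -- sbt equivalent of the Maven coordinates above
libraryDependencies += "org.apache.spark" %% "spark-streaming-flume" % "1.6.1"
```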
04-20-2016
03:29 AM
1 Kudo
CM supports a single version of Spark on YARN and a single version for the Standalone installation (a single version is the common requirement). To support multiple versions of Spark, you need to install the additional version manually on a single node and copy the YARN and Hive config files into its conf directory. When you invoke the spark-submit of that version, it will distribute the Spark core binaries to the YARN nodes that execute your code, so you don't need to install Spark on each YARN node.
04-19-2016
04:31 AM
Yes, YARN provides this flexibility. Here you can find the detailed answer. In CDH there is a "Spark" service, which is meant for YARN, and a "Spark Standalone" service, which runs its daemons standalone on the specified nodes. YARN will do the work for you if you want to test multiple versions simultaneously: keep your multiple versions on a gateway host and launch your Spark applications from there.
04-07-2016
04:26 AM
1 Kudo
That's because no new files are arriving in the directory after the streaming application starts. You can use "cp" to drop files into the directory after starting the streaming application.
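For reference, a minimal file-stream sketch; the directory path, app name, and batch interval are assumptions:

```scala
// Minimal sketch: textFileStream only picks up files that appear in the directory
// *after* the context starts, which is why cp-ing files in afterwards is needed.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("file-stream-demo")
val ssc = new StreamingContext(conf, Seconds(5))          // 5s batches, an arbitrary choice

val lines = ssc.textFileStream("hdfs:///tmp/stream-in")   // hypothetical input directory
lines.count().print()

ssc.start()
ssc.awaitTermination()
// While this runs, e.g.:  hdfs dfs -cp /tmp/some-file.txt /tmp/stream-in/
```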