
In Qi Wang's article on the Spark Machine Learning Pipeline by example, what is the tweak to make the code run on HDP 2.5 with Spark 2.0?


@Mark Ott

The version of Zeppelin included in HDP 2.5 does not support Spark 2.0, but you can run the example Spark jobs with spark-submit. For more information on spark-submit, see https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/run-sample... From the command line, run the following commands, then invoke ./bin/spark-submit:

export SPARK_HOME=/usr/hdp/current/spark2-client

export SPARK_MAJOR_VERSION=2

cd /usr/hdp/current/spark2-client


@Mark Ott

Mark, I just updated the article to include information on how to run the tutorial with Spark 2.0. Because of the Zeppelin limitation in HDP 2.5, you can only run it inside spark-shell, but that should be sufficient for understanding how the pipeline works.

https://community.hortonworks.com/content/kbentry/53903/spark-machine-learning-pipeline-by-example.h...

@Qi Wang

I am running this flight delays use case on Spark 1.6.0 and I am getting the error below. Can you please let me know what I am missing?

scala> val lrModel = lrPipeline.fit(trainingData)

<console>:64: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[String]
 required: org.apache.spark.sql.DataFrame
       val lrModel = lrPipeline.fit(trainingData)
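
The type mismatch says that `trainingData` is still an `RDD[String]` (for example, the result of `sc.textFile`), while `Pipeline.fit` in Spark 1.6 expects a `DataFrame`. One possible fix is to parse the lines into a case class and call `toDF()` before splitting. The sketch below is only an illustration for spark-shell on Spark 1.6, where `sc` and `sqlContext` are predefined. The `Flight` case class, its three columns, and the input path are hypothetical; the real columns come from the article's flight-delay dataset.

```scala
// Sketch for spark-shell on Spark 1.6; `sc` and `sqlContext` are provided by the shell.
import scala.util.Try
import sqlContext.implicits._   // enables .toDF() on RDDs of case classes

// Hypothetical schema: replace with the columns the article's pipeline expects.
case class Flight(carrier: String, origin: String, depDelay: Double)

// Hypothetical input path; each line is assumed to hold three comma-separated fields.
val rawLines = sc.textFile("/tmp/flights.csv")   // RDD[String]

// Parse each line; header rows and malformed lines fail inside Try and are dropped.
val flightsDF = rawLines
  .map(_.split(","))
  .flatMap(f => Try(Flight(f(0), f(1), f(2).toDouble)).toOption)
  .toDF()

// Split the DataFrame (not the raw RDD) into training and test sets.
val Array(trainingData, testData) = flightsDF.randomSplit(Array(0.7, 0.3), seed = 42L)

// trainingData is now a DataFrame, so the call from the question type-checks:
// val lrModel = lrPipeline.fit(trainingData)
```

If the split was originally done on the raw text RDD, moving `randomSplit` after the `toDF()` conversion, as above, keeps both halves as DataFrames.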