Community Articles

lwang · 05-23-2017

This tutorial is part two of a two-part series. In this tutorial, we'll verify Spark 2.1 functionality using Zeppelin on an HDP 2.6 cluster deployed using Cloudbreak. The first tutorial covers using Cloudbreak to deploy the cluster. You can find the first tutorial here: HCC Article

Prerequisites

You should have already completed part one of this tutorial series and have an HDP 2.6 cluster with Spark 2.1, deployed with Cloudbreak, up and running.

Scope

This tutorial was tested in the following environment:

Cloudbreak 1.14.4
AWS EC2
HDP 2.6
Spark 2.1
Zeppelin 0.7

Steps

Log in to Ambari

As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see a browser security warning similar to this:

Your screen may vary depending on your browser of choice. I'm using Chrome. This warning appears because the cluster uses self-signed certificates, which the browser does not trust. Click on the ADVANCED link. You should see something similar to this:

Click on the Proceed link to open the Ambari login screen. You should be able to log in to Ambari with the username admin and the password admin.
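If you would rather confirm from a script that Ambari is reachable and your credentials work, Ambari's REST API answers `GET /api/v1/clusters` with the clusters it manages. The Python sketch below is an illustration only, not part of the original tutorial: the base URL (including any `/ambari` gateway prefix your Cloudbreak deployment may use) is a placeholder you must replace, and certificate verification is disabled solely because this test cluster uses a self-signed certificate.

```python
import base64
import json
import ssl
import urllib.request


def build_ambari_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for Ambari's /api/v1/clusters endpoint using HTTP Basic auth."""
    url = base_url.rstrip("/") + "/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def list_clusters(base_url: str, user: str = "admin", password: str = "admin") -> list:
    """Return the cluster names Ambari reports.

    TLS verification is turned off because the cluster presents a self-signed
    certificate; do not do this against production endpoints.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    req = build_ambari_request(base_url, user, password)
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        payload = json.load(resp)
    # Each entry in "items" carries a "Clusters" object with the cluster name.
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]
```

You would call it as `list_clusters("https://AMBARI_HOST/ambari")`, where `AMBARI_HOST` is a hypothetical placeholder for the address shown in your Cloudbreak cluster details.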

Log in to Zeppelin

Now click on the Zeppelin component in the component status summary. You should see something similar to this:

Click on the Quick Links link. You should see something similar to this:

Click on the Zeppelin UI link. This will load Zeppelin in a new browser tab. You should see something similar to this:

You should notice the blue Login button in the upper right corner of the Zeppelin UI. Click on this button. You should see something similar to this:

You should be able to log in to Zeppelin with the username admin and the password admin. Once you log in, you should see something similar to this:
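The same login can be performed against Zeppelin's REST API, which accepts a form-encoded `POST /api/login` with the fields `userName` and `password` and returns a session cookie for later calls. The sketch below is a hedged illustration: the base URL and port are placeholders that depend on how your cluster exposes Zeppelin, and the `insecure` switch is only for a test endpoint served over HTTPS with a self-signed certificate.

```python
import http.cookiejar
import json
import ssl
import urllib.parse
import urllib.request


def build_login_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build the form-encoded POST that Zeppelin's /api/login endpoint expects."""
    url = base_url.rstrip("/") + "/api/login"
    form = urllib.parse.urlencode({"userName": user, "password": password})
    req = urllib.request.Request(url, data=form.encode("ascii"), method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def zeppelin_login(base_url: str, user: str = "admin", password: str = "admin",
                   insecure: bool = False):
    """Log in and return (parsed JSON body, cookie jar holding the session cookie).

    A rejected login should surface as urllib.error.HTTPError.
    """
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if insecure:
        # Skip certificate checks; acceptable only for a throwaway test cluster.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        handlers.append(urllib.request.HTTPSHandler(context=ctx))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(build_login_request(base_url, user, password), timeout=30) as resp:
        body = json.load(resp)
    return body, jar
```

Reusing the returned cookie jar in the same opener lets subsequent REST calls run as the logged-in user.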

Load Getting Started Notebook

Now let's load the Apache Spark in 5 Minutes notebook by clicking on the Getting Started link. You should see something similar to this:

Click on the Apache Spark in 5 Minutes notebook. You should see something similar to this:

This shows the Zeppelin interpreters associated with this notebook. As you can see, the spark2 and livy2 interpreters are enabled. Click the blue Save button. You should see something similar to this:
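To check from a script that the spark2 and livy2 interpreters exist on the server, you could fetch Zeppelin's `GET /api/interpreter/setting` endpoint and inspect the names it returns. Note that this lists the interpreters configured server-wide, not the bindings of a single notebook. The sketch below only parses the response text and assumes a `{"status": "OK", "body": [{"name": ...}, ...]}` shape; verify that against your Zeppelin version.

```python
import json


def interpreter_names(response_text: str) -> list:
    """Return the sorted interpreter setting names from a Zeppelin REST response."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise RuntimeError(f"unexpected status: {payload.get('status')!r}")
    return sorted(setting["name"] for setting in payload.get("body", []))


def missing_interpreters(response_text: str, required=("spark2", "livy2")) -> list:
    """Return the required interpreter names that are absent, in the order given."""
    present = set(interpreter_names(response_text))
    return [name for name in required if name not in present]
```

An empty result from `missing_interpreters` means both interpreters this notebook relies on are configured.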

This notebook defaults to using the Spark 2.x interpreter. You should be able to run the paragraphs without any changes. Scroll down to the notebook paragraph called Verify Spark Version. Click the play button on this paragraph. You should see something similar to this:

You should notice the Spark version is 2.1.0.2.6.0.3-8. This confirms we are using Spark 2.1. It also confirms that Zeppelin can properly interact with Spark 2 on our HDP 2.6 cluster built with Cloudbreak. Try running the next two paragraphs. These paragraphs download a JSON file from GitHub and then move it to HDFS on our cluster. Now run the Load data into a Spark DataFrame paragraph. You should see something similar to this:
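The version string packs two versions together: the Apache Spark release followed by the HDP stack build. The sketch below splits it apart, assuming the `<spark major.minor.patch>.<hdp four-part version>-<build>` pattern seen in 2.1.0.2.6.0.3-8; other vendors or releases may format the string differently, in which case it raises rather than guessing.

```python
import re
from typing import NamedTuple


class HdpSparkVersion(NamedTuple):
    spark: str   # Apache Spark release, e.g. "2.1.0"
    hdp: str     # HDP stack version, e.g. "2.6.0.3"
    build: str   # HDP build number, e.g. "8"


# Assumed layout: three Spark components, four HDP components, then "-<build>".
_VERSION_RE = re.compile(r"^(\d+\.\d+\.\d+)\.(\d+\.\d+\.\d+\.\d+)-(\d+)$")


def parse_hdp_spark_version(version: str) -> HdpSparkVersion:
    """Split an HDP-style Spark version string into its Spark and HDP parts."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return HdpSparkVersion(*match.groups())
```

For the value shown above, the `spark` field is `"2.1.0"`, which is what confirms the Spark 2.1 line.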

As you can see, the DataFrame should be properly loaded from the JSON file.
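If you later load your own JSON into a DataFrame, keep in mind that Spark 2.1's JSON reader expects one complete JSON object per line (the JSON Lines layout); a pretty-printed file spanning several lines per object will not parse cleanly, and in Spark's default PERMISSIVE mode those lines end up in a `_corrupt_record` column rather than raising an error. The stdlib sketch below is a hypothetical pre-check you could run on a local copy before uploading to HDFS: it rejects lines that are not standalone JSON objects and reports the union of top-level keys, loosely mirroring the columns Spark would infer.

```python
import json


def check_json_lines(text: str) -> list:
    """Validate JSON Lines text and return the sorted union of top-level keys.

    Blank lines are skipped by this sketch; any other line must be a complete
    JSON object on its own, otherwise ValueError is raised.
    """
    keys = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not a complete JSON value: {exc}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno} is not a JSON object")
        keys.update(record)
    return sorted(keys)
```

Records with differing fields are fine; Spark fills missing fields with null, which is why the sketch reports the union of keys rather than requiring identical ones.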

Next Steps

Try running the remaining paragraphs to ensure everything is working correctly. For an extra challenge, try running some of the other Spark 2 notebooks that are included. You can also attempt to modify the Spark 1.6 notebooks to work with Spark 2.1.

Review

If you have successfully followed along with this tutorial, you should have confirmed that Spark 2.1 works on our HDP 2.6 cluster deployed with Cloudbreak.

agadrias · ‎08-28-2018

When I try to launch the Zeppelin UI, I get this error:

error.jpg

Cloudera Community

Using Zeppelin with Spark 2.1 on HDP 2.6 cluster built with Cloudbreak

Apache Spark

Apache Zeppelin

Hortonworks Cloudbreak

Hortonworks Data Platform (HDP)