This tutorial is part two of a two-part series. In this tutorial, we'll verify Spark 2.1 functionality using Zeppelin on an HDP 2.6 cluster deployed using Cloudbreak. The first tutorial covers using Cloudbreak to deploy the cluster. You can find the first tutorial here: HCC Article
This tutorial was tested in the following environment:
As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see something similar to this:
Your screen may vary depending on your browser of choice; I'm using Chrome. The warning appears because the cluster uses a self-signed certificate, which the browser does not trust. Click on the ADVANCED link. You should see something similar to this:
Click on the Proceed link to open the Ambari login screen. You should be able to log in to Ambari using admin as both the username and the password.
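If you would rather check the cluster from a script than click through the UI, Ambari's REST API accepts the same admin credentials over HTTP basic auth. The sketch below is a minimal example assuming Python 3 and only the standard library. `AMBARI_URL` and the cluster name you pass in are hypothetical placeholders for your own values. Certificate verification is switched off only because this throwaway test cluster uses a self-signed certificate. The helper asks Ambari for the state of the ZEPPELIN service, roughly the same information the component status summary shows.

```python
import base64
import json
import ssl
import urllib.request

# Hypothetical base URL: replace with the Ambari address from your Cloudbreak cluster details.
AMBARI_URL = "https://ambari.example.com"


def unverified_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks; use only for self-signed test clusters."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE   # accept the untrusted self-signed certificate
    return ctx


def ambari_request(path: str, user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a GET request for the Ambari REST API using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(AMBARI_URL + path)
    req.add_header("Authorization", f"Basic {token}")
    return req


def service_state(payload: dict) -> str:
    """Extract ServiceInfo/state (for example STARTED or INSTALLED) from a service resource."""
    return payload["ServiceInfo"]["state"]


def zeppelin_state(cluster: str) -> str:
    """Ask Ambari for the ZEPPELIN service state (performs a network call)."""
    req = ambari_request(f"/api/v1/clusters/{cluster}/services/ZEPPELIN?fields=ServiceInfo/state")
    with urllib.request.urlopen(req, context=unverified_context()) as resp:
        return service_state(json.load(resp))
```

For example, `zeppelin_state("mycluster")` should return `STARTED` while Zeppelin is running; `"mycluster"` stands in for your cluster's name as Ambari reports it.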
Now click on the Zeppelin component in the component status summary. You should see something similar to this:
Click on the Quicklinks link. You should see something similar to this:
Click on the Zeppelin UI link. This will load Zeppelin in a new browser tab. You should see something similar to this:
You should notice the blue Login button in the upper right corner of the Zeppelin UI. Click on this button. You should see something similar to this:
You should be able to log in to Zeppelin using admin as both the username and the password. Once you log in, you should see something similar to this:
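The Login button has a REST counterpart: Zeppelin's API accepts a form-encoded POST to `/api/login` with `userName` and `password` fields, and the server then tracks the session with a cookie. Below is a minimal sketch assuming Python 3's standard library. `ZEPPELIN_URL` is a hypothetical placeholder for the address the Zeppelin UI quick link opened.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request

# Hypothetical address: use the URL that the Zeppelin UI quick link opened.
ZEPPELIN_URL = "http://zeppelin.example.com:9995"


def login_request(user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build the form-encoded POST that Zeppelin's /api/login endpoint expects."""
    form = urllib.parse.urlencode({"userName": user, "password": password}).encode()
    return urllib.request.Request(
        ZEPPELIN_URL + "/api/login",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def login(user: str = "admin", password: str = "admin") -> urllib.request.OpenerDirector:
    """Log in and return an opener that resends the session cookie on later calls."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Rejected credentials normally surface as an HTTPError raised by open();
    # the status check below is an extra guard on the JSON body.
    with opener.open(login_request(user, password)) as resp:
        body = json.load(resp)
    if body.get("status") != "OK":
        raise RuntimeError(f"Zeppelin login failed: {body}")
    return opener
```

Because the returned opener keeps the session cookie, `login().open(ZEPPELIN_URL + "/api/notebook")` should return the notebook list. If your Zeppelin endpoint is served over HTTPS with the self-signed certificate, the opener would also need an `HTTPSHandler` built with a relaxed SSL context.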
Now let's load the Apache Spark in 5 Minutes notebook by clicking on the Getting Started link. You should see something similar to this:
Click on the Apache Spark in 5 Minutes notebook. You should see something similar to this:
This shows the Zeppelin interpreters bound to this notebook. As you can see, the spark2 and livy2 interpreters are enabled. Click the blue Save button. You should see something similar to this:
This notebook defaults to the Spark 2.x interpreter, so you should be able to run the paragraphs without any changes. Scroll down to the notebook paragraph called Verify Spark Version. Click the play button on this paragraph. You should see something similar to this:
You should notice the Spark version is 2.1.0.2.6.0.3-8. The leading 2.1.0 is the Apache Spark version, and the remainder identifies the HDP 2.6 build. This confirms we are using Spark 2.1. It also confirms that Zeppelin can interact with Spark 2 on our HDP 2.6 cluster built with Cloudbreak.
Try running the next two paragraphs. These paragraphs download a JSON file from GitHub and then move it to HDFS on our cluster. Now run the Load data into a Spark DataFrame paragraph. You should see something similar to this:
As you can see, the DataFrame has been loaded from the JSON file.
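If you try the same load with your own data, keep in mind that Spark 2.1's JSON reader expects line-delimited JSON by default: one complete object per line. A pretty-printed file that spreads each object across several lines does not parse into the expected rows; the `multiLine` option only arrived in Spark 2.2. Below is a minimal standard-library sketch for flattening such a file before copying it to HDFS. The function name is hypothetical, and it assumes the input is either a single JSON array or a single object.

```python
import json


def to_json_lines(text: str) -> str:
    """Convert a JSON array (or a single object) into one compact record per line."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    # Compact separators keep each record on one line; a trailing newline ends the file.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"
```

Write the returned string to a local file, then copy it into HDFS with `hdfs dfs -put` before pointing `spark.read.json` at it.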
Try running the remaining paragraphs to make sure everything works. For an extra challenge, try running some of the other Spark 2 notebooks that are included. You can also try modifying the Spark 1.6 notebooks to work with Spark 2.1.
If you have successfully followed along with this tutorial, you should have been able to confirm Spark 2.1 works on our HDP 2.6 cluster deployed with Cloudbreak.
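To make the version check repeatable, for example in a post-deployment script, the string printed by the Verify Spark Version paragraph can be split into its Apache Spark and HDP parts. The sketch below assumes the HDP convention of appending the HDP build (here 2.6.0.3-8) to the Spark version; the function names are hypothetical.

```python
import re

# Assumption: HDP reports "<Spark major>.<minor>.<patch>.<HDP a.b.c.d>-<build>",
# e.g. "2.1.0" followed by "2.6.0.3-8".
HDP_SPARK_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+\.\d+\.\d+\.\d+-\d+)$")


def split_hdp_spark_version(version: str) -> tuple:
    """Return ((major, minor, patch), hdp_build) for an HDP-style Spark version string."""
    m = HDP_SPARK_VERSION.match(version)
    if m is None:
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor, patch, hdp = m.groups()
    return (int(major), int(minor), int(patch)), hdp


def is_spark_21(version: str) -> bool:
    """True when the Apache Spark portion of the version is 2.1.x."""
    (major, minor, _), _ = split_hdp_spark_version(version)
    return (major, minor) == (2, 1)
```

With the value from this tutorial, `split_hdp_spark_version("2.1.0.2.6.0.3-8")` yields `((2, 1, 0), "2.6.0.3-8")`, so `is_spark_21` returns `True`; a Spark 1.6 build string on the same cluster would return `False`.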