Created on 05-23-2017 11:16 PM - edited 08-17-2019 12:51 PM
This tutorial is part two of a two-part series. In this tutorial, we'll verify Spark 2.1 functionality using Zeppelin on an HDP 2.6 cluster deployed using Cloudbreak. The first tutorial covers using Cloudbreak to deploy the cluster. You can find the first tutorial here: HCC Article
You should already have completed part one of this tutorial series and have an HDP 2.6 cluster with Spark 2.1, deployed using Cloudbreak, up and running.
This tutorial was tested in the following environment:
Log in to Ambari
As mentioned in the prerequisites, you should already have a cluster built using Cloudbreak. Click on the cluster summary box in the Cloudbreak UI to display the cluster details. Now click on the link to your Ambari cluster. You may see something similar to this:
Your screen may vary depending on your browser of choice. I'm using Chrome. This warning appears because the cluster uses self-signed certificates, which the browser does not trust. Click on the ADVANCED link. You should see something similar to this:
Click on the Proceed link to open the Ambari login screen. You should be able to log in to Ambari using admin as both the username and the password.
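If you would rather confirm the login from a script than from the browser, the same check can be sketched against Ambari's REST API. This is only a minimal sketch and not part of the original walkthrough: the base URL in the usage note is a hypothetical placeholder, the function names are illustrative, and certificate verification is switched off solely because this cluster uses self-signed certificates.

```python
import base64
import json
import ssl
import urllib.request


def build_ambari_request(base_url, user, password):
    """Build a GET request for Ambari's cluster listing endpoint."""
    url = base_url.rstrip("/") + "/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def unverified_context():
    """TLS context that skips certificate checks (self-signed certs only)."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_clusters(base_url, user, password):
    """Send the request and return the decoded JSON response."""
    request = build_ambari_request(base_url, user, password)
    with urllib.request.urlopen(request, context=unverified_context(), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To try it, call fetch_clusters with the Ambari address shown in the Cloudbreak UI, for example fetch_clusters("https://ambari.example.com/ambari", "admin", "admin"), where the host name is a stand-in for your own. A JSON response that lists your cluster suggests the credentials work; an HTTP 401 or 403 error suggests they do not.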
Log in to Zeppelin
Now click on the Zeppelin component in the component status summary. You should see something similar to this:
Click on the Quicklinks link. You should see something similar to this:
Click on the Zeppelin UI link. This will load Zeppelin in a new browser tab. You should see something similar to this:
You should notice the blue Login button in the upper right corner of the Zeppelin UI. Click on this button. You should see something similar to this:
You should be able to log in to Zeppelin using admin as both the username and the password. Once you log in, you should see something similar to this:
Load Getting Started Notebook
Now let's load the Apache Spark in 5 Minutes notebook by clicking on the Getting Started link. You should see something similar to this:
Click on the Apache Spark in 5 Minutes notebook. You should see something similar to this:
This dialog shows the Zeppelin interpreters associated with this notebook. As you can see, the spark2 and livy2 interpreters are enabled. Click the blue Save button. You should see something similar to this:
This notebook defaults to using the Spark 2.x interpreter. You should be able to run the paragraphs without any changes. Scroll down to the notebook paragraph called Verify Spark Version. Click the play button on this paragraph. You should see something similar to this:
You should notice the Spark version is 2.1.0.2.6.0.3-8. This confirms we are using Spark 2.1. It also confirms that Zeppelin is able to properly interact with Spark 2 on our HDP 2.6 cluster built with Cloudbreak. Try running the next two paragraphs. These paragraphs download a JSON file from GitHub and then move it to HDFS on our cluster. Now run the Load data into a Spark DataFrame paragraph. You should see something similar to this:
As you can see, the DataFrame should be properly loaded from the JSON file.
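The two checks above (Spark version, then loading JSON into a DataFrame) can also be sketched as a single PySpark helper. This is an illustrative sketch rather than the notebook's own code: it assumes a PySpark paragraph in which Zeppelin provides a SparkSession named spark, the function names are made up for this example, and the HDFS path in the usage comment is hypothetical.

```python
def spark_release(version):
    """Return the major.minor part of a Spark version string."""
    return ".".join(version.split(".")[:2])


def verify_spark(spark, json_path, expected_release="2.1"):
    """Check the Spark release, then load a JSON file into a DataFrame.

    `spark` is the SparkSession that Zeppelin provides to the paragraph.
    Returns the number of rows read from `json_path`.
    """
    release = spark_release(spark.version)
    if release != expected_release:
        raise RuntimeError(
            "expected Spark %s, found %s" % (expected_release, spark.version)
        )
    df = spark.read.json(json_path)  # expects one JSON object per line
    df.printSchema()  # show the inferred schema
    return df.count()


# In a Zeppelin PySpark paragraph (hypothetical HDFS path):
# print(verify_spark(spark, "/tmp/people.json"))
```

Taking the session as a parameter keeps the helper free of imports, so it can be pasted into a notebook paragraph as is. On an HDP build the version string carries the HDP build number after the Spark version, which is why only the first two components are compared.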
Try running the remaining paragraphs to ensure everything is working correctly. For an extra challenge, try running some of the other Spark 2 notebooks that are included. You can also attempt to modify the Spark 1.6 notebooks to work with Spark 2.1.
If you have successfully followed along with this tutorial, you should have been able to confirm Spark 2.1 works on our HDP 2.6 cluster deployed with Cloudbreak.