
This article, "Configure SAP Vora HDP Ambari - Part 2", is a continuation of "Getting started with SAP Hana and Vora with HDP using Apache Zeppelin for Data Analysis - Part 1 In...

Log back in to SAP Cloud Appliance Library (CAL), the free service for managing your SAP solutions in the public cloud. You should have HANA and Vora instances up and running:

8003-screen-shot-2016-09-26-at-104614-am.png

  1. To open the Apache Ambari web UI, click Connect on your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Ambari.

    The port of the Ambari web UI has been preconfigured for you in SAP HANA Vora, developer edition, in CAL. It has also been opened as one of the default Access Points. As you might remember, this translates into an inbound rule in the corresponding AWS security group.

8004-screen-shot-2016-09-26-at-120416-pm.png

  2. Log in to the Ambari web UI using the user admin and the master password you defined while creating the instance in CAL.
  3. You can see that (1) all services, including SAP HANA Vora components, are running, that (2) there are no issues with resources, and that (3) there are no alerts generated by the system.

    You use this interface to start or stop cluster components when needed during operations or troubleshooting.

    Please refer to the official Apache Ambari documentation if you need additional information or training on how to use it.

    For a detailed review of all SAP HANA Vora components and their purpose, please see the SAP HANA Vora help.

8005-screen-shot-2016-09-26-at-121930-pm.png

We need to make some configuration changes to get the HDFS view working in Ambari, and we also need to modify the YARN scheduler.

Setup HDFS Ambari View:

Creating and Configuring a Files View Instance

  1. Browse to the Ambari Administration interface.
  2. Click Views, expand the Files View, and click Create Instance.
  3. Enter the following View instance Details:

    Property: Instance Name
    Description: This is the Files view instance name. This value should be unique for all Files view instances you create. This value cannot contain spaces and is required.
    Value: HDFS

    Property: Display Name
    Description: This is the name of the view link displayed to the user in Ambari Web.
    Value: MyFiles

    Property: Description
    Description: This is the description of the view displayed to the user in Ambari Web.
    Value: Browse HDFS files and directories.

    Property: Visible
    Description: This checkbox determines whether the view is displayed to users in Ambari Web.
    Value: Visible or Not Visible

You should see an Ambari HDFS view like this:

8006-screen-shot-2016-09-26-at-123356-pm.png

Next, add the HDFS proxy user properties for root:

  1. In Ambari Web, browse to Services > HDFS > Configs.
  2. Under the Advanced tab, navigate to the Custom core-site section.
  3. Click Add Property… to add the following custom properties:

    hadoop.proxyuser.root.groups=*

    hadoop.proxyuser.root.hosts=*

8007-screen-shot-2016-09-26-at-123813-pm.png
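The same two properties could also be prepared for a scripted update. The following is only a minimal sketch: it assumes Ambari's REST API accepts a `desired_config` update on the cluster resource in the shape shown, and the helper names `merge_core_site` and `build_desired_config` are hypothetical. It builds the request body without sending anything. Because such an update replaces the whole config type, the new properties are merged on top of the existing ones first.

```python
import json
import time

# The two custom core-site properties from the step above.
PROXYUSER_PROPERTIES = {
    "hadoop.proxyuser.root.groups": "*",
    "hadoop.proxyuser.root.hosts": "*",
}


def merge_core_site(existing, additions):
    """Return a new dict with the additions applied on top of the existing properties."""
    merged = dict(existing)  # copy so the caller's dict is not modified
    merged.update(additions)
    return merged


def build_desired_config(config_type, properties, tag=None):
    """Build a request body for a config update (assumed Ambari payload shape)."""
    if tag is None:
        # Each config version needs a new tag; a timestamp is one simple choice.
        tag = "version{}".format(int(time.time() * 1000))
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                "tag": tag,
                "properties": properties,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical existing core-site content, for illustration only.
    existing = {"fs.defaultFS": "hdfs://namenode.example:8020"}
    merged = merge_core_site(existing, PROXYUSER_PROPERTIES)
    print(json.dumps(build_desired_config("core-site", merged), indent=2))
```

Adding the properties through the Ambari web UI, as described above, remains the simpler route for a single instance; HDFS needs a restart in either case before the change takes effect.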

Now let's test that you can open the HDFS view:

8009-screen-shot-2016-09-26-at-124506-pm.png

Next we will reconfigure YARN to fix an issue that occurs when submitting YARN jobs. I ran into it while running a Sqoop job to import data from SAP HANA to HDFS (this will be a separate how-to article published soon):

YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

The job had been stuck like that for a while.
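To spot applications in this state without watching the job output, one option is to look at the application list from the ResourceManager REST API (`/ws/v1/cluster/apps`). The sketch below only filters an already parsed JSON response; the sample data is made up for illustration, and the helper name `stuck_applications` is hypothetical.

```python
def stuck_applications(response, min_elapsed_ms):
    """Return ids of applications that have been in ACCEPTED for at least min_elapsed_ms.

    `response` is the parsed JSON of a ResourceManager application list.
    """
    # The "apps" value can be null when there are no applications.
    apps = (response.get("apps") or {}).get("app") or []
    return [
        app["id"]
        for app in apps
        if app.get("state") == "ACCEPTED"
        and app.get("elapsedTime", 0) >= min_elapsed_ms
    ]


if __name__ == "__main__":
    # Made-up sample response, not output from a real cluster.
    sample = {
        "apps": {
            "app": [
                {"id": "application_1_0001", "state": "RUNNING", "elapsedTime": 900000},
                {"id": "application_1_0002", "state": "ACCEPTED", "elapsedTime": 600000},
                {"id": "application_1_0003", "state": "ACCEPTED", "elapsedTime": 5000},
            ]
        }
    }
    # Report applications waiting in ACCEPTED for five minutes or more.
    print(stuck_applications(sample, min_elapsed_ms=5 * 60 * 1000))
```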

Let's set yarn.scheduler.capacity.maximum-am-resource-percent=0.6. Go to YARN -> Configs and look for the property yarn.scheduler.capacity.maximum-am-resource-percent. The Capacity Scheduler documentation describes it as follows:

https://hadoop.apache.org/docs/r0.23.11/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

Maximum percent of resources in the cluster which can be used to run application masters; controls the number of concurrent active applications. Limits on each queue are directly proportional to their queue capacities and user limits. Specified as a float, i.e. 0.5 = 50%. Default is 10%. This can be set for all queues with yarn.scheduler.capacity.maximum-am-resource-percent and can also be overridden on a per queue basis by setting yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

8008-screen-shot-2016-09-26-at-124208-pm.png
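To get a feel for why raising this value helps, here is a rough back-of-the-envelope calculation. It is a simplified sketch, not the scheduler's actual algorithm: it ignores queue capacities and user limits, and the cluster memory and application master container size are hypothetical numbers chosen for illustration.

```python
import math


def am_budget_mb(cluster_memory_mb, max_am_resource_percent):
    """Memory that application masters may use in total (simplified model)."""
    return cluster_memory_mb * max_am_resource_percent


def concurrent_am_slots(cluster_memory_mb, max_am_resource_percent, am_container_mb):
    """Roughly how many application masters fit into the AM budget."""
    budget = am_budget_mb(cluster_memory_mb, max_am_resource_percent)
    return math.floor(budget / am_container_mb)


if __name__ == "__main__":
    cluster_memory_mb = 16384  # hypothetical total YARN memory
    am_container_mb = 1024     # hypothetical application master container size

    for percent in (0.1, 0.6):
        slots = concurrent_am_slots(cluster_memory_mb, percent, am_container_mb)
        print("maximum-am-resource-percent={}: about {} concurrent AM(s)".format(percent, slots))
```

With these hypothetical numbers, the default of 0.1 leaves room for about one application master, so a second application would wait in ACCEPTED until the first one finishes, while 0.6 leaves room for about nine.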

Now let's connect to Apache Zeppelin and load sample data into SAP HANA Vora from files already created in HDFS.

  1. Apache Zeppelin is a web-based notebook that enables interactive data analytics. It is a multi-purpose notebook that brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark.

    SAP HANA Vora provides its own %vora interpreter, which allows Spark/Vora features to be used from Zeppelin. Zeppelin allows queries to be written directly in Spark SQL.

    https://hortonworks.com/apache/zeppelin/

  2. SAP HANA Vora, developer edition, on CAL comes with Apache Zeppelin pre-installed. As with Apache Ambari, to open the Zeppelin web UI click Connect on your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Zeppelin.
  3. Zeppelin opens in a new browser window. Check that it shows Connected, and if so, click on the 0_DemoData notebook.
  4. The 0_DemoData notebook will open. Now you can click the Run all paragraphs button at the top of the page to create tables in SAP HANA Vora using data from the existing HDFS files preloaded on the instance in CAL. You will also need these tables later in the exercises.

    A dialog window will pop up asking you to confirm "Run all paragraphs?". Click OK.

8010-screen-shot-2016-09-26-at-125212-pm.png

The Vora code will load .csv files and create tables in Vora Spark. You can navigate to the HDFS files using the view created earlier to preview the data right on HDFS:

8011-screen-shot-2016-09-26-at-125726-pm.png
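If you prefer to check the files from a script rather than the view, a directory listing can also be requested through the WebHDFS REST API (`op=LISTSTATUS`). The sketch below is only an illustration: it assumes WebHDFS is enabled on the cluster, and the NameNode address, the path `/user/vora`, and the helper names are hypothetical. It builds the request URL and filters a parsed response without making any network call.

```python
from urllib.parse import quote, urlencode


def liststatus_url(namenode, path, user):
    """Build a WebHDFS LISTSTATUS URL for the given HDFS directory."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return "http://{}/webhdfs/v1{}?{}".format(namenode, quote(path), query)


def csv_files(response):
    """Return the names of .csv files from a parsed LISTSTATUS response."""
    statuses = response["FileStatuses"]["FileStatus"]
    return [
        status["pathSuffix"]
        for status in statuses
        if status["type"] == "FILE" and status["pathSuffix"].endswith(".csv")
    ]


if __name__ == "__main__":
    # Hypothetical NameNode address and HDFS path.
    print(liststatus_url("namenode.example:50070", "/user/vora", "root"))

    # Made-up sample response, not output from a real cluster.
    sample = {
        "FileStatuses": {
            "FileStatus": [
                {"pathSuffix": "customers.csv", "type": "FILE"},
                {"pathSuffix": "archive", "type": "DIRECTORY"},
                {"pathSuffix": "notes.txt", "type": "FILE"},
            ]
        }
    }
    print(csv_files(sample))
```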

At this point we have set up an Ambari HDFS view to browse our distributed file system on HDP, and we have tested that the Vora connectivity to HDFS is working.

Stay tuned for the next article, "How to connect SAP Vora to SAP HANA using Apache Zeppelin", where we will use Apache Zeppelin to connect to the SAP HANA system from part 1.

References:

https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system....

https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-wi...

http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf

http://go.sap.com/developer/tutorials/hana-setup-cloud.html

http://help.sap.com/hana_vora_re

http://go.sap.com/developer/tutorials/vora-setup-cloud.html

http://go.sap.com/developer/tutorials/vora-connect.html

