About azeltov

azeltov · ‎09-26-2016

SAP HANA Vora is an in-memory processing engine that runs on a Hadoop cluster and is tightly integrated with Spark. It is designed for handling big data. SAP HANA Vora brings OLAP-style capabilities to Hadoop, provides deeper integration with SAP HANA for high-performance enterprise analytics, and delivers contextual insights by combining corporate data in SAP HANA with big data stored in Hadoop systems.

In this multi-part guide, I will show you how to spin up an SAP HANA instance in AWS and a Vora + HDP installation on a second node. We will use Apache Zeppelin to interact with SAP HANA through a Vora interpreter. In this scenario you will be able to join data from other sources, such as HDFS and RDBMSs, with HANA data. This is a federated query approach: a "federation" tier acts as a single point of access to data from multiple sources. For details on the concepts of data federation, see "Virtual Integration of Hadoop with External Systems".

SAP HANA Vora enables OLAP analysis of Hadoop data through data hierarchy enhancements in SparkSQL and compiled queries for accelerated processing across nodes. It democratizes data access, letting data scientists and developers easily enrich their datasets in Hadoop and other data sources such as RDBMSs, JSON, and text files. The HDP stack natively supports "federated querying" with the Spark engine (see "Using Spark to Virtually Integrate Hadoop with External Systems"); with Vora you also get native connectivity to HANA and additional UDFs such as hierarchies.

To easily spin up HANA and Vora with HDP, we will use Amazon Web Services (AWS). You can spin up HANA on either Amazon or Microsoft, but the Vora + HDP instance is only available on Amazon, so for simplicity we will use Amazon for now. In a future article I will write a walkthrough on how to install SAP Vora with HDP.
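To make the federated query idea concrete, here is a minimal sketch that joins a flat-file source with a relational table behind a single function. It uses only the Python standard library, so it is a stand-in for the concept rather than Spark or Vora code. The data, the `customers` table, and the `federated_join` helper are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: a CSV extract standing in for a file on HDFS.
CLICKS_CSV = """customer_id,clicks
1,120
2,45
3,300
"""

def federated_join(clicks_csv, conn):
    """Join rows from a flat file with rows from a relational table."""
    # Load the flat-file source into a dict keyed by the join column.
    clicks = {int(r["customer_id"]): int(r["clicks"])
              for r in csv.DictReader(io.StringIO(clicks_csv))}
    # Read the relational source (a stand-in for a HANA or RDBMS table).
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    # Inner join on customer id: keep only customers present in both sources.
    return [(cid, name, clicks[cid]) for cid, name in rows if cid in clicks]

def demo():
    # Build a throwaway in-memory database to play the role of the RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex"), (4, "Initech")])
    return federated_join(CLICKS_CSV, conn)
```

Calling `demo()` returns only the customers found in both sources. In a real deployment the engine (Spark, or Spark with Vora) would perform this join across the cluster instead of in a single process.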
This is the official install doc: SAP_HANA_Vora_Installation_Admin_Guide.

First we need to spin up a HANA instance. You will need to register for the SAP Cloud Appliance Library, the free service for managing your SAP solutions in the public cloud. Make sure you have an account there before proceeding with this tutorial.

Once you register and sign in:

1. On the left, click SOLUTIONS to see the systems available for use.
2. Search for "developer" in the search box to find the HANA developer edition. Choose "SAP HANA Vora, 1.2, developer edition".
3. Once you have found the instance through the search, you need to "activate" it. Activating an instance connects it to your account on Amazon AWS. After the solution is activated, the link next to it should change to Create Instance.
4. Click the "Create Instance" link on this solution to start the setup wizard.

The wizard takes you through a few simple steps, outlined below, after which your instance will be up and running.

1. Choose your account, select your region, and enter a name and a password for your instance. This is the "simple" setup and only requires those few items to generate your instance.
2. Configure the schedule for the virtual machine. This option lets you define either a specific date when the machine will shut down or a schedule for when it should be running. The virtual machine will suspend on the date you set. Click Next when you have set a run schedule or a suspend date.
3. After the process of creating the VM starts, you will be prompted to download your "Key Pair". Make sure to download the "pem" file; you will need it to ssh back to the created instance.

It will take about 10-25 minutes for your VM to start. You can see your instance status by clicking on the INSTANCE tab of the Cloud Appliance Library main screen.
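Since the downloaded pem file is what you will use to ssh into the instance, the following sketch shows one way to assemble the ssh command and catch a common pitfall first: OpenSSH refuses private keys that are readable by group or others. The `build_ssh_command` helper is hypothetical, and the login user and host you pass in depend on your instance; this guide does not state them.

```python
import os
import stat

def build_ssh_command(key_path, user, host):
    """Return the argv for an ssh login that authenticates with a .pem key."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # OpenSSH rejects private keys that group or others can access.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{key_path} is too open ({oct(mode)}); run chmod 400 on it first")
    # -i selects the identity (private key) file to authenticate with.
    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

The returned list can be passed to `subprocess.run`, or joined with spaces and pasted into a terminal. The permission check assumes a POSIX system.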
Next, let's spin up the Vora instance from the SAP Cloud Appliance Library:

1. On the left, click SOLUTIONS to see the systems available for use.
2. Search for "developer" in the search box to find the HANA Vora 1.2, developer edition.
3. Walk through the wizard to spin up the Vora instance. Make sure to select the same AWS region as the SAP HANA instance, because the two systems will need to communicate and you don't want to cross geographic boundaries.
4. Remember the master password; I used the same one as for the HANA installation.
5. It is important that you click Download and store the file with the private key. You will use it to connect to the instance's host with an ssh client.

Once your instance of SAP HANA Vora is fully activated, you can see it among your CAL instances with Active status. You can also see the two instances in your AWS account.

In the next article, Part 2, we will explore how to configure SAP HANA Vora HDP Ambari.

References:
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html

mrizvi · ‎10-31-2016

Hi @azeltov, I am trying to install RStudio on Hortonworks Sandbox 2.5 and am running into an exception in the verify installation step: initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused. I have tried starting and stopping the RStudio server, and it shows the same message. PS: Since it is a Docker container, port 8787 is not open, so I have configured /etc/rstudio/rserver.conf to use port 9000.

wojtekk · ‎09-06-2016

Hi, this looks like a simple error: I see s3a in your exception, but I think s3 or s3n should be there.

azeltov · ‎04-01-2016

@eorgadn You should wrap the geoDistance functions as Hive UDFs; that will be a lot friendlier for most people who want to use them in Hive.

richard_xu · ‎04-25-2016

Ancil, I have a question: why must hive.tez.container.size be a multiple of yarn.scheduler.minimum-allocation-mb? If yarn.scheduler.maximum-allocation-mb = 24GB, yarn.scheduler.minimum-allocation-mb = 4GB, and hive.tez.container.size = 5GB, would YARN not be smart enough to assign 5GB to a container to satisfy Tez's needs? Thanks, Richard
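For context on the numbers in this question, here is a small sketch of the arithmetic. It assumes the scheduler rounds each container request up to the next multiple of yarn.scheduler.minimum-allocation-mb and caps it at yarn.scheduler.maximum-allocation-mb, which is my understanding of the Capacity Scheduler's default behavior. The `normalized_container_mb` function is hypothetical and not part of any YARN API.

```python
import math

def normalized_container_mb(requested_mb, min_alloc_mb, max_alloc_mb):
    """Round a request up to a multiple of the minimum allocation, capped at the maximum."""
    # Round up to the next whole multiple of the minimum allocation.
    rounded = math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb
    # Never go below one minimum-sized container or above the maximum.
    return min(max(rounded, min_alloc_mb), max_alloc_mb)

# The values from the question, expressed in MB.
granted = normalized_container_mb(5 * 1024, 4 * 1024, 24 * 1024)
```

Under that assumption, a 5GB request with a 4GB minimum is granted 8192 MB, so 3GB of each container would sit unused. That is one reason to keep the container size a multiple of the minimum allocation.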

azeltov · ‎08-24-2016

@Alexander is there a full list of these hdi scripts available? If not how did you discover the ones above?

christian_proko · ‎05-20-2016

Hi @Neeraj Sabharwal, When will it become GA? Best, Christian

azeltov · ‎12-23-2015

Very nice! A good way to do ETL and create SOLR indexes.

rmolina · ‎12-18-2015

Just to add to this article: sandbox.hortonworks.com needs to be mapped to the IP address of the sandbox virtual machine. Typically, out of the box, the VirtualBox version uses the loopback IP 127.0.0.1, whereas the VMware image gets an IP that depends on the configured VM network settings. Thus, if you don't have sandbox.hortonworks.com in the hosts file on your machine, use the IP address instead, such as http://127.0.0.1:4200
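As a rough illustration of that rule, the sketch below picks the URL to use based on whether the hosts file contents contain a mapping for the sandbox hostname. The `sandbox_url` helper is hypothetical. The default IP and port are the ones mentioned above and should be replaced with your VM's address on VMware.

```python
def sandbox_url(hosts_text, hostname="sandbox.hortonworks.com",
                fallback_ip="127.0.0.1", port=4200):
    """Use the hostname if the hosts file maps it, otherwise fall back to the IP."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "<ip> <name> [<alias> ...]".
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and hostname in entry[1:]:
            return f"http://{hostname}:{port}"
    return f"http://{fallback_ip}:{port}"
```

You could feed it the contents of /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows). A line such as `127.0.0.1 sandbox.hortonworks.com` is what makes the hostname form work.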

azeltov · ‎11-12-2015

Got it syncing to the hub! So if I understand this correctly, if I now want to sync these notebooks to another Zeppelin, I just put the same "hub_api_token" in that Zeppelin and it will sync to that Zeppelin instance? Or is that a feature that's not developed yet?

Last Visited	‎08-14-2019 06:45 PM

Member Since	‎09-29-2015 01:18 AM
Posts	155
Kudos received	171

Cloudera Community

Getting started with SAP Hana and Vora with HDP us...

Re: Running SparkR in RStudio using HDP 2.4

Re: HDP 2.4.0 and Spark 1.6.0 connecting to AWS S3...

Re: Geo Distance calculations in Hive and Java

Re: Demystify Apache Tez Memory Tuning - Step by S...

Re: How to install Apache Zeppelin, R, Solr, and G...

Re: Apache Zeppelin and SparkR

Re: Spark DataFrame to Solr Cloud - runs on Sandbo...

Re: Hidden Gem in HDP sandbox. SSH Web Server on p...

Re: Apache Zeppelin Walk Through