Member since: 09-29-2015
Posts: 155
Kudos Received: 205
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8362 | 02-17-2017 12:38 PM
 | 1332 | 11-15-2016 03:56 PM
 | 1872 | 11-11-2016 05:27 PM
 | 15356 | 11-11-2016 12:16 AM
 | 3073 | 11-10-2016 06:15 PM
10-18-2016
04:09 PM
2 Kudos
@Amal Babu See this Stackoverflow question; I would follow that approach and create a case class like they show:
case class Person(inputPath: String, name: String, age: Int)

val inputPath = "hdfs://localhost:9000/tmp/demo-input-data/persons.txt"
val rdd = sc.textFile(inputPath).map { l =>
  val tokens = l.split(",")
  Person(inputPath, tokens(0), tokens(1).trim().toInt)
}
rdd.collect().foreach(println)

// then convert the RDD to a DataFrame
import sqlContext.implicits._
val df = rdd.toDF()
df.registerTempTable("x")

http://stackoverflow.com/questions/33293362/how-to-add-source-file-name-to-each-row-in-spark
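If you are on Spark 1.6 or later, a hedged alternative is to let Spark supply the source path itself via the built-in input_file_name() function instead of hard-coding inputPath into the case class (a minimal sketch; the column name "sourceFile" is just illustrative):

import org.apache.spark.sql.functions.input_file_name

// read the file as one row per line (column "value") and record where each row came from
val dfWithSource = sqlContext.read.text(inputPath)
  .withColumn("sourceFile", input_file_name())
dfWithSource.show()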
09-26-2016
09:31 PM
4 Kudos
This article "Perform Data Analysis using SAP Vora on SAP Hana data - Part 4" is continuation of "Load Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3" Log back in to SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running:
Open Apache Zeppelin web UI click on Connect in your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Zeppelin
In Zeppelin click Create new note, name the notebook as you like ex "interact with hana data" In the first cell lets define our imports and HANA connection information import sqlContext.implicits._
import scala.collection.mutable.Map
val HANA_HOSTNAME = "xxx.xxx.xx.xxx"
val HANA_INSTANCE = "00"
val HANA_SCHEMA = "CODEJAMMER"
val HANA_USERNAME = "CODEJAMMER"
val HANA_PASSWORD = "CodeJam2016"
4. In the next cell we will use the sqlContext that Zeppelin automatically created to connect to HANA through the "com.sap.spark.hana" data source:
sqlContext.sql( s"""
CREATE TABLE EMPLOYEE_ADDRESS
USING
com.sap.spark.hana
OPTIONS (
path "EMPLOYEE_ADDRESS",
host "${HANA_HOSTNAME}",
dbschema "${HANA_SCHEMA}",
user "${HANA_USERNAME}",
passwd "${HANA_PASSWORD}",
instance "${HANA_INSTANCE}"
)
""".stripMargin )
5. Now let's query the HANA database from Zeppelin:
sqlContext.sql("select * from EMPLOYEE_ADDRESS").show
You should get the results from the EMPLOYEE_ADDRESS table we created earlier:
+------------+-------------+--------+-----+-------+
|STREETNUMBER| STREET|LOCALITY|STATE|COUNTRY|
+------------+-------------+--------+-----+-------+
| 555| Madison Ave|New York| NY|America|
| 95|Morten Street|New York| NY| USA|
+------------+-------------+--------+-----+-------+
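Because EMPLOYEE_ADDRESS is now registered in the Spark sqlContext, any Spark SQL statement works against it. For example, a quick aggregation over the columns created above (a minimal sketch; the output will simply reflect whatever rows you inserted):

// count addresses per STATE directly against the HANA-backed table
sqlContext.sql("SELECT STATE, COUNT(*) AS ADDRESS_COUNT FROM EMPLOYEE_ADDRESS GROUP BY STATE").show()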
09-26-2016
06:30 PM
4 Kudos
This article "Load Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3" is continuation of "Configure SAP Vora HDP Ambari - Part 2" You will need to download Eclipse Neon - Eclipse IDE for Java Developers to connect to the SAP HANA that we setup in Part 1 . After you setup eclipse we will need to configure Eclipse to install HANA Modelling tools that will allow us to connect to SAP HANA and execute sql scripts to setup demo data that we will use from SAP Vora. Eclipse Setup Procedure
Open the Eclipse IDE. In the main menu, choose Help > Install New Software. Depending on the Eclipse version you have installed, enter the corresponding URL in the Work with field:
For Eclipse Neon (4.6), add the URL: https://tools.hana.ondemand.com/neon
Select SAP HANA Tools (the whole feature group). Note: in case you need to develop with SAPUI5, also install SAP HANA Cloud Platform Tools > UI development toolkit for HTML5 (Developer Edition). Choose Next. On the next wizard page, you get an overview of the features to be installed. Choose Next. Confirm the license agreements. Choose Finish to start the installation. After the successful installation, you will be prompted to restart your Eclipse IDE.
Log back in to SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running. Click the SAP HANA Connect link and click Open. You should see the HANA home page for your instance. You will see a warning about the 'hosts' file; please follow the steps below to set up the hosts entry on your local system, as this will simplify your life.
Open your Terminal or shell application and type the command: sudo nano /etc/hosts
Add the following line, or modify it if you already have one:
xxx.xxx.xxx.xxx vhcalhdbdb
You will need the server connectivity and CODEJAMMER user information for the Eclipse setup steps. Next, let's create a new host in the Eclipse HANA tools: click Add System... Add the hostname that you got from the HANA home page above, use instance 00, and connect using Role: Developer, User: CODEJAMMER, Password: CodeJam2016. You should then have a successful connection. Right-click on the new system you set up, open the SQL console, and run the following code:
DROP TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS";
CREATE COLUMN TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS" ("STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" values(555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" values(95,'Morten Street','New York','NY','USA');
Next, let's test to make sure you have data; run the following SQL command:
SELECT * FROM "CODEJAMMER"."EMPLOYEE_ADDRESS";
You should get a result like this. Congrats - you created the "EMPLOYEE_ADDRESS" table and populated it with sample data. The next article will load this sample data from Apache Zeppelin: Perform Data Analysis using SAP Vora on SAP Hana data - Part 4
References:
http://go.sap.com/developer/tutorials/hana-web-development-workbench.html
https://help.hana.ondemand.com/help/frameset.htm?b0e351ada628458cb8906f55bcac4755.html
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html
http://go.sap.com/developer/tutorials/vora-connect.html
09-26-2016
05:03 PM
6 Kudos
This article "Configure SAP Vora HDP Ambari - Part 2" is continuation of "Getting started with SAP
Hana and Vora with HDP using Apache Zeppelin for Data Analysis - Part 1
Intro" Log back in to SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running: Open Apache Ambari web UI click on Connect in your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Ambari .
The port of the Ambari web UI has been preconfigured for you in the SAP HANA Vora, developer edition, in CAL, and it has been opened as one of the default Access Points. As you might remember, this translates into the appropriate inbound rule in the corresponding AWS security group. Log into the Ambari web UI using the user admin and the master password you defined during the creation of the instance in CAL. You can see that (1) all services, including SAP HANA Vora components, are running, (2) there are no issues with resources, and (3) there are no alerts generated by the system.
You use this interface to start/stop cluster components if needed during operations or troubleshooting. Please refer to the official Apache Ambari documentation if you need additional information and training on how to use it. For a detailed review of all SAP HANA Vora components and their purpose, please see the SAP HANA Vora help. We will need to make some configuration changes to get the HDFS Files view to work in Ambari and also modify the YARN scheduler.
Setup HDFS Ambari View: Creating and Configuring a Files View Instance
Browse to the Ambari Administration interface. Click Views, expand the Files View, and click Create Instance.
Enter the following View instance Details:
Property | Description | Value
---|---|---
Instance Name | This is the Files view instance name. It should be unique for all Files view instances you create, cannot contain spaces, and is required. | HDFS
Display Name | This is the name of the view link displayed to the user in Ambari Web. | MyFiles
Description | This is the description of the view displayed to the user in Ambari Web. | Browse HDFS files and directories.
Visible | This checkbox determines whether the view is displayed to users in Ambari Web. | Visible or Not Visible
You should see an Ambari HDFS view like this:
Next, in Ambari Web, browse to Services > HDFS > Configs. Under the Advanced tab, navigate to the Custom core-site section. Click Add Property… to add the following custom properties:
hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
Now let's test that you can open the HDFS view.
Next we will reconfigure YARN to fix an issue when submitting YARN jobs. I hit this when running a Sqoop job to import data from SAP HANA to HDFS (this will be a separate how-to article published soon):
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
The job had been stuck like that for a while. Let's set yarn.scheduler.capacity.maximum-am-resource-percent=0.6. Go to YARN -> Configs and look for the property yarn.scheduler.capacity.maximum-am-resource-percent. From the CapacityScheduler documentation (https://hadoop.apache.org/docs/r0.23.11/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html): yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent is the maximum percent of resources in the cluster which can be used to run application masters; it controls the number of concurrent active applications. Limits on each queue are directly proportional to their queue capacities and user limits. It is specified as a float, i.e. 0.5 = 50%, and the default is 10%. This can be set for all queues with yarn.scheduler.capacity.maximum-am-resource-percent and can also be overridden on a per-queue basis by setting yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent.
Now let's connect to Apache Zeppelin and load sample data into SAP HANA Vora from files already created in HDFS. Apache Zeppelin is a web-based notebook that enables interactive data analytics: a multi-purpose notebook that brings data ingestion, data exploration, visualization, sharing, and collaboration features to Hadoop and Spark (https://hortonworks.com/apache/zeppelin/). SAP HANA Vora provides its own %vora interpreter, which allows Spark/Vora features to be used from Zeppelin, and Zeppelin also allows queries to be written directly in Spark SQL. SAP HANA Vora, developer edition, on CAL comes with Apache Zeppelin pre-installed. Similar to opening Apache Ambari, to open the Zeppelin web UI click Connect in your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Zeppelin.
Zeppelin opens up in a new browser window. Check that it is Connected and, if yes, click on the 0_DemoData notebook.
The 0_DemoData notebook will open up. Now click the Run all paragraphs button at the top of the page to create tables in SAP HANA Vora using data from the existing HDFS files preloaded on the instance in CAL. These are the tables you will also need later in the exercises. A dialog window will pop up asking you to confirm Run all paragraphs? - click OK. The Vora code will load the .csv files and create the tables in Vora/Spark.
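Each paragraph in 0_DemoData essentially registers one of the preloaded HDFS .csv files as a Vora table. A hedged sketch of roughly what such a paragraph looks like; the table, columns, and file path are illustrative, and the OPTIONS keys differ between Vora releases, so treat them as assumptions rather than exact syntax:

%vora
-- illustrative names only; OPTIONS keys are an assumption for the Vora 1.x data source
CREATE TABLE DEMO_PRODUCTS (ID int, NAME string, PRICE double)
USING com.sap.spark.vora
OPTIONS (tableName "DEMO_PRODUCTS", paths "/user/vora/demo_products.csv")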
You can navigate to the HDFS files using the Files view created earlier to preview the data right on HDFS. At this point we have set up an Ambari HDFS view to browse our distributed file system on HDP and tested Vora's connectivity to HDFS, so everything is working. Stay tuned for the next article, "How to connect SAP Vora to SAP HANA using Apache Zeppelin", where we will use Apache Zeppelin to connect to the SAP HANA system from Part 1.
References:
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html
http://go.sap.com/developer/tutorials/vora-connect.html
09-26-2016
03:49 PM
Correct - the Livy server is only supported via the Zeppelin integration, not via direct REST API calls to Livy.
09-26-2016
03:04 PM
8 Kudos
SAP HANA Vora is an in-memory processing engine that runs on a Hadoop cluster and is tightly integrated with Spark. It is designed for handling big data. SAP HANA Vora makes available OLAP-style capabilities on Hadoop, provides deeper integration with SAP HANA, enabling high-performance enterprise analytics, and delivers contextual insights by combining corporate data in SAP HANA with big data stored in Hadoop systems.
In this multi-part guide, I will show you how to spin up an SAP HANA instance in AWS and a Vora + HDP installation on a second node. We will utilize Apache Zeppelin to interact with SAP HANA using a Vora interpreter.
In this scenario you will be able to join data from various other data sources, like HDFS and RDBMSs, to HANA data. This is a federated-query approach to multiple data sources: a "federation" tier acts as a single point of access to data from multiple sources. For details on the concepts of data federation, see "Virtual Integration of Hadoop with External Systems".
SAP HANA Vora enables OLAP analysis of Hadoop data through data-hierarchy enhancements in Spark SQL and compiled queries for accelerated processing across nodes. It democratizes data access, letting data scientists and developers easily enrich their datasets in Hadoop and other data sources like RDBMSs, JSON, text files, etc.
The HDP stack natively allows you to do "federated querying" using the Spark engine (see Using Spark to Virtually Integrate Hadoop with External Systems); with Vora you also get native connectivity to HANA plus additional UDF functions like hierarchies. To easily spin up HANA and Vora with HDP we will utilize Amazon AWS services. You have the option to spin up HANA in Amazon or Microsoft Azure; however, the Vora + HDP instance is only available on Amazon, so for simplicity we will use Amazon for now. In a future article I will create a walkthrough of how to install SAP Vora with HDP. This is the official install doc: SAP_HANA_Vora_Installation_Admin_Guide.
First we will need to spin up a HANA instance. You will need to register for the SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. Make sure you have an account there before proceeding with this tutorial. Once you register and sign in: on the left, click on SOLUTIONS to see the systems available for use. Search for "developer" in the search box to find the HANA developer edition and choose "SAP HANA Vora, 1.2, developer edition". Once you've found the instance through the search, you need to "activate" it. Activating an instance connects it to your account on Amazon AWS. After the solution is activated, the link next to it should change to Create Instance. Finally, click the "Create Instance" link on this solution to start the setup wizard. The wizard will take you through a few simple steps and then you will have your instance up and running. These steps are outlined below.
Choose your account, select your region, and enter a name and password for your instance. This is the "simple" setup and only requires those couple of items to generate your instance. Enter a password for your system. Configure the schedule for the virtual machine. This option allows you to define a specific date when the machine will shut down, or a schedule when it should be running; the virtual machine will suspend on the date you set. Click Next when you have set a run schedule or a suspend date. After the process of creating the VM starts, you will be prompted to download your "Key Pair". Make sure to download the "pem" file - you will need it to ssh back into the created instance. It will take about 10-25 minutes for your VM to start. You can see your instance status by clicking on the INSTANCE tab of the Cloud Appliance Library main screen.
Next let's spin up the Vora instance from the SAP Cloud Appliance Library: on the left, click on SOLUTIONS to see the systems available for use. Search for "developer" in the search box to find the HANA Vora 1.2, developer edition, and walk through the wizard to spin up the Vora instance. Make sure to select the same AWS region as the SAP HANA instance, as the two systems will need to communicate and you don't want to cross geo-boundaries. Remember the master password (I made it the same as for the HANA installation). It is important that you click Download and store the file with the private key; you will use it to connect to the instance's host using an ssh client. Once your instance of SAP HANA Vora is fully activated, you can see it among your CAL Instances with Active status.
You can also see the two instances in your AWS account. In the next article, Part 2, we will explore how to configure SAP HANA Vora with HDP and Ambari.
References:
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html
09-22-2016
04:24 PM
1 Kudo
@Sunile Manjee it is not supported by HDP 2.5; confirmed that yesterday with @vshukla.
09-14-2016
06:38 PM
4 Kudos
Livy is started on port 8998; I just validated it on my HDP 2.5 sandbox:
[root@sandbox ~]# curl localhost:8998/sessions
{"from":0,"total":0,"sessions":[]}
09-14-2016
05:35 PM
1 Kudo
@Carlos Barichello HDP 2.5 has Livy embedded. From a support perspective we don't officially support hitting Livy directly, only through Zeppelin. That being said, Zeppelin uses the same REST APIs to interact with Livy.
09-13-2016
08:25 PM
1 Kudo
@Kirk Haslbeck Michael is correct - you will get 5 total executors.