Member since 09-29-2015 · 155 Posts · 205 Kudos Received · 18 Solutions
02-02-2017 10:24 PM · 2 Kudos
Create a HANA table for demo purposes first:
CREATE COLUMN TABLE "CODEJAMMER"."STORE_ADDRESS" (
ID bigint not null primary key ,
"STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(1,555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(2,95,'Morten Street','New York','NY','USA');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(3,2395,'Broadway Street','New York','NY','USA');
Configure the SAP JDBC driver in the Spark config, pointing at the path where the driver is saved on ALL the nodes. My example: /tmp/ngdbc.jar
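For example (a sketch, assuming you manage Spark through Ambari or spark-defaults.conf): copy ngdbc.jar to the same path on every node and point the standard spark.jars property at it, then restart the Spark interpreter so the driver class is on the executor classpath:
spark.jars /tmp/ngdbc.jar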
You can also load ngdbc.jar through Zeppelin's dependency loader if you don't want it installed system-wide:
z.reset() // clean up previously added artifacts and repositories
// add artifact from filesystem
z.load("/tmp/ngdbc.jar")
See the Zeppelin docs for dependency-loading details: https://zeppelin.apache.org/docs/latest/interpreter/spark.html#3-dynamic-dependency-loading-via-sparkdep-interpreter
Let's test it out (the notebook is uploaded to GitHub, see below):
Notebook: https://raw.githubusercontent.com/zeltovhorton/sap_hana_demo/master/SparkToHana.json
Zeppelin Notebook Viewer: https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL3plbHRvdmhvcnRvbi9zYXBfaGFuYV9kZW1vL21hc3Rlci9TcGFya1RvSGFuYS5qc29u
Code :
%spark
val url="jdbc:sap://54.234.139.2:30015/?currentschema=XXX"
val prop = new java.util.Properties
prop.setProperty("user","")
prop.setProperty("password","")
prop.setProperty("driver","com.sap.db.jdbc.Driver")
//hana table
val store_address = sqlContext.read.jdbc(url,"STORE_ADDRESS",prop)
store_address.registerTempTable("store_address")
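A quick sanity check that the HANA table is now visible to Spark SQL (optional; just an illustration, not part of the original notebook):
%spark
// should print the three demo address rows loaded into HANA above
sqlContext.sql("select * from store_address").show()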
%spark
//hive tables
val sales = sqlContext.sql("select storekey, productkey, salesamount from atlas_factsales limit 10")
sales.registerTempTable("sales")
%spark
sqlContext.sql("select s.salesamount, s.productkey, a.state, a.country from sales s inner join store_address a where a.id in (1)").show()
+-----------+----------+-----+-------+
|salesamount|productkey|state|country|
+-----------+----------+-----+-------+
| 307.26| 177| NY|America|
| 1490.0| 2180| NY|America|
| 2299.9| 2329| NY|America|
| 413.512| 1360| NY|America|
| 6990.0| 193| NY|America|
| 11184.3| 1412| NY|America|
%sql
select s.salesamount, s.productkey, a.state, a.country from sales s inner join store_address a where a.id in (1)
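A side note on this join: it has no ON clause, so Spark evaluates it as a cross join that the WHERE predicate then narrows. With a single address row (a.id in (1)) the result is the same, but with more addresses you would want an explicit join key. A hypothetical variant, assuming the fact table's storekey corresponds to the address ID (that mapping is an assumption, not part of the demo schema):
%spark
// hypothetical: join on an explicit key instead of a filtered cross join
sqlContext.sql("select s.salesamount, s.productkey, a.state, a.country from sales s inner join store_address a on s.storekey = a.id where a.id in (1)").show()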
02-02-2017 08:31 PM · 3 Kudos
Prerequisites:
* SAP HANA - instructions to set up a cloud HANA on AWS or Azure: https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-hdp-us.html
* HDP 2.5.x
We will use the Spark shell, Scala code, and DataFrames to access HANA via the JDBC driver. Start the Spark shell with the ngdbc.jar driver:
spark-shell --master yarn-client --jars /tmp/ngdbc.jar
scala> val url="jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER"
url: String = jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER
scala>
| val prop = new java.util.Properties
prop: java.util.Properties = {}
scala> prop.setProperty("user","xxxx")
res1: Object = null
scala> prop.setProperty("password","xxxx")
res2: Object = null
scala> prop.setProperty("driver","com.sap.db.jdbc.Driver")
res3: Object = null
scala>
scala> val emp_address = sqlContext.read.jdbc(url,"EMPLOYEE_ADDRESS",prop)
emp_address: org.apache.spark.sql.DataFrame = [ID: bigint, STREETNUMBER: int, STREET: string, LOCALITY: string, STATE: string, COUNTRY: string]
scala> emp_address.show
17/02/02 20:17:19 INFO SparkContext: Starting job: show at <console>:32
.....
17/02/02 20:17:23 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 4.586219 s
+---+------------+---------------+--------+-----+-------+
| ID|STREETNUMBER| STREET|LOCALITY|STATE|COUNTRY|
+---+------------+---------------+--------+-----+-------+
| 1| 555| Madison Ave|New York| NY|America|
| 2| 95| Morten Street|New York| NY| USA|
| 3| 2395|Broadway Street|New York| NY| USA|
+---+------------+---------------+--------+-----+-------+
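One handy variation: the JDBC reader also accepts a parenthesized subquery in place of a table name, which pushes the filter and projection down to HANA instead of pulling the whole table over. A minimal sketch (the subquery here is illustrative, not part of the original demo):
scala> val usa_addr = sqlContext.read.jdbc(url, "(SELECT ID, STREET, LOCALITY FROM EMPLOYEE_ADDRESS WHERE COUNTRY = 'USA') addr", prop)
scala> usa_addr.show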
Gotchas: if you see this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
the issue is resolved in the latest SPS12+ driver; I had to upgrade my SAP driver.
02-02-2017 04:46 PM · 6 Kudos
Prerequisites:
* NIFI 1.0+
* SAP HANA - instructions to set up a cloud HANA on AWS or Azure: https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-hdp-us.html
SETUP: HANA (Source Database)
In this setup we will create a table in HANA. First follow the HCC article "Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3": you will need to download Eclipse Neon (Eclipse IDE for Java Developers) to connect to the SAP HANA instance we set up in Part 1. After you install Eclipse, configure it with the HANA Modelling tools, which let you connect to SAP HANA and execute SQL scripts to set up the demo data we will use from SAP Vora. After you establish a connection to your HANA system, run this code:
DROP TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS";
CREATE COLUMN TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS" (
ID bigint not null primary key generated by default as IDENTITY,
"STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(95,'Morten Street','New York','NY','USA');
SELECT * FROM "CODEJAMMER"."EMPLOYEE_ADDRESS";
Now let's set up the NiFi workflow. The final result will look like this:
NiFi Setup: This is a simple NiFi flow; the QueryDatabaseTable processor ships with the default processors as of NiFi 0.6. Drop an instance of the QueryDatabaseTable processor on your canvas. Right-click to configure and fill out the required fields, then click on the error indicator to set up the DB connection pool; see the settings below.
The limitation of this processor is that it is not true CDC and relies on a single maximum-value column. If data is reloaded into the table with older values in that column, it will not be replicated to HDFS or any other destination; the processor does not rely on transaction logs or redo logs.
Next configure the PutHDFS processor: set the Hadoop core-site.xml and hdfs-site.xml paths and the destination HDFS directory.
Now let's start all the processors. Validate that you got data by checking provenance on the processor. You can also check the state of the max-value column; the last auto-increment ID will be displayed. Right-click on the QueryDatabaseTable processor and select View State.
Testing CDC
Now insert a new record in HANA and validate that the record lands in JSON format in HDFS:
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(2395,'Broadway Street','New York','NY','USA');
Looking into HDFS we see the new JSON record:
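You can also double-check from Spark (optional; a sketch assuming /tmp/nifi/employee_address is the destination directory you configured in PutHDFS, so substitute your own path), since the JSON records can be read straight into a DataFrame:
// placeholder path: use the PutHDFS destination directory you configured
val addresses = sqlContext.read.json("/tmp/nifi/employee_address/*")
// the newly inserted row should show up here
addresses.filter(addresses("STREET") === "Broadway Street").show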
12-08-2016 06:30 PM
Nice job! I did a similar article with Zeppelin + Livy + AD/LDAP in case you want to check out the Livy steps: https://community.hortonworks.com/articles/65449/ow-to-setup-a-multi-user-active-directory-backed-z.html
11-15-2016 04:49 PM · 6 Kudos
Pre-Requisites
This is a continuation of "How to setup a multi user (Active Directory backed) zeppelin integrated with ldap and using Livy Rest server". You should have a running HDP cluster with Zeppelin, Livy, and Ranger; tested on HDP 2.5. I will use three user accounts for demoing security: hadoopadmin, sales1, and hr1. You can substitute accordingly for your environment; these users are synced from an LDAP/AD in my use case.
Spark and HDFS security
We will configure Ranger policies to:
- Protect the /sales HDFS dir, so only the sales group has access to it
- Protect the sales Hive table, so only the sales group has access to it
Create the /sales dir in HDFS as hadoopadmin:
sudo -u hadoopadmin hadoop fs -mkdir /sales
sudo -u hadoopadmin hadoop fs -chown sales:hadoop /sales
Now log in as sales1 and attempt to access it before adding any Ranger HDFS policy:
sudo su - sales1
hdfs dfs -ls /sales
It fails with an authorization error:
Permission denied: user=sales1, access=READ_EXECUTE, inode="/sales":hadoopadmin:hdfs:d---------
Let's create a policy in Ranger that will allow the user to access the /sales directory. Log into the Ranger UI, e.g. at http://RANGER_HOST_PUBLIC_IP:6080/index.html as admin/admin. In Ranger, click on 'Audit' to open the Audits page and filter by:
Service Type: HDFS
User: sales1
Notice that Ranger captured the access attempt, and since there is currently no policy to allow the access, it was "Denied".
Create an HDFS policy in Ranger with the following steps: on the 'Access Manager' tab click HDFS > (clustername)_hadoop, then click the 'Add New Policy' button to create a new policy allowing sales group users access to the /sales dir:
Policy Name: sales dir
Resource Path: /sales
Group: sales
Permissions: Execute, Read, Write
Click Add and wait ~30s for the policy to take effect. Now try accessing the dir again as sales1; this time there is no error:
hdfs dfs -ls /sales
In Ranger, click on 'Audit' to open the Audits page and filter again by Service Type: HDFS and User: sales1. Notice that Ranger captured the access attempt, and since this time there is a policy to allow the access, it was "Allowed".
Now let's copy test data for the sales user. Run in the command line as the 'sales1' user:
wget https://raw.githubusercontent.com/roberthryniewicz/datasets/master/airline-dataset/flights/flights.csv -O /tmp/flights.csv
# remove existing copies of dataset from HDFS
hadoop fs -rm -r -f /sales/airflightsdelays
# create directory on HDFS
hadoop fs -mkdir /sales/airflightsdelays
# put data into HDFS
hadoop fs -put /tmp/flights.csv /sales/airflightsdelays/
hadoop fs -cat /sales/airflightsdelays/flights.csv | head
You should see output similar to:
[sales1@az1secure0 ~]$ wget https://raw.githubusercontent.com/roberthryniewicz/datasets/master/airline-dataset/flights/flights.csv -O /tmp/flights.csv
--2016-11-15 16:28:24-- https://raw.githubusercontent.com/roberthryniewicz/datasets/master/airline-dataset/flights/flights.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9719582 (9.3M) [text/plain]
Saving to: ‘/tmp/flights.csv’
100%[=========================================================================================>] 9,719,582 8.44MB/s in 1.1s
2016-11-15 16:28:26 (8.44 MB/s) - ‘/tmp/flights.csv’ saved [9719582/9719582]
[sales1@az1secure0 ~]$ # remove existing copies of dataset from HDFS
[sales1@az1secure0 ~]$ hadoop fs -rm -r -f /sales/airflightsdelays
16/11/15 16:28:28 INFO fs.TrashPolicyDefault: Moved: 'hdfs://az1secure0.field.hortonworks.com:8020/sales/airflightsdelays' to trash at: hdfs://az1secure0.field.hortonworks.com:8020/user/sales1/.Trash/Current/sales/airflightsdelays
[sales1@az1secure0 ~]$
[sales1@az1secure0 ~]$ # create directory on HDFS
[sales1@az1secure0 ~]$ hadoop fs -mkdir /sales/airflightsdelays
[sales1@az1secure0 ~]$
[sales1@az1secure0 ~]$ # put data into HDFS
[sales1@az1secure0 ~]$ hadoop fs -put /tmp/flights.csv /sales/airflightsdelays/
[sales1@az1secure0 ~]$
[sales1@az1secure0 ~]$ hadoop fs -cat /sales/airflightsdelays/flights.csv | head
Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,628,620,804,750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,926,930,1054,1100,WN,1746,N612SW,88,90,78,-6,-4,IND,BWI,515,3,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1829,1755,1959,1925,WN,3920,N464WN,90,90,77,34,34,IND,BWI,515,3,10,0,,0,2,0,0,0,32
2008,1,3,4,1940,1915,2121,2110,WN,378,N726SW,101,115,87,11,25,IND,JAX,688,4,10,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,1937,1830,2037,1940,WN,509,N763SW,240,250,230,57,67,IND,LAS,1591,3,7,0,,0,10,0,0,0,47
2008,1,3,4,1039,1040,1132,1150,WN,535,N428WN,233,250,219,-18,-1,IND,LAS,1591,7,7,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,617,615,652,650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA
cat: Unable to write to output stream.
(The "Unable to write to output stream" message is expected here: head closes the pipe after printing the first ten lines.)
Now log in as the sales1 user in Zeppelin. Make sure Livy is configured.
%livy.spark
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/sales/airflightsdelays/") // Read all flights
df.printSchema
// Show a subset of columns with "select"
df.select("UniqueCarrier", "FlightNum", "DepDelay", "ArrDelay", "Distance").show
Your output should look like this:
df: org.apache.spark.sql.DataFrame = [Year: int, Month: int, DayofMonth: int, DayOfWeek: int, DepTime: string, CRSDepTime: int, ArrTime: string, CRSArrTime: int, UniqueCarrier: string, FlightNum: int, TailNum: string, ActualElapsedTime: string, CRSElapsedTime: int, AirTime: string, ArrDelay: string, DepDelay: string, Origin: string, Dest: string, Distance: int, TaxiIn: string, TaxiOut: string, Cancelled: int, CancellationCode: string, Diverted: int, CarrierDelay: string, WeatherDelay: string, NASDelay: string, SecurityDelay: string, LateAircraftDelay: string]
+-------------+---------+--------+--------+--------+
|UniqueCarrier|FlightNum|DepDelay|ArrDelay|Distance|
+-------------+---------+--------+--------+--------+
| WN| 335| 8| -14| 810|
| WN| 3231| 19| 2| 810|
| WN| 448| 8| 14| 515|
| WN| 1746| -4| -6| 515|
....
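With the Ranger policy in place, the DataFrame behaves like any other; for example, a quick aggregation works as expected (a sketch; note the delay columns were inferred as strings because of the NA values, hence the cast):
%livy.spark
import org.apache.spark.sql.functions._
// average departure delay per carrier; DepDelay is cast since it was inferred as string
df.groupBy("UniqueCarrier").agg(avg(col("DepDelay").cast("double")).alias("AvgDepDelay")).show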
Now open a new browser for another user session, hr1. Create a new notebook as the hr user and re-run similar steps:
%livy.spark
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/sales/airflightsdelays/") // Read all flights
df.printSchema
// Show a subset of columns with "select"
df.select("UniqueCarrier", "FlightNum", "DepDelay", "ArrDelay", "Distance").show
This time the read fails with an authorization error, since there is no policy granting the hr user access:
org.apache.hadoop.security.AccessControlException: Permission denied: user=hr2, access=EXECUTE, inode="/sales/airflightsdelays":hadoopadmin:hdfs:d---------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
Let's log into the YARN ResourceManager UI to confirm that the Zeppelin sessions ran as the end users sales1 and hr2.
Stay tuned for part 2, where we will show how to protect the sales Hive table so that only the sales group has access to it.
11-08-2016 05:29 PM · 6 Kudos
How to set up a multi-user (Active Directory backed) Zeppelin integrated with LDAP and using the Livy REST server.
Pre-requisites: set up the LDAP/AD integration for Ambari using this lab (Enable Active Directory Authentication for Ambari): https://github.com/HortonworksUniversity/Security_Labs#lab-1
If you are using a self-signed certificate, download the SSL certificate to the host where Zeppelin is running:
mkdir -p /etc/security/certificates
Store the certificate in this directory, then import it so Zeppelin can work with the self-signed certificate:
cd /etc/security/certificates
keytool -import -alias sampledcfieldcloud -file ad01.your.domain.name.cer -keystore /usr/jdk64/jdk1.8.0_77/jre/lib/security/cacerts
keytool -list -v -keystore /usr/jdk64/jdk1.8.0_77/jre/lib/security/cacerts | grep sampledcfieldcloud
Create a home directory in HDFS for the user that you will log in as:
hdfs dfs -mkdir /user/hadoopadmin
hdfs dfs -chown hadoopadmin:hdfs /user/hadoopadmin
Enable multi-user Zeppelin: in Ambari, go to the Zeppelin notebook configs, expand Advanced zeppelin-env, and look for the shiro.ini entry. Below is a configuration that works with our sampledcfield cloud.
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
#admin = password1
#user1 = password2, role1, role2
#user2 = password3, role3
#user3 = password4, role2
# Sample LDAP configuration, for user Authentication, currently tested for single Realm
[main]
activeDirectoryRealm = org.apache.zeppelin.server.ActiveDirectoryGroupRealm
#activeDirectoryRealm.systemUsername = CN=binduser,OU=ServiceUsers,DC=sampledcfield,DC=hortonworks,DC=com
activeDirectoryRealm.systemUsername = binduser
activeDirectoryRealm.systemPassword = xxxxxx
activeDirectoryRealm.principalSuffix = @your.domain.name
#activeDirectoryRealm.hadoopSecurityCredentialPath = jceks://user/zeppelin/zeppelin.jceks
activeDirectoryRealm.searchBase = DC=sampledcfield,DC=hortonworks,DC=com
activeDirectoryRealm.url = ldaps://ad01.your.domain.name:636
activeDirectoryRealm.groupRolesMap = "CN=hadoop-admins,OU=CorpUsers,DC=sampledcfield,DC=hortonworks,DC=com":"admin"
activeDirectoryRealm.authorizationCachingEnabled = true
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
securityManager.cacheManager = $cacheManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 86400000
#ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm
#ldapRealm.userDnTemplate = uid={0},cn=users,cn=accounts,dc=example,dc=com
#ldapRealm.contextFactory.url = ldap://ldaphost:389
#ldapRealm.contextFactory.authenticationMechanism = SIMPLE
#sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
#securityManager.sessionManager = $sessionManager
# 86,400,000 milliseconds = 24 hour
#securityManager.sessionManager.globalSessionTimeout = 86400000
shiro.loginUrl = /api/login
[roles]
admin = *
[urls]
# anon means the access is anonymous.
# authcBasic means Basic Auth Security
# To enforce security, comment the line below and uncomment the next one
/api/version = anon
/api/interpreter/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
#/** = anon
/** = authc
#/** = authcBasic
Grant Livy the ability to impersonate users. Use Ambari to update core-site.xml (restart YARN & HDFS after making this change):
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
</property>
Restart HDFS and YARN after this update. After running the Livy notebook, make sure YARN shows the logged-in user as the user running the job; here hadoopadmin is the user logged into the Zeppelin notebook. You should see two applications running in YARN, the livy-session-X and the Zeppelin app:
application_1478287338271_0003 hadoopadmin livy-session-0
application_1478287338271_0002 zeppelin Zeppelin
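A quick way to verify impersonation from inside a notebook (a sketch; run it in a %livy.spark paragraph as the logged-in user) is to ask Hadoop for the current user, which should print the Zeppelin login rather than livy:
%livy.spark
// should print the Zeppelin login (e.g. hadoopadmin), not "livy"
org.apache.hadoop.security.UserGroupInformation.getCurrentUser.getShortUserName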
Troubleshooting: explore the Zeppelin and Livy log files:
tail -f /var/log/zeppelin/zeppelin-zeppelin-az1secure0.log
tail -f /var/log/zeppelin/zeppelin-interpreter-livy-zeppelin-az1secure0.log
Next Steps: this multi-part article shows how to secure Spark with Ranger using Zeppelin and Livy for multi-user access: Securing Spark with Ranger using Zeppelin and Livy for Multi-user access - Part 1
References:
https://zeppelin.apache.org/docs/0.6.0/interpreter/livy.html#faq
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_command-line-installation/content/ch21s07s02.html
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_command-line-installation/content/configuring_zep.html
09-26-2016 09:31 PM · 4 Kudos
This article "Perform Data Analysis using SAP Vora on SAP Hana data - Part 4" is continuation of "Load Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3" Log back in to SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running:
Open the Apache Zeppelin web UI: click Connect on your SAP HANA Vora instance in CAL, and then pick Open on the link for Application: Zeppelin.
In Zeppelin click Create new note and name the notebook as you like, e.g. "interact with hana data". In the first cell let's define our imports and the HANA connection information:
import sqlContext.implicits._
import scala.collection.mutable.Map
val HANA_HOSTNAME = "xxx.xxx.xx.xxx"
val HANA_INSTANCE = "00"
val HANA_SCHEMA = "CODEJAMMER"
val HANA_USERNAME = "CODEJAMMER"
val HANA_PASSWORD = "CodeJam2016"
4. In the next cell we will use the sqlContext that Zeppelin created automatically to connect to HANA via the "com.sap.spark.hana" data source:
sqlContext.sql( s"""
CREATE TABLE EMPLOYEE_ADDRESS
USING
com.sap.spark.hana
OPTIONS (
path "EMPLOYEE_ADDRESS",
host "${HANA_HOSTNAME}",
dbschema "${HANA_SCHEMA}",
user "${HANA_USERNAME}",
passwd "${HANA_PASSWORD}",
instance "${HANA_INSTANCE}"
)
""".stripMargin )
5. Now let's query the HANA database from Zeppelin:
sqlContext.sql("select * from EMPLOYEE_ADDRESS").show
You should get the results from the EMPLOYEE_ADDRESS table we created earlier:
+------------+-------------+--------+-----+-------+
|STREETNUMBER| STREET|LOCALITY|STATE|COUNTRY|
+------------+-------------+--------+-----+-------+
| 555| Madison Ave|New York| NY|America|
| 95|Morten Street|New York| NY| USA|
+------------+-------------+--------+-----+-------+
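Since EMPLOYEE_ADDRESS is now registered through the Vora/HANA data source, ordinary Spark SQL predicates work against it as well; a small follow-up query (illustrative only):
// filter pushed through the registered HANA-backed table
sqlContext.sql("select STREET, LOCALITY from EMPLOYEE_ADDRESS where COUNTRY = 'USA'").show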
09-26-2016 06:30 PM · 4 Kudos
This article "Load Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3" is continuation of "Configure SAP Vora HDP Ambari - Part 2" You will need to download Eclipse Neon - Eclipse IDE for Java Developers to connect to the SAP HANA that we setup in Part 1 . After you setup eclipse we will need to configure Eclipse to install HANA Modelling tools that will allow us to connect to SAP HANA and execute sql scripts to setup demo data that we will use from SAP Vora. Eclipse Setup Procedure
Open the Eclipse IDE. In the main menu, choose Help > Install New Software. Depending on the Eclipse version you have installed, enter one of the following URLs in the Work with field:
For Eclipse Neon (4.6), add the URL: https://tools.hana.ondemand.com/neon
Select SAP HANA Tools (the whole feature group). Note: in case you need to develop with SAPUI5, also install SAP HANA Cloud Platform Tools > UI development toolkit for HTML5 (Developer Edition). Choose Next. On the next wizard page you get an overview of the features to be installed; choose Next again. Confirm the license agreements and choose Finish to start the installation. After the successful installation, you will be prompted to restart your Eclipse IDE.
Log back in to SAP Cloud Appliance Library, the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running. Click on the SAP HANA Connect link and click Open. You should see the HANA home page for your instance. You will see a warning about the 'hosts' file; please follow the steps below to set up the hosts entry on your local system, which will simplify your life.
Open your Terminal or shell application and type the command:
sudo nano /etc/hosts
Add the following line, or modify it if you already have it:
xxx.xxx.xxx.xxx vhcalhdbdb
You will need the server connectivity and CODEJAMMER information for the Eclipse setup steps. Next let's create a new host in the Eclipse HANA tools: click Add System... You will need to add the hostname that you got from the HANA home page above; the instance is 00. Connect using Role: Developer, User: CODEJAMMER
Password: CodeJam2016
You should now have a successful connection. Right-click on the new system you set up, open the SQL Console, and run the following code:
DROP TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS";
CREATE COLUMN TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS" ("STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" values(555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" values(95,'Morten Street','New York','NY','USA');
Next, let's test to make sure you got data; run the following SQL command:
SELECT * FROM "CODEJAMMER"."EMPLOYEE_ADDRESS";
You should get a result like this. Congratulations, you created an "EMPLOYEE_ADDRESS" table and populated it with sample data. The next article, "Perform Data Analysis using SAP Vora on SAP Hana data - Part 4", will load this sample data from Apache Zeppelin.
References:
http://go.sap.com/developer/tutorials/hana-web-development-workbench.html
https://help.hana.ondemand.com/help/frameset.htm?b0e351ada628458cb8906f55bcac4755.html
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html
http://go.sap.com/developer/tutorials/vora-connect.html
09-26-2016 05:03 PM · 6 Kudos
This article "Configure SAP Vora HDP Ambari - Part 2" is continuation of "Getting started with SAP
Hana and Vora with HDP using Apache Zeppelin for Data Analysis - Part 1
Intro" Log back in to SAP Cloud Appliance Library - the free service to manage your SAP solutions in the public cloud. You should have HANA and Vora instances up and running: Open Apache Ambari web UI click on Connect in your SAP HANA Vora instance in CAL, and then pick Open a link for Application: Ambari .
The port of the Ambari web UI has been preconfigured for you in the SAP HANA Vora, developer edition, in CAL, and it has been opened as one of the default Access Points; as you might remember, this translates into the appropriate inbound rule in the corresponding AWS security group. Log into the Ambari web UI using the user admin and the master password you defined during creation of the instance in CAL. You can see that (1) all services, including the SAP HANA Vora components, are running, that (2) there are no issues with resources, and that (3) there are no alerts generated by the system.
You use this interface to start/stop cluster components if needed during operations or troubleshooting. Please refer to the official Apache Ambari documentation if you need additional information and training on how to use it. For a detailed review of all SAP HANA Vora components and their purpose, please review the SAP HANA Vora help. We will need to make some configuration changes to get the HDFS View to work in Ambari and also to modify the YARN scheduler.
Setup HDFS Ambari View: Creating and Configuring a Files View Instance
Browse to the Ambari Administration interface. Click Views, expand the Files View, and click Create Instance. Enter the following View instance details:
Instance Name (must be unique across all Files view instances, no spaces, required): HDFS
Display Name (the name of the view link displayed to the user in Ambari Web): MyFiles
Description (the description of the view displayed to the user in Ambari Web): Browse HDFS files and directories.
Visible (this checkbox determines whether the view is displayed to users in Ambari Web): Visible or Not Visible
You should see an Ambari HDFS view like this. Next, in Ambari Web, browse to Services > HDFS > Configs. Under the Advanced tab, navigate to the Custom core-site section. Click Add Property... to add the following custom properties:
hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
Now let's test that you can view the HDFS View.
Next we will reconfigure YARN to fix an issue when submitting YARN jobs. I hit this when running a Sqoop job to import data from SAP HANA to HDFS (that will be a separate how-to article, published soon):
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
The job had been stuck like that for a while. Let's set yarn.scheduler.capacity.maximum-am-resource-percent=0.6. Go to YARN -> Configs and look for the property yarn.scheduler.capacity.maximum-am-resource-percent (see https://hadoop.apache.org/docs/r0.23.11/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html). Per the docs, yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent is the maximum percent of resources in the cluster which can be used to run application masters, and it controls the number of concurrent active applications; limits on each queue are directly proportional to their queue capacities and user limits. It is specified as a float, i.e. 0.5 = 50%, and the default is 10%. It can be set for all queues with yarn.scheduler.capacity.maximum-am-resource-percent and can also be overridden on a per-queue basis by setting yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent.
Now let's connect to Apache Zeppelin and load sample data from files already created in HDFS into SAP HANA Vora. Apache Zeppelin is a multi-purpose web-based notebook that brings data ingestion, data exploration, visualization, sharing, and collaboration features to Hadoop and Spark (https://hortonworks.com/apache/zeppelin/). SAP HANA Vora provides its own %vora interpreter, which allows Spark/Vora features to be used from Zeppelin, and Zeppelin also allows queries to be written directly in Spark SQL. SAP HANA Vora, developer edition, on CAL comes with Apache Zeppelin pre-installed. As with Apache Ambari, to open the Zeppelin web UI click Connect on your SAP HANA Vora instance in CAL, and then pick Open on the link for Application: Zeppelin.
Zeppelin opens in a new browser window; check that it shows Connected and, if so, click on the 0_DemoData notebook.
The 0_DemoData notebook will open. Now click the Run all paragraphs button at the top of the page to create tables in SAP HANA Vora using data from the existing HDFS files preloaded on the instance in CAL. These are the tables you will need later in the exercises. A dialog window will pop up asking you to confirm "Run all paragraphs?"; click OK. The Vora code will load the .csv files and create tables in Vora Spark. You can navigate to the HDFS files using the view created earlier to preview the data right on HDFS.
At this point we have set up an Ambari HDFS view to browse our distributed file system on HDP and tested that Vora connectivity to HDFS is working. Stay tuned for the next article, "How to connect SAP Vora to SAP HANA using Apache Zeppelin", where we will use Apache Zeppelin to connect to the SAP HANA system from Part 1.
References:
https://community.hortonworks.com/articles/27387/virtual-integration-of-hadoop-with-external-system.html
https://community.hortonworks.com/content/kbentry/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://help.sap.com/Download/Multimedia/hana_vora/SAP_HANA_Vora_Installation_Admin_Guide_en.pdf
http://go.sap.com/developer/tutorials/hana-setup-cloud.html
http://help.sap.com/hana_vora_re
http://go.sap.com/developer/tutorials/vora-setup-cloud.html
http://go.sap.com/developer/tutorials/vora-connect.html