Community Articles

azeltov · ‎02-02-2017

Create a HANA table for demo purposes first

CREATE COLUMN TABLE "CODEJAMMER"."STORE_ADDRESS" (
ID bigint not null primary key ,
"STREETNUMBER" INTEGER CS_INT,
   "STREET" NVARCHAR(200),
   "LOCALITY" NVARCHAR(200),
   "STATE" NVARCHAR(200),
   "COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
   
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(1,555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(2,95,'Morten Street','New York','NY','USA');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(3,2395,'Broadway Street','New York','NY','USA');

Configure the SAP JDBC Driver in Spark config to where the driver is saved on ALL the nodes . My example: /tmp/ngdbc.jar

You can also load the ngdbc.jar using the zeppelin depency if you dont want to system wide access:

z.reset() // clean up previously added artifact and repository
// add artifact from filesystem
z.load("/tmp/ngdbc.jar")

Check out zeppelin docs for zeppelin dependency loading details:

https://zeppelin.apache.org/docs/latest/interpreter/spark.html#3-dynamic-dependency-loading-via-spar...

Lets test it out ( the notebook is uploaded to github, see below):

Notebook: https://raw.githubusercontent.com/zeltovhorton/sap_hana_demo/master/SparkToHana.json

Zeppelin Notebook Viewer:

https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL3plbHRvdmh...

Code :

%spark
val url="jdbc:sap://54.234.139.2:30015/?currentschema=XXX"
val prop = new java.util.Properties
prop.setProperty("user","")
prop.setProperty("password","")
prop.setProperty("driver","com.sap.db.jdbc.Driver")
//hana table
val store_address = sqlContext.read.jdbc(url,"STORE_ADDRESS",prop) 
store_address.registerTempTable("store_address")
<strong> </strong>

%spark
//hive tables
val sales = sqlContext.sql("select storekey, productkey, salesamount from atlas_factsales limit 10")
sales.registerTempTable("sales")
%spark
sqlContext.sql("select s.salesamount, s.productkey, a.state, a.country from sales s inner join  store_address a where a.id in (1)").show()
+-----------+----------+-----+-------+
|salesamount|productkey|state|country|
+-----------+----------+-----+-------+
|     307.26|       177|   NY|America|
|     1490.0|      2180|   NY|America|
|     2299.9|      2329|   NY|America|
|    413.512|      1360|   NY|America|
|     6990.0|       193|   NY|America|
|    11184.3|      1412|   NY|America|
%sql
select s.salesamount, s.productkey, a.state, a.country from sales s inner join  store_address a where a.id in (1)

Cloudera Community

Community Articles

Federated SparkSQL using SAP HANA and HDP Hive using Zeppelin

Apache NiFi

Apache Spark

Apache Zeppelin

Cloudera DataFlow (CDF)