Created on 02-02-2017 08:31 PM
Prerequisites:
* SAP HANA - instructions to set up a cloud HANA instance on AWS or Azure:
https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-...
* HDP 2.5.x
We will use the Spark shell, Scala code, and DataFrames to access HANA via the SAP JDBC driver.
Start the Spark shell with the SAP JDBC driver (ngdbc.jar) on the classpath:

```
spark-shell --master yarn-client --jars /tmp/ngdbc.jar
```
```
scala> val url = "jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER"
url: String = jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER

scala> val prop = new java.util.Properties
prop: java.util.Properties = {}

scala> prop.setProperty("user","xxxx")
res1: Object = null

scala> prop.setProperty("password","xxxx")
res2: Object = null

scala> prop.setProperty("driver","com.sap.db.jdbc.Driver")
res3: Object = null

scala> val emp_address = sqlContext.read.jdbc(url,"EMPLOYEE_ADDRESS",prop)
emp_address: org.apache.spark.sql.DataFrame = [ID: bigint, STREETNUMBER: int, STREET: string, LOCALITY: string, STATE: string, COUNTRY: string]

scala> emp_address.show
17/02/02 20:17:19 INFO SparkContext: Starting job: show at <console>:32
.....
17/02/02 20:17:23 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 4.586219 s
+---+------------+---------------+--------+-----+-------+
| ID|STREETNUMBER|         STREET|LOCALITY|STATE|COUNTRY|
+---+------------+---------------+--------+-----+-------+
|  1|         555|    Madison Ave|New York|   NY|America|
|  2|          95|  Morten Street|New York|   NY|    USA|
|  3|        2395|Broadway Street|New York|   NY|    USA|
+---+------------+---------------+--------+-----+-------+
```
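Beyond reading a whole table, Spark's JDBC reader also accepts a parenthesized subquery with an alias in place of a table name, which pushes filtering and projection down to HANA instead of pulling the full table into Spark; a DataFrame can likewise be written back with `write.jdbc`. A minimal sketch, reusing the `url` and `prop` values from the session above (the `ny_addr` alias and the `EMPLOYEE_ADDRESS_NY` target table are hypothetical names for illustration; this requires a live HANA instance):

```scala
// Push filtering/projection down to HANA by passing a subquery
// (with an alias) instead of a table name. url and prop are the
// values defined in the spark-shell session above.
val nyAddresses = sqlContext.read.jdbc(
  url,
  "(SELECT ID, STREET, LOCALITY FROM EMPLOYEE_ADDRESS WHERE STATE = 'NY') AS ny_addr",
  prop)
nyAddresses.show()

// Write a DataFrame back to HANA. EMPLOYEE_ADDRESS_NY is a
// hypothetical target table in the CODEJAMMER schema; "append"
// adds rows rather than overwriting the table.
nyAddresses.write.mode("append").jdbc(url, "EMPLOYEE_ADDRESS_NY", prop)
```

Pushing the `WHERE` clause down to HANA matters when the source table is large: only the matching rows cross the JDBC connection.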
Gotchas:
If you see this error:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
```
This serialization error comes from older versions of the SAP JDBC driver and is resolved in the SPS12+ driver; upgrading the driver fixed it for me.