Member since
09-29-2015
155
Posts
205
Kudos Received
18
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5821 | 02-17-2017 12:38 PM
 | 790 | 11-15-2016 03:56 PM
 | 1107 | 11-11-2016 05:27 PM
 | 10444 | 11-11-2016 12:16 AM
 | 1871 | 11-10-2016 06:15 PM
03-09-2017
03:02 PM
@vshukla @Bikas
02-17-2017
12:38 PM
I ended up looking at the livy.log file for the actual error message. The user I was trying to run as did not have a home directory in HDFS. Once I created a home directory with the proper permissions, it worked.
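The fix above can be sketched as follows — a minimal example assuming the Livy session user is "alice" (substitute your own user) and that you can run commands as the HDFS superuser:

```shell
# Create the missing HDFS home directory for the impersonated user
# and hand it over with the usual permissions.
sudo -u hdfs hdfs dfs -mkdir -p /user/alice
sudo -u hdfs hdfs dfs -chown alice:alice /user/alice
sudo -u hdfs hdfs dfs -chmod 755 /user/alice
```

The username and permissions here are placeholders; match them to whatever user Livy impersonates in your cluster.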
02-02-2017
10:51 PM
Repo Description
Notebook for the demo: Federated SparkSQL using SAP HANA and HDP Hive using Zeppelin.
Repo Info
Github Repo URL: https://github.com/zeltovhorton/sap_hana_demo
Github account name: zeltovhorton
Repo name: sap_hana_demo
02-02-2017
10:24 PM
2 Kudos
Create a HANA table for demo purposes first:
CREATE COLUMN TABLE "CODEJAMMER"."STORE_ADDRESS" (
ID bigint not null primary key ,
"STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(1,555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(2,95,'Morten Street','New York','NY','USA');
insert into "CODEJAMMER"."STORE_ADDRESS" (ID,STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(3,2395,'Broadway Street','New York','NY','USA');
Configure the SAP JDBC driver in the Spark config, pointing to the path where the driver is saved on ALL the nodes. My example: /tmp/ngdbc.jar
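One way to stage the driver on every node is a simple copy loop — a sketch only, assuming passwordless SSH and placeholder hostnames node1..node3:

```shell
# Copy the SAP HANA JDBC driver to the same path on each worker node
# so the Spark interpreter can find it. Hostnames are placeholders.
for host in node1 node2 node3; do
  scp /tmp/ngdbc.jar "$host:/tmp/ngdbc.jar"
done
```

Configuration-management tools (Ansible, pdsh, etc.) do the same job at scale; the point is only that the jar must exist at the configured path on all nodes.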
You can also load ngdbc.jar using the Zeppelin dependency loader if you don't want system-wide access:
z.reset() // clean up previously added artifact and repository
// add artifact from filesystem
z.load("/tmp/ngdbc.jar")
Check out the Zeppelin docs for dependency-loading details: https://zeppelin.apache.org/docs/latest/interpreter/spark.html#3-dynamic-dependency-loading-via-sparkdep-interpreter
Let's test it out (the notebook is uploaded to GitHub, see below):
Notebook: https://raw.githubusercontent.com/zeltovhorton/sap_hana_demo/master/SparkToHana.json
Zeppelin Notebook Viewer: https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL3plbHRvdmhvcnRvbi9zYXBfaGFuYV9kZW1vL21hc3Rlci9TcGFya1RvSGFuYS5qc29u
Code :
%spark
val url="jdbc:sap://54.234.139.2:30015/?currentschema=XXX"
val prop = new java.util.Properties
prop.setProperty("user","")
prop.setProperty("password","")
prop.setProperty("driver","com.sap.db.jdbc.Driver")
//hana table
val store_address = sqlContext.read.jdbc(url,"STORE_ADDRESS",prop)
store_address.registerTempTable("store_address")
%spark
//hive tables
val sales = sqlContext.sql("select storekey, productkey, salesamount from atlas_factsales limit 10")
sales.registerTempTable("sales")
%spark
sqlContext.sql("select s.salesamount, s.productkey, a.state, a.country from sales s inner join store_address a where a.id in (1)").show()
+-----------+----------+-----+-------+
|salesamount|productkey|state|country|
+-----------+----------+-----+-------+
| 307.26| 177| NY|America|
| 1490.0| 2180| NY|America|
| 2299.9| 2329| NY|America|
| 413.512| 1360| NY|America|
| 6990.0| 193| NY|America|
| 11184.3| 1412| NY|America|
%sql
select s.salesamount, s.productkey, a.state, a.country from sales s inner join store_address a where a.id in (1)
02-02-2017
08:31 PM
3 Kudos
Prerequisites:
* SAP HANA - Instructions to setup a Cloud HANA on AWS or Azure: https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-hdp-us.html
* HDP 2.5.x
We will use the Spark shell, Scala code, and data frames to access HANA using the JDBC driver. Start the Spark shell with the ngdbc.jar driver:
spark-shell --master yarn-client --jars /tmp/ngdbc.jar
scala> val url="jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER"
url: String = jdbc:sap://xxxx:30015/?currentschema=CODEJAMMER
scala>
| val prop = new java.util.Properties
prop: java.util.Properties = {}
scala> prop.setProperty("user","xxxx")
res1: Object = null
scala> prop.setProperty("password","xxxx")
res2: Object = null
scala> prop.setProperty("driver","com.sap.db.jdbc.Driver")
res3: Object = null
scala>
scala> val emp_address = sqlContext.read.jdbc(url,"EMPLOYEE_ADDRESS",prop)
emp_address: org.apache.spark.sql.DataFrame = [ID: bigint, STREETNUMBER: int, STREET: string, LOCALITY: string, STATE: string, COUNTRY: string]
scala> emp_address.show
17/02/02 20:17:19 INFO SparkContext: Starting job: show at <console>:32
.....
17/02/02 20:17:23 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 4.586219 s
+---+------------+---------------+--------+-----+-------+
| ID|STREETNUMBER| STREET|LOCALITY|STATE|COUNTRY|
+---+------------+---------------+--------+-----+-------+
| 1| 555| Madison Ave|New York| NY|America|
| 2| 95| Morten Street|New York| NY| USA|
| 3| 2395|Broadway Street|New York| NY| USA|
+---+------------+---------------+--------+-----+-------+
Gotchas: If you see this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
the issue is resolved by the latest SPS12+ driver; I had to upgrade my SAP driver.
02-02-2017
04:46 PM
6 Kudos
Prerequisites:
* NIFI 1.0+
* SAP HANA - Instructions to setup a Cloud HANA on AWS or Azure: https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-hdp-us.html
SETUP: HANA (Source Database)
In this setup we will create a table in HANA. First follow the HCC article "Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3". You will need to download Eclipse Neon - Eclipse IDE for Java Developers to connect to the SAP HANA instance that we set up in Part 1. After you set up Eclipse, configure it to install the HANA Modelling tools, which let us connect to SAP HANA and execute SQL scripts to load the demo data that we will use from SAP Vora. After you establish a connection to your HANA system, run this code:
DROP TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS";
CREATE COLUMN TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS" (
ID bigint not null primary key generated by default as IDENTITY,
"STREETNUMBER" INTEGER CS_INT,
"STREET" NVARCHAR(200),
"LOCALITY" NVARCHAR(200),
"STATE" NVARCHAR(200),
"COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(95,'Morten Street','New York','NY','USA');
SELECT * FROM "CODEJAMMER"."EMPLOYEE_ADDRESS";
Now let's set up the NIFI workflow. The final result will look like this:
Nifi Setup:
This is a simple NIFI setup; the QueryDatabaseTable processor is only available as part of the default processors from version 0.6 of NiFi onwards. Drop an instance of the QueryDatabaseTable processor on your canvas. Right click to configure, fill out the required fields, and then click on the error to set up the DB connection pool, see settings below:
The limitation of this processor is that it is not true CDC and relies on a single incrementing column. If data is reloaded into that column with older values, it will not be replicated into HDFS or any other destination. This processor does not rely on transactional logs or redo logs.
Next configure the PutHDFS processor: set the Hadoop core-site.xml and hdfs-site.xml paths and the destination HDFS directory.
Now let's start all the processors. Validate that you got data by checking data provenance on your processor. You can also check the max-id column state; the last auto-increment ID will be displayed. Right click on the QueryDatabaseTable processor and select View State:
Testing CDC
Now insert a new record in HANA and validate that the record shows up in JSON format in HDFS:
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY) values(2395,'Broadway Street','New York','NY','USA');
Looking into HDFS we see the new JSON record:
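A hypothetical way to check from the command line — the HDFS path below is a placeholder for whatever destination directory you configured in the PutHDFS processor:

```shell
# List the files PutHDFS has written and print the newest one.
# Each flowfile should contain a JSON record for the inserted row
# (STREETNUMBER 2395, STREET 'Broadway Street', LOCALITY 'New York').
hdfs dfs -ls /tmp/nifi/employee_address
hdfs dfs -cat /tmp/nifi/employee_address/* | head
```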
01-24-2017
01:52 AM
@Tom I'm seeing the same thing for my Spark code:
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
I tried adding the dependency but had no luck either: z.load("com.fasterxml.jackson.core:jackson-databind:2.4.4")
12-08-2016
06:30 PM
Nice job! I did a similar article with Zeppelin + Livy + AD/LDAP in case you want to check out the Livy steps: https://community.hortonworks.com/articles/65449/ow-to-setup-a-multi-user-active-directory-backed-z.html
11-22-2016
03:49 PM
@Sebastian can you take a screenshot or copy and paste your interpreter settings, please?