Support Questions

rkanchu · 03-04-2016

Hi,

I am using a distributed HDP 2.3.4 cluster and would like to integrate SAS for analytics. Is there any standard documentation to follow for this setup?

Regards,

Krishna

nsabharwal · 03-05-2016

@rkanchu

For the technical side, please see https://community.hortonworks.com/articles/4689/getting-started-with-sas-and-hadoop.html

1) Leveraging HDFS for flat files using the SAS Filename Statement

The SAS Filename statement allows a SAS programmer to set up a pointer to an inbound or outbound file system directory. With Hadoop, the Filename statement can reference an HDFS directory. Once the file reference is established, it can be used on an Infile or File statement within a SAS Data Step. This lets SAS programmers read and write flat files to and from HDFS inline within their programs.
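
A minimal sketch of this pattern, assuming SAS 9.4 with the Hadoop access method of the Filename statement, and assuming the SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH environment variables point at copies of the cluster's client configuration files and Hadoop JAR files (on releases that do not pick these up, the CFG= option names the configuration file instead). All paths, the user name, and the CSV layout are hypothetical.

```sas
/* Hypothetical locations of the cluster's client config files and JARs. */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/lib";

/* Fileref pointing at a (hypothetical) comma-delimited file in HDFS. */
filename hdfsin hadoop '/user/sasdemo/sales.csv' user='sasdemo';

/* Read the HDFS file on an INFILE statement, as with a local file. */
data work.sales;
  infile hdfsin dlm=',' dsd firstobs=2;  /* skip the header row */
  input region :$20. amount;
run;

/* Second fileref for an output file in HDFS. */
filename hdfsout hadoop '/user/sasdemo/sales_copy.txt' user='sasdemo';

/* Write each row back to HDFS as a space-delimited line via FILE/PUT. */
data _null_;
  set work.sales;
  file hdfsout;
  put region amount;
run;
```

Apart from the HADOOP keyword on the Filename statement, the Data Steps are ordinary SAS code, which is the point of this approach.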

2) Leveraging HDFS for SAS Libraries using the Libname Statement

SAS also implemented the SPD Engine (SPDE) on the Libname statement so that SAS tables (data sets) can be stored in HDFS. Once a library reference is established with the Libname statement, SAS programmers can use this libref on a Data or Set statement within a SAS Data Step, or as input to a SAS procedure. Compared with a standard file system, this method has some minor limitations for SAS libraries; the SAS documentation describes them.
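
As a minimal sketch, assuming SAS 9.4 with the SPD Engine's HDFS support and the SAS_HADOOP_CONFIG_PATH / SAS_HADOOP_JAR_PATH variables set as below (paths are hypothetical, and the HDFS directory /user/sasdemo/spde is assumed to already exist), an HDFS-backed libref is used like any other library. SASHELP.CLASS is a sample table shipped with SAS.

```sas
/* Hypothetical locations of the cluster's client config files and JARs. */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/lib";

/* HDFSHOST=DEFAULT: use the HDFS connection described by the config files. */
libname hdfslib spde '/user/sasdemo/spde' hdfshost=default;

/* Store a SAS data set in HDFS through the libref... */
data hdfslib.class;
  set sashelp.class;
run;

/* ...and read it back as input to a SAS procedure. */
proc means data=hdfslib.class mean max;
  var height weight;
run;
```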

3) Accessing HiveServer2 directly

SAS has implemented a Libname statement to set up a SAS library reference to HiveServer2, intended mostly for read access to Hive tables. Once the library reference has been established (it uses a JDBC connection), SAS programmers can use HiveServer2 tables within their SAS programs, as input to a SET statement or through the DATA= option of a SAS procedure. SAS has also implemented a dynamic in-database push-down capability for standard statistical procedures commonly used by SAS programmers, such as Proc Summary, Means, and Freq. This capability generates a complex HiveQL statement on the user's behalf and sends it to HiveServer2 for execution, so a significant portion of the math takes place in Hadoop.
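
A minimal sketch, assuming SAS/ACCESS to Hadoop is licensed and the Hadoop JAR and configuration paths below are valid for your cluster. The host name, port, schema, user, and the table WEBLOGS with its STATUS column are all hypothetical. The SASTRACE options write the generated HiveQL to the SAS log, which is a convenient way to confirm whether a procedure was actually pushed down.

```sas
/* Hypothetical locations of the cluster's client config files and JARs. */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/lib";

/* Library reference to HiveServer2 (hypothetical connection details). */
libname hive hadoop server='hiveserver2.example.com' port=10000
        schema=default user=sasdemo;

/* Echo the HiveQL that SAS sends to HiveServer2 into the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Allow eligible procedures to run inside the data source. */
options sqlgeneration=dbms;

/* Hive table as input to a SET statement... */
data work.web_sample;
  set hive.weblogs(obs=100);
run;

/* ...or via DATA=; the frequency counts can be computed in Hive. */
proc freq data=hive.weblogs;
  tables status;
run;
```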

4) Executing HDFS, Pig, Hive, and MapReduce inline within a SAS program

SAS created Proc Hadoop, a procedure available with this product, to enable SAS programmers to execute, inline within a SAS program, any HDFS, Pig, Hive, or MapReduce script or program that has been created outside of SAS.
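
A minimal sketch, assuming SAS 9.4 with PROC HADOOP picking up the configuration from the variables below (older releases need the CFG= option instead). The Pig script, local data file, user name, and HDFS directories are hypothetical, and the sketch only exercises the HDFS and PIG statements.

```sas
/* Hypothetical locations of the cluster's client config files and JARs. */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/lib";

/* Hypothetical Pig script written outside of SAS. */
filename pigcode '/home/sasdemo/scripts/clean_logs.pig';

proc hadoop username='sasdemo' verbose;
  /* HDFS commands: create a directory, then upload a local file into it. */
  hdfs mkdir='/user/sasdemo/raw';
  hdfs copyfromlocal='/home/sasdemo/data/logs.txt'
       out='/user/sasdemo/raw/logs.txt';
  /* Submit the external Pig script to the cluster. */
  pig code=pigcode;
run;
```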

I hope you find this information useful as you get started using SAS/ACCESS to Hadoop.

aervits · 03-04-2016

@rkanchu

Everything SAS-related is here: http://hortonworks.com/partner/sas/

rkanchu · 03-05-2016

Thanks Neeraj,

That was helpful.

Cloudera Community

SAS and HDP 2.3.4 integration documentation