Please suggest what is best way to proceed with Hadoop/Spark POC

bhupendra — Tue, 22 Dec 2015 01:30:54 GMT

Re: Please suggest what is best way to proceed with Hadoop/Spark POC

nsabharwal — Tue, 22 Dec 2015 02:17:03 GMT

@Bhupendra Mishra

1) Get your cluster up - HDP 2.3.2 (latest version) (the Sandbox is a good start)

2) Get Zeppelin http://hortonworks.com/hadoop/zeppelin/#section_1

Step 2 will help you configure Spark and access data from Hive tables.
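As a rough illustration of what "access data from Hive tables" can look like from PySpark, here is a minimal sketch. It assumes a Spark 1.x-era install (as shipped with HDP of that period) where Spark can reach the Hive metastore. The helper names `build_preview_query` and `preview_hive_table`, the app name, and the table name used in the note below are hypothetical and chosen only for illustration; `SparkContext`, `HiveContext`, `sql()` and `show()` are the real PySpark calls.

```python
import re

# Hypothetical helper names; only the pyspark calls below are real Spark 1.x APIs.
# Accepts "table" or "database.table" made of plain identifier characters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_preview_query(table, limit=10):
    """Return a SELECT statement that previews `limit` rows of a Hive table."""
    if not _IDENTIFIER.match(table):
        raise ValueError("unexpected table name: %r" % table)
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return "SELECT * FROM %s LIMIT %d" % (table, limit)


def preview_hive_table(table, limit=10):
    """Run the preview query; needs a cluster where Spark can see the Hive metastore."""
    # Imported lazily so the query builder can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-preview")  # app name is an arbitrary choice
    try:
        sql_context = HiveContext(sc)
        # sql() returns a DataFrame; show() prints its first rows to stdout.
        sql_context.sql(build_preview_query(table, limit)).show()
    finally:
        sc.stop()
```

On a cluster you would call something like `preview_hive_table("default.sample_07", 5)`, where the table name is only a placeholder for one of your own Hive tables. Inside a Zeppelin `%pyspark` paragraph a `sqlContext` is normally already provided, so the `sqlContext.sql(...)` call alone should be enough. On Spark 2.0 and later, `SparkSession.builder.enableHiveSupport()` takes the place of `HiveContext`.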

If you don't have the data, then stick with this: http://hortonworks.com/hadoop-tutorial/interacting...

Re: Please suggest what is best way to proceed with Hadoop/Spark POC

bhupendra — Tue, 22 Dec 2015 02:53:10 GMT

I want to proceed with a distributed cluster, not a standalone setup or the sandbox.

A full-fledged, production-grade server.

Re: Please suggest what is best way to proceed with Hadoop/Spark POC

nsabharwal — Tue, 22 Dec 2015 02:56:34 GMT

@Bhupendra Mishra You are on the right track. http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/ch_Getting_Ready.html

My email is nsabharwal@hortonworks.com

Please feel free to email me and we can discuss

Re: Please suggest what is best way to proceed with Hadoop/Spark POC

awatson — Tue, 22 Dec 2015 05:51:09 GMT

@Bhupendra Mishra

Depending on your hardware availability for the POC, I would also look at just doing the POC in the cloud (e.g. Microsoft Azure, AWS, GCP). You can leverage Cloudbreak to quickly deploy a fully fledged distributed cluster running Spark, YARN, the whole nine yards, in the cloud in a matter of minutes.

Here is the documentation on how to do so:

Cloudbreak Overview - http://hortonworks.com/hadoop/cloudbreak/

Cloudbreak Docs - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...