Please suggest the best way to proceed with a Hadoop/Spark POC

1 ACCEPTED SOLUTION

@Bhupendra Mishra

Depending on your hardware availability for the POC, I would also look at just doing the POC in the cloud (e.g. Microsoft Azure, AWS, GCP). You can use Cloudbreak to quickly deploy a fully fledged distributed cluster running Spark, YARN, the whole nine yards, in the cloud in a matter of minutes.

Here is the documentation on how to do so:

Cloudbreak Overview - http://hortonworks.com/hadoop/cloudbreak/

Cloudbreak Docs - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...


4 REPLIES

Master Mentor

@Bhupendra Mishra

1) Get your cluster up on HDP 2.3.2 (latest version). The Sandbox is a good start.

2) Get Zeppelin: http://hortonworks.com/hadoop/zeppelin/#section_1

Step 2 will help you configure Spark and access data from Hive tables.

If you don't have data yet, stick with this tutorial: http://hortonworks.com/hadoop-tutorial/interacting...
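To give a rough idea of what "access data from Hive tables" in step 2 can look like, here is a minimal PySpark sketch. It assumes a Spark 1.x install (as shipped with HDP 2.3), where `HiveContext` is the entry point for Hive; on Spark 2.x and later you would use `SparkSession.builder.enableHiveSupport()` instead. The helper names `build_preview_query` and `preview_hive_table` and the table name `sample_07` are illustrative assumptions, not something from the tutorial.

```python
def build_preview_query(table, limit=10):
    """Build a SELECT statement for previewing a Hive table.

    Hypothetical helper: only plain identifiers such as `sample_07` or
    `default.sample_07` are accepted, to avoid passing arbitrary SQL through.
    """
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected table name: %r" % (table,))
    return "SELECT * FROM {0} LIMIT {1}".format(table, int(limit))


def preview_hive_table(table, limit=10):
    """Return the first rows of a Hive table as a list of Row objects.

    Assumes Spark 1.x with Hive support and a reachable Hive metastore.
    """
    # Imported lazily so the query helper above works without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-poc-preview")
    try:
        sql_context = HiveContext(sc)
        return sql_context.sql(build_preview_query(table, limit)).collect()
    finally:
        # Always release cluster resources, even if the query fails.
        sc.stop()


if __name__ == "__main__":
    # `sample_07` is an assumed table name; replace it with one of your own.
    for row in preview_hive_table("sample_07", limit=5):
        print(row)
```

Inside a Zeppelin notebook the interpreter normally provides `sc` and `sqlContext` for you, so there you would likely only need the `sqlContext.sql(...)` call rather than creating your own context.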

Contributor

I want to proceed with a distributed cluster, not a standalone install or the Sandbox.

A full-fledged, production-grade server.

Master Mentor

@Bhupendra Mishra You are on the right track. http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/ch_Getting_Rea...

My email is nsabharwal@hortonworks.com

Please feel free to email me and we can discuss.
