Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Please suggest the best way to proceed with a Hadoop/Spark POC

Contributor
 
1 ACCEPTED SOLUTION

@Bhupendra Mishra

Depending on the hardware you have available for the POC, I would also consider running the POC in the cloud (e.g. Microsoft Azure, AWS, GCP). You can use Cloudbreak to deploy a fully fledged distributed cluster running Spark, YARN, the whole nine yards, in the cloud in a matter of minutes.

Here is the documentation on how to do so:

Cloudbreak Overview - http://hortonworks.com/hadoop/cloudbreak/

Cloudbreak Docs - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...


4 REPLIES

Master Mentor

@Bhupendra Mishra

1) Bring up a cluster on HDP 2.3.2 (the latest version). The Sandbox is a good place to start.

2) Install Zeppelin: http://hortonworks.com/hadoop/zeppelin/#section_1

Step 2 will help you configure Spark and access data from Hive tables.

If you don't have data yet, start with this tutorial: http://hortonworks.com/hadoop-tutorial/interacting...

Contributor

I want to proceed with a distributed cluster, not standalone or the Sandbox.

A full-fledged, production-grade server.

Master Mentor

@Bhupendra Mishra You are on the right track. Follow the Ambari installation guide: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/ch_Getting_Rea...

My email is nsabharwal@hortonworks.com

Please feel free to email me and we can discuss.
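
For a distributed (non-Sandbox) install, it can help to write the cluster layout down before running the Ambari wizard. Ambari can also take that layout as a JSON "blueprint". Below is a minimal sketch in Python that only builds such a JSON document. The blueprint name `spark-poc`, the two host group names, the worker count, and the exact list of components per group are assumptions made for illustration; they are not taken from this thread and have not been validated against a particular Ambari release, so expect to adjust them for your own topology.

```python
import json

# Hypothetical layout for a small POC: one master host group and N workers.
# The component lists are illustrative and may need more services or clients
# before Ambari accepts the topology.
MASTER_COMPONENTS = [
    "NAMENODE",
    "SECONDARY_NAMENODE",
    "RESOURCEMANAGER",
    "HISTORYSERVER",
    "APP_TIMELINE_SERVER",
    "ZOOKEEPER_SERVER",
    "SPARK_JOBHISTORYSERVER",
]
WORKER_COMPONENTS = [
    "DATANODE",
    "NODEMANAGER",
    "SPARK_CLIENT",
    "HDFS_CLIENT",
    "YARN_CLIENT",
]


def host_group(name, components, cardinality):
    """Build one host group entry in the blueprint's JSON shape."""
    return {
        "name": name,
        "cardinality": str(cardinality),
        "components": [{"name": component} for component in components],
    }


def build_blueprint(blueprint_name, stack_version, worker_count):
    """Return a dict describing a master/worker HDP layout."""
    if worker_count < 1:
        raise ValueError("a distributed cluster needs at least one worker")
    return {
        "Blueprints": {
            "blueprint_name": blueprint_name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            host_group("master", MASTER_COMPONENTS, 1),
            host_group("worker", WORKER_COMPONENTS, worker_count),
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed or saved to a file.
    print(json.dumps(build_blueprint("spark-poc", "2.3", 3), indent=2))
```

Keeping the layout in a small script like this makes it easy to change the number of workers between POC runs and to review the topology with your team before installing anything.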
