Created 12-21-2015 05:30 PM
Created 12-21-2015 09:51 PM
Depending on your hardware availability for the POC, I would also look at just doing the POC in the cloud (e.g. MSFT Azure, AWS, GCP). You can leverage Cloudbreak to deploy a fully fledged distributed cluster running Spark, YARN, the whole nine yards, in the cloud in a matter of minutes.
Here is the documentation on how to do so:
Cloudbreak Overview - http://hortonworks.com/hadoop/cloudbreak/
Cloudbreak Docs - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...
Created 12-21-2015 06:17 PM
1) Get your cluster up on HDP 2.3.2 (the latest version). The Sandbox is a good start.
2) Get Zeppelin: http://hortonworks.com/hadoop/zeppelin/#section_1
Step 2 will help you configure Spark and access data from Hive tables.
If you don't have data yet, stick with this tutorial: http://hortonworks.com/hadoop-tutorial/interacting...
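Once Spark and Hive are both up, reading a Hive table from Spark could look roughly like the sketch below. It is only an illustration under assumptions: it targets the Spark 1.x `HiveContext` API, the table name `sample_07` is assumed to exist (it is one of the Sandbox sample tables), and the helper names `top_rows_query`, `preview`, and `run_on_cluster` are made up for this example.

```python
def top_rows_query(table, limit=10):
    """Build a simple HiveQL preview query for a trusted table name."""
    # Allow only letters, digits, underscores and dots (db.table form).
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    return "SELECT * FROM {} LIMIT {}".format(table, int(limit))


def preview(sql_context, table, limit=10):
    """Run the query through any object exposing .sql(), e.g. a HiveContext."""
    return sql_context.sql(top_rows_query(table, limit))


def run_on_cluster(table="sample_07"):
    """Needs a Spark 1.x installation with Hive support; not called on import."""
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-preview")
    try:
        # HiveContext reads table metadata from the Hive metastore.
        preview(HiveContext(sc), table).show()
    finally:
        sc.stop()
```

Submitting the file with `spark-submit` and calling `run_on_cluster()` should print the first rows of the table. In a Zeppelin note a context is already provided, so calling `preview(sqlContext, "sample_07").show()` in a `%pyspark` paragraph should be enough, assuming Zeppelin is configured to use a Hive-enabled context.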
Created 12-21-2015 06:53 PM
I want to proceed with a distributed cluster, not standalone or the Sandbox.
A full-fledged, production-grade server.
Created 12-21-2015 06:56 PM
@Bhupendra Mishra You are on the right track. The Ambari guide for installing HDP is the place to start: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/ch_Getting_Rea...
My email is nsabharwal@hortonworks.com
Please feel free to email me and we can discuss.