I would like to familiarize myself with the HDP platform. Could you point me to the best documentation, examples, or demos to start with?
Mainly, I would like to run Apache Hive, Cloudera Impala, Spark SQL, and Spark/Shark on HDP, load TPC-H, TPC-DS, or any other workloads, and query these datasets with the SQL-on-Hadoop engines mentioned above.
I tried one of the samples, uploading CSV files and querying them with Hive, and I would like more examples along those lines for the engines above.
Many thanks, I appreciate the help.
You can run all of these on HDP except Impala, which, as you know, is a Cloudera product. The Hortonworks Data Platform (HDP) provides a Sandbox (VM) with most of the components installed; this is a quick and easy way to start a deep dive. To avoid frustration, in my experience you should have at least 12 GB of RAM, though 8 GB is the recommended minimum.
You can choose between VMware, Docker, or VirtualBox, whichever you find convenient.
The link below will take you through all the steps: learning_ropes_of_HDP_sandbox
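Once the Sandbox is running, getting a TPC-H table into Hive could look roughly like the sketch below. It is only an illustration under a few assumptions: the data was generated with the TPC-H dbgen tool (which writes pipe-delimited .tbl files), the files were copied to the hypothetical HDFS directory /user/hive/tpch/nation, and the helper name external_table_ddl is made up for this example. The script only prints HiveQL; you would paste the output into the Hive shell or a Hive view yourself.

```python
# Columns of the TPC-H NATION table, mapped to simple Hive types.
NATION_COLUMNS = [
    ("n_nationkey", "INT"),
    ("n_name", "STRING"),
    ("n_regionkey", "INT"),
    ("n_comment", "STRING"),
]


def external_table_ddl(table, columns, hdfs_dir, delimiter="|"):
    """Build a HiveQL statement for an external table over delimited text files.

    This is a hypothetical helper for illustration, not part of HDP or Hive.
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}';"
    )


if __name__ == "__main__":
    # /user/hive/tpch/nation is an assumed HDFS path; adjust it to your setup.
    print(external_table_ddl("nation", NATION_COLUMNS, "/user/hive/tpch/nation"))
    # A first query to check that the rows are readable.
    print("SELECT n_regionkey, COUNT(*) FROM nation GROUP BY n_regionkey;")
```

The same pattern should carry over to the other TPC-H and TPC-DS tables by swapping in their column lists, and since Spark SQL can read tables from the Hive metastore, a table defined this way is a reasonable starting point for comparing the engines.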
Hope that helps.