
Using different SQL-on-Hadoop engines to query TPC-H and TPC-DS


I would like to become more familiar with the HDP platform. Could you please point me to the best documentation, examples, or demos to start with?

Mainly, I would like to run Apache Hive, Cloudera Impala, Spark SQL, and Spark/Shark on HDP, load TPC-H, TPC-DS, or any other workloads, and start querying these datasets with these SQL-on-Hadoop engines.

I tried one of the samples that uploads CSV files and queries them with Hive, and I would like more examples covering the engines above.

Many thanks, I appreciate the help.
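For comparing engines on the same workload, one rough approach is to save each TPC-H query as a plain `.sql` file and run that file through each engine's command-line client. The sketch below assumes `beeline` (for Hive via HiveServer2) and `spark-sql` are on the PATH, that HiveServer2 listens on its default port 10000 on `localhost`, and that a query is saved as `q1.sql`; the helper names `engine_command` and `time_query` are hypothetical. Impala is left out because it is not part of HDP.

```python
import shlex
import subprocess
import time

# Assumption: HiveServer2 on the sandbox listens on its default binary port 10000.
HIVE_JDBC_URL = "jdbc:hive2://localhost:10000/default"


def engine_command(engine: str, sql_file: str) -> list[str]:
    """Build the CLI invocation that runs one .sql file on the given engine."""
    if engine == "hive":
        # beeline -u <jdbc url> -f <file> runs the script against HiveServer2
        return ["beeline", "-u", HIVE_JDBC_URL, "-f", sql_file]
    if engine == "spark-sql":
        # spark-sql accepts -f <file>, like the Hive CLI
        return ["spark-sql", "-f", sql_file]
    raise ValueError(f"unsupported engine: {engine}")


def time_query(engine: str, sql_file: str) -> float:
    """Run the query file and return wall-clock seconds; raises on a non-zero exit."""
    cmd = engine_command(engine, sql_file)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only prints the commands; call time_query(...) on the sandbox to actually run them.
    for engine in ("hive", "spark-sql"):
        print(engine, shlex.join(engine_command(engine, "q1.sql")))
```

Wall-clock time of the whole client process includes JVM and session startup, so it is a coarse comparison; for small scale factors that overhead can dominate the query itself.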





@Mohammed Syam

You can run all of these on HDP except Impala, which, as you know, is a Cloudera product. Hortonworks Data Platform (HDP) provides a Sandbox (VM) with most of the components installed, which is a quick and easy way to start a deep dive. In my experience, you should have at least 12 GB of RAM to avoid frustration, although 8 GB is the recommended minimum.

You can choose VMware, Docker, or VirtualBox, whichever you find most convenient.

The following tutorial takes you through all the steps: learning_ropes_of_HDP_sandbox

Hope that helps.


Thanks @Geoffrey Shelton Okot

I already did this. I want to know whether there is any documentation on how to upload the TPC-H or TPC-DS datasets and start querying them.
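As a starting point, here is a hedged sketch assuming the TPC-H data was produced with the benchmark's `dbgen` tool, which writes one pipe-delimited `.tbl` file per table (each row ending in a trailing `|`). It prints the `hdfs dfs` commands to upload each file and a Hive `CREATE EXTERNAL TABLE` statement over the uploaded directory. Only three of the eight TPC-H tables are included; the HDFS directory `/tmp/tpch`, the helper names `hive_ddl` and `upload_commands`, and the use of `STRING` for the benchmark's `CHAR`/`VARCHAR` columns are illustrative choices, not an official loader.

```python
# Subset of the TPC-H schema, mapped to Hive column types (illustrative choice).
TPCH_TABLES: dict[str, list[tuple[str, str]]] = {
    "region": [("r_regionkey", "INT"), ("r_name", "STRING"), ("r_comment", "STRING")],
    "nation": [
        ("n_nationkey", "INT"), ("n_name", "STRING"),
        ("n_regionkey", "INT"), ("n_comment", "STRING"),
    ],
    "lineitem": [
        ("l_orderkey", "BIGINT"), ("l_partkey", "INT"), ("l_suppkey", "INT"),
        ("l_linenumber", "INT"), ("l_quantity", "DECIMAL(15,2)"),
        ("l_extendedprice", "DECIMAL(15,2)"), ("l_discount", "DECIMAL(15,2)"),
        ("l_tax", "DECIMAL(15,2)"), ("l_returnflag", "STRING"),
        ("l_linestatus", "STRING"), ("l_shipdate", "DATE"),
        ("l_commitdate", "DATE"), ("l_receiptdate", "DATE"),
        ("l_shipinstruct", "STRING"), ("l_shipmode", "STRING"),
        ("l_comment", "STRING"),
    ],
}

# Hypothetical HDFS directory for the raw .tbl files.
BASE_DIR = "/tmp/tpch"


def hive_ddl(table: str) -> str:
    """Hive DDL for an external, '|'-delimited text table over BASE_DIR/<table>."""
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in TPCH_TABLES[table])
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{BASE_DIR}/{table}';"
    )


def upload_commands(table: str) -> list[list[str]]:
    """hdfs dfs commands that copy the local <table>.tbl into its own directory."""
    target = f"{BASE_DIR}/{table}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target],
        ["hdfs", "dfs", "-put", f"{table}.tbl", target],
    ]


if __name__ == "__main__":
    for t in TPCH_TABLES:
        for cmd in upload_commands(t):
            print(" ".join(cmd))
        print(hive_ddl(t))
```

The empty field produced by the trailing `|` should be ignored by Hive, since the table declares fewer columns than each row contains. Where Spark SQL shares the Hive metastore, the same tables should also be queryable from `spark-sql`. The remaining TPC-H tables follow the same pattern, and TPC-DS ships its own generator (`dsdgen`) whose pipe-delimited output can be loaded the same way.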


Why don't you refer to the following link?
