Created 05-31-2016 03:26 PM
When running a Pig script or Hive query in the HDPCD exam, can we use Tez rather than MapReduce as the execution engine? The exam is hosted on a single-node cluster on a rather weak machine, and there is a 2-hour time limit to complete 7-10 tasks. Using Tez could save time when executing scripts and queries, especially since the exam results are based exclusively on the output of our scripts.
Created 05-31-2016 04:39 PM
Yes, you can run the scripts however you like. Keep in mind that a task may explicitly require you to use Tez. If the task instructions say nothing specific, you can run your Pig and Hive scripts with or without Tez. I would still recommend using Tez for every task where it applies. As you said, why waste precious exam time?
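For anyone unsure how to switch engines, here is a minimal sketch using the standard command-line options (Pig 0.14 or later for Tez mode). The script names `my_script.pig` and `my_query.hql` are hypothetical placeholders, and this is not a statement of how the exam environment is configured.

```
# Pig: select the Tez execution engine with -x (use -x mapreduce to fall back)
pig -x tez my_script.pig

# Hive: set the engine for a single run from the command line
# (hive.execution.engine=mr switches back to MapReduce)
hive --hiveconf hive.execution.engine=tez -f my_query.hql

# Alternatively, put this line at the top of the .hql file itself:
#   SET hive.execution.engine=tez;
```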
I take exception to the "weak machine" comment. The instances we use, c3.4xlarge EC2 instances, are extremely large for the small amount of processing that happens on them. The exam datasets are purposely small so that time is not wasted processing a lot of data. The longest queries you run will take less than 90 seconds, and that is without using Tez.