10-17-2015 02:24 AM
I registered a couple of days ago for the CCP: Data Engineer exam and I want to know how to simulate the exam environment (a 7-node high-performance cluster)
as described in the exam description, and what the best way is to practice the real-time examples, because I know Cloudera doesn't offer a practice test anymore.
Can you please guide me on how to set up a multi-node cluster, or where I can find the instructions and the system requirements I need for that? Your tips and replies
will be very much appreciated. The VMware image Cloudera offers is a single-node environment and I am looking for a multi-node one (7 nodes, basically)
to work in.
Thanks in advance.
10-17-2015 05:46 AM
I don't think you are going to find detailed instructions on how to set up a 7-node cluster unless you already know how to do this from prior experience.
So your best bet at this time I think is to make a 3-4 node cluster using VMs or Amazon EC2 instances with the Cloudera software and other tools running on them.
Then you can come up with projects yourself to practice.
The other alternative is to find the Cloudera training events for Data Engineers and sign up to get started.
I am currently practicing with an Ubuntu 64-bit 4-node cluster set up with VirtualBox on two computers in my home, using sample projects I made up myself based on expectations for the test.
I have tried to install almost all the tools called out on the page below
This took me about 3 weeks to figure out but I am only able to do this because I have prior system administration experience.
Without this, you are pretty much out of luck.
Once you set up the tools, try to practice solving problems that are called out here
The Data Ingestion portion talks about Flume, HDFS console commands and Sqoop.
The Transform, Stage, Store section covers Pig, Hive, MapReduce and Spark skills.
For the Data Analysis part, you need to know how to create tables in Hive that use SerDes and other custom settings.
The Workflow portion covers the skills you need from Apache Oozie.
I discuss this in more detail later.
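As an example of the kind of made-up practice project I mean for the Transform, Stage, Store skills, here is a minimal sketch of the map, shuffle and reduce steps of a word count in plain Python. It is not Hadoop or Spark code and it is not taken from the exam. The function names and the sample lines are my own assumptions, and it only illustrates the model so you can reason about it before writing the real job on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, made up for illustration only.
    sample = ["flume sqoop hive", "hive pig spark", "spark hive"]
    print(word_count(sample))
```

Once the logic is clear on paper like this, the same problem can be redone in Pig, Hive or Spark for practice.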
02-09-2016 09:25 AM
Wow. You have been an admin and it still took you 3 weeks to set it up.
For newbies, it will take forever.
Cloudera should provide a free multi-node simulated environment for developers with no admin experience who want to take this test.
02-09-2016 09:42 AM
We have a free environment. It is called the Cloudera QuickStart VM.
Review each of the Required Skills for the exam: http://www.cloudera.com/training/certification/ccp-data-engineer.html This is a developer exam. Which of those skills change by having more than one node in your cluster? Zero.
So there is no reason to set up a seven-node cluster unless you want to do it for fun. It does not help you practice in any way.
02-10-2016 07:22 AM
Out of curiosity,
@juddimal: How many certified CCP Data Engineers (DE575) are there to date?
Give me a few names so that I can connect with them on LinkedIn for guidance/experience.
FYI, I have been searching the internet for people who have done it. No luck so far.