I registered a couple of days ago for the CCP: Data Engineer exam, and I want to know how to simulate the exam environment (a 7-node high-performance cluster) as described in the exam description.
I would also like to know the best way to practice with real-world examples, since Cloudera no longer offers a practice test.
Can you please guide me on how to set up a multi-node cluster, or tell me where to find the instructions and system requirements for that? Your tips and replies will be very much appreciated.
The VMware image Cloudera offers is a single-node environment, and I am looking for a multi-node (basically 7-node) environment to work in.
Thanks in advance.
I don't think you are going to find detailed instructions on how to set up a 7-node cluster unless you already know how to do this from prior experience.
So I think your best bet at this time is to build a 3-4 node cluster using VMs or Amazon EC2 instances, with the Cloudera software and other tools running on them.
Then you can come up with projects yourself to practice.
The alternative is to find one of the Cloudera training events for data engineers and sign up to get started.
I am currently practicing on a 4-node Ubuntu 64-bit cluster set up with VirtualBox across two computers at home, using sample projects I made up myself based on what I expect from the test.
I have tried to install almost all the tools called out on the page below
This took me about 3 weeks to figure out, and I was only able to do it because I have prior system administration experience.
Without this, you are pretty much out of luck.
Once you set up the tools, try to practice solving problems that are called out here
The Data Ingestion portion talks about Flume, HDFS console commands and Sqoop.
The Transform, Stage, Store section covers Pig, Hive, MapReduce and Spark skills.
For the Data Analysis part, you need to know how to create tables in Hive that use SerDes and other custom settings.
The Workflow portion covers skills you need from Apache Oozie.
I discuss this in more detail later.
Wow. You have admin experience and it still took you 3 weeks to set it up.
For newbies, it will take forever.
Cloudera should provide a free, simulated multi-node environment for developers who want to take this test but have no admin experience.
We have a free environment; it is called the Cloudera QuickStart VM.
Review each of the Required Skills for the exam: http://www.cloudera.com/training/certification/ccp-data-engineer.html This is a developer exam. Which of those skills change when you have more than one node in your cluster? None.
So there is no reason to set up a seven node cluster unless you want to do it for fun. It does not help you practice in any way.
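To make that point concrete, here is a small sketch in plain Python. It is not the Hadoop or Spark API, and the function names and sample lines are made up purely for illustration. It runs the same map-and-merge word count over 1 partition and over 7, assuming each partition stands in for a node, and the result is identical either way. Only the degree of parallelism changes, not the code you write.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(lines, num_partitions):
    """Deal lines round-robin into num_partitions lists (stand-ins for nodes)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map step: count the words in one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_partitions):
    """Run the map step per partition, then merge the partial counts (reduce step)."""
    partitions = split_into_partitions(lines, num_partitions)
    partials = [map_partition(p) for p in partitions]
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample input; any list of text lines would do.
lines = [
    "flume sqoop hive",
    "hive pig spark",
    "spark hive oozie",
]

single_node = word_count(lines, num_partitions=1)
seven_nodes = word_count(lines, num_partitions=7)
print(single_node == seven_nodes)
```

The same holds for the real tools: a Pig script, Hive query or Spark job you write on the single-node VM is the same one you would submit to a larger cluster.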
I have already installed that and am currently setting it up with CDH 5.5.
It's good to know I can fully prepare with a single-node cluster.
Out of curiosity,
@juddimal: How many certified CCP Data Engineers (DE575) are there to date?
Could you give me a few names so that I can connect with them on LinkedIn for guidance and to hear about their experience?
FYI, I have been searching the internet for people who have done it, with no luck so far.
Could you please explain which set of technologies I must master to pass the exam?
For example, I know Pig and Hive very well. Is that sufficient to pass the exam, or do you think I should learn Spark as well?
I understand that the choice of tools for a given problem is left to me. However, will there be any problems that can only be solved with Spark? Please advise, so that I can learn Spark as well if needed.