I am planning to appear for the Hadoop Developer Certification (CCD 410) next month.
I was going through the new syllabus and found that a few topics are no longer listed, so I am not sure whether they are still relevant for the exam.
I have a few questions, if anyone can help me answer them:
1. I did not find any mention of Pig in the new syllabus, although it was in the old one. I understand that knowing Pig is always going to be helpful, but I wanted to check whether it is covered in this certification exam. Will any questions be asked about Pig?
2. I do not see HBase, Flume, and Avro mentioned in the new topics. Does that mean no questions can be expected on them?
3. Any suggestions on study material for the 4th topic, i.e. the one related to querying?
Thanks a lot in advance.
This page, http://cloudera.com/content/cloudera/en/training/certification/ccdh/prep.html, lists Pig at the top along with other ecosystem projects. They are at the top rather than in the specific objectives because you should know what they are and what they do, but there are no code questions that require you to, for example, transform data in Pig. General knowledge is sufficient. The same goes for your question 2: you should know how Hadoop reads and writes files, and thus understand file formats (Avro, Parquet, etc.), as well as Flume and HBase. The exam is pretty Hive and Java MR heavy. It is likely to see an update soon: this form/syllabus is 16 months old, and while the questions get updated, the objectives themselves are still in the process of being updated.
On #3: I've said elsewhere on this list that it might be helpful to think of querying along the lines of the following:
* Extract summary data from a structured data set to answer a specific query
* Transform data from one data schema into another to facilitate answering a query
* Perform data cleaning operations to prepare data for querying
* Locate the source data for a Hive table
* Query data from a Hive table using HiveQL
* Join two structured data sets using Hive
Because the exam is limited to the multiple-choice format, you're given the code and asked to analyze it, debug it, etc., so the above is more a suggestion of the kinds of things you should have mastered, or at least have good experience with.
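
For practice without a cluster, here is a rough sketch of three of the skills listed above (cleaning, summarizing, and joining) using Python's built-in sqlite3 module. This is SQLite SQL, not HiveQL, and functions and type names differ between the two, but the GROUP BY and JOIN shapes are much like what you would read in a Hive query. The tables, columns, and data (customers, orders, orders_clean) are made up purely for illustration and are not taken from any exam material.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    # Amounts arrive as messy text: stray whitespace and a non-numeric value.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, " 10.5 "), (1, "4.5"), (2, "n/a"), (2, "20")],
    )
    return conn


def clean_orders(conn):
    """Data cleaning: trim whitespace, keep rows starting with a digit, cast to REAL."""
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT customer_id, CAST(TRIM(amount) AS REAL) AS amount
        FROM orders
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


def totals_per_customer(conn):
    """Join the two tables and extract summary data: total order amount per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers c
        JOIN orders_clean o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    clean_orders(conn)
    for name, total in totals_per_customer(conn):
        print(name, total)
```

Since the exam hands you code to analyze rather than write, a useful exercise is to predict the result of each query before running it, then change the join or the filter and predict again.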