I'm Prashant. I have 4 years of experience in shell scripting, ETL tools, and the DB2 database, and I have been working in big data for the past 6 months with Hive, Impala, and Hue.
I want to get certified in the CCA Spark and Hadoop Developer exam (CCA175).
I have a Hadoop cluster installed in my office with Cloudera (Hive, Impala, Pig, Sqoop, Spark, Oozie, etc.).
I don't have any knowledge of or experience with Java/MapReduce, so I'm really confused about how to start preparing for this certification.
Can someone please give me some direction?
One of the best ways to prepare for the CCA175 certification is the 4-day Cloudera Developer Training offered by Cloudera and its training partners.
This is a very comprehensive training program and is practically a prerequisite for taking the CCA175 certification.
In addition to this training, you need either at least one month of hands-on practice (depending on your CDH expertise) or at least one end-to-end big data implementation behind you.
Your preparation should focus mainly on HDFS, Spark, Sqoop, Impala, and Hive.
Lastly, this 120-minute Cloudera certification is focused more on hands-on experience than on theory.
Therefore, the more you practice, the better your chances of clearing the certification.
Hope this is helpful for you.
Good luck with your preparation and the certification test.
I have one more query. Since I come from a data warehousing background with very little knowledge of Java, I am finding it difficult to write MapReduce jobs in Java.
Do they test Java programming for MR jobs in this exam?
For Spark, can you please suggest any online MOOC, training, or book to start with?
The Cloudera Developer Training is out of budget for me.
It's always recommended to have a good knowledge of Java programming, because it makes low-level MR programming easier for you.
You can write MR jobs in other programming languages such as Python (via Hadoop Streaming); however, this adds an extra interpreter layer, which does not help performance.
On the CCA175 certification front, your Java programming expertise is not evaluated. You are given a choice of tool sets with which you can answer the certification questions.
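To make this concrete, here is a minimal sketch of how an MR job can be written in pure Python for Hadoop Streaming (this is an illustrative word-count example, not something the exam requires; the script name `wc.py` and its `map`/`reduce` argument convention are my own assumptions):

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming word-count sketch.

The mapper emits tab-separated (word, 1) pairs; the reducer sums the
counts per word, relying on the fact that Hadoop's shuffle phase hands
the reducer its input sorted by key.
"""
import sys

def mapper(lines):
    """Emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Sum counts per word; input must be sorted by key."""
    current, total = None, 0
    for pair in sorted_pairs:
        word, count = pair.strip().rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

if __name__ == "__main__":
    # One script plays both roles; Hadoop Streaming would invoke it as,
    # e.g., -mapper "wc.py map" -reducer "wc.py reduce".
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    stream = mapper(sys.stdin) if stage == "map" else reducer(sys.stdin)
    for out in stream:
        print(out)
```

Because the mapper and reducer just read stdin and write stdout, any language works with streaming; the trade-off mentioned above is that each record passes through the Python interpreter instead of running as compiled JVM code.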
Hope this helps.
I have been working as a Hadoop administrator for the past 2.5 years; originally I was a Teradata administrator. One of our important clients requested a big data solution for processing more than 100 million CSV records, so I entered the big data world (a REALLY BIG world), and I am still learning like a student. :)
Once you enter the big data world you can't skip Hadoop, so I have spent the past 2.5 years working with Hadoop tools (Hue, Hive, HBase, Pig, MapReduce, Sqoop, Kafka, Storm, etc.).
Initially I really struggled to survive in the Hadoop environment (new tools, including Java), but now I am really happy to be here in the community; I am learning, and looking forward to a big future too.
I am very passionate about playing cricket.
To introduce myself:
I have over 18 years of experience in information architecture, data/data warehouse architecture (OLTP and OLAP), data modeling and governance, business intelligence design, database design, database development and performance tuning, operations and infrastructure management, technical project management,
and executing POCs for performance benchmarking in multi-tier, multi-terabyte data warehouses.
I also have experience establishing the direction and data governance of major change programs in their use of data.
I am really excited to be part of the Cloudera community, and it is a privilege to interact with so many talented professionals here. I am quite new to this group and can't wait to take a deep dive into the big data and Hadoop world; I'm sure there is a lot of fun ahead.
I love travelling, walking, and playing cricket, and I have a passion for community service; I have been associated with a non-profit organization for more than 7 years.
You can find me on LinkedIn: http://www.linkedin.com/pub/deepak-lal/9/696/720
My name is Brant. I'm a researcher at Johns Hopkins. My background is in biomedical research with an emphasis on machine learning and natural language processing. I'm coming over from mostly large shared memory and MPI machines/clusters. I do have lots of Java programming experience but most of our applications aren't Java based. I've worked on Hadoop/Accumulo based applications before but am new to the dev ops/configuration/deployment aspects.
I'm currently trying to get my Dockerized applications deployed via Hadoop Streaming on one of our clusters, and eventually I'm hoping to get longer-running Dockerized GPU applications running in Slider. We're using Docker because we can encapsulate our C++/Python/etc. dependencies with our programs.
I am Mohan, based in Singapore and working for Singtel Pte Ltd (a telecom company). I recently joined the company as a big data architect. We use the Cloudera 5.4 Enterprise Edition.
I have 10+ years of experience overall, including more than 2 years with the Cloudera distribution working on sizing, storage formats, and data analytics.