Which area of "Big Data" is in most demand

New Contributor

I am an experienced Microsoft SQL Server (don't laugh) DBA turned developer/engineer in the banking industry. I'm currently involved with automation and orchestration in large environments and spend a lot of time writing Perl/PowerShell scripts and SQL. As I grow older, I am getting more interested in the architecture side of things than in the gritty detail.

 

What career paths are open to someone with my experience, and what training would you suggest? As I am freelance, I would like to stick to free training if possible for now. I've downloaded the QuickStart VM but am not sure where to look next.

 

Thanks for any insights.

1 ACCEPTED SOLUTION

Master Collaborator

If by demand you mean in the jobs market, I can only give highlights from a much larger set of regular, ongoing research into the jobs market and how it might map to our (Cloudera's) certifications, which are much more narrowly focused than our training offerings.

 

If you look at this year's labor market (job listings across all job boards, etc.):

- Apache Hadoop far outranks all other terms (it is a very general term) and is the 67th most referenced term in the market, a ranking that includes words/phrases like "Outlook" and "Word" and other products that nearly EVERY job listing includes.

 

Of those Apache Hadoop jobs:

- 2000 currently mention Cloudera specifically (as in "must have experience with Cloudera" or "must be Cloudera certified")

- Java far outranks all other requirements (mentioned in 60% of the job listings)

- SQL is mentioned next (35%), though many of these listings also include Java or NoSQL or "at least one other language," usually Python

- Python is next (34%)

- Then it goes to Linux, Unix, NoSQL experience, Hive, and Pig (though presumably the SQL experience would bolster the Hive/Pig side)

 

Of those that use Hadoop in the requirements, the job titles rank as follows:

- Software Engineer

- Java Software Engineer

- Data Architect

- Data Scientist

- Solutions Architect

- Data Analyst

 

Of the 41,530 job listings in the US that include the term "Big Data" (expanding beyond Hadoop specifically), the job titles themselves rank like this:

- Software Engineer ranks first (3x more than anything else)

- Data Architect

- Java Software Engineer

- Software Development Engineer

- Solutions Architect

- Data Analyst

- Java Engineer

- Data Scientist 

 

So you can see which of those job titles are most prevalent.

 

Anyway, there's some raw data. I could spend 1000 hours interpreting it and cross-referencing what it means, but that gives a sense of things.

 


3 REPLIES 3

New Contributor

It's very difficult to say, but considering your position I'd go for Hive. It is the closest tool to what you already know, and it seems that many people go this way because of their background in SQL. I'd discourage you from trying Hadoop in its Java form if you're not a serious developer. Architecture, however, is very accessible and interesting, and would fit your interests. I recommend a book: Hadoop Operations (http://www.amazon.com/Hadoop-Operations-Eric-Sammer/dp/1449327052/ref=sr_1_1?ie=UTF8&qid=1406273054&...

I probably shouldn't say this here, but Hortonworks provides certification in Hive and Pig (no offense, Cloudera!): http://hortonworks.com/training/hadoop-2-0-developer-certification/. If you want to reach the highest level, go for http://cloudera.com/content/cloudera/en/training/certification/ccp-ds.html

Even if you don't take the exam, treat its topics as mandatory for a good learning path.

Have a nice day.

Master Collaborator

Sorry, this cut me off:

 

Given the "Data Architect" demand, and given your interest in architecting solutions rather than writing Java, Cloudera has a number of other resources to get you started:

 

Gwen Shapira, a Cloudera Solutions Architect, has written a primer for Oracle DBAs which might be of interest:

http://blog.cloudera.com/blog/2014/01/the-hadoop-faq-for-oracle-dbas/

 

Gwen, along with three other Clouderans, is close to releasing a book titled Hadoop Application Architectures; the early release is available:

http://blog.cloudera.com/blog/2014/07/the-new-hadoop-application-architectures-book-is-here/

 

You might want to check those out as well.