Created on 06-29-2017 02:21 PM - edited 09-16-2022 04:51 AM
Hi,
I am a BI developer/consultant, working in particular with OBI EE. I know Oracle PL/SQL and have some exposure to Oracle Data Integrator. I have recently started learning about Hortonworks' Hadoop Big Data platform.
I am unfamiliar with Java, Python and R, and at this (st)age I am not sure I can learn these programming languages. I understand that Hadoop consists of several technologies. Of course, I could consider a Hadoop architecture career path.
What I would like to know now is: is there any specific component in the Hortonworks Hadoop ecosystem that can complement my traditional DW/BI skills? I would like to concentrate on those areas now.
If you have any questions, please let me know.
Thank you.
Regards,
Manoj.
Created 06-29-2017 05:22 PM
Hi @Manoj Dixit
The natural progression with your background would be to look at technologies such as HDFS and Hive, and the associated governance and security tools like Atlas and Ranger. From there, you can branch out to NoSQL solutions such as HBase and then look at streaming technologies like Hortonworks Data Flow (Apache NiFi).
Created 06-29-2017 05:33 PM
Adding to Sonu's response:
Moving to Hadoop from a BI/EDW background is certainly a very common path. Those coming from that background usually find themselves most comfortable with Hive as an entry point. Hive provides an abstraction layer on top of MapReduce/Tez and uses a SQL-like syntax (HiveQL) that is largely ANSI-compliant. It also has the advantage of providing JDBC/ODBC connectors, so most industry BI tools such as Tableau, Qlik, MicroStrategy, etc. can integrate and interact with Hive. This means that business analysts can continue to use the tools they are already familiar with while leveraging the power of Hadoop in the background.
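To give a feel for how familiar Hive is coming from Oracle SQL, here is a minimal HiveQL sketch. The table name, columns, and storage format are purely illustrative assumptions, not anything from this thread:

```sql
-- Hypothetical example table; Hive compiles queries on it
-- into MapReduce or Tez jobs behind the scenes.
CREATE TABLE sales (
  order_id   INT,
  region     STRING,
  amount     DECIMAL(10,2),
  order_date DATE
)
STORED AS ORC;

-- A standard aggregate query, identical in spirit to Oracle SQL.
SELECT region,
       SUM(amount) AS total_sales
FROM   sales
WHERE  order_date >= '2017-01-01'
GROUP  BY region
ORDER  BY total_sales DESC;
```

A PL/SQL developer could run statements like these from Beeline or from any JDBC/ODBC-connected BI tool without writing a line of Java.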
I would recommend you start by looking at Hive. Once comfortable with it, you can explore Hive data modelling and optimization, and then branch out to the other areas that Sonu recommended. I've also seen people in the field build their entire career/job around just Hive.
Take a look at the link below for an introduction to Hive. There are plenty of internet resources and books that you can leverage to advance your knowledge. Hortonworks also provides Developer Training that covers an introduction to Hive as well as other engines/tools.
https://hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
Created 06-29-2017 10:39 PM
Thank you very much @Sonu Sahi and @Eyad Garelnabi. I shall follow the views expressed above.
Regards,
Manoj.