We are at a decision point on choosing the right approach to transform our data, and I'd like your input.
Our case: we use Hive as our main data lake store, and all of our data (so far) is structured. As in a traditional data warehouse, we need to transform source tables into target tables (lookups, aggregations, etc.). Now we need to decide which approach to take. I'm currently leaning toward a coding approach (HiveQL, Spark) where we build our own metadata layer, but others have also recommended tools like Talend. So I'd like to hear some ideas here.
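To make the "coding plus our own metadata" option concrete, here is a minimal sketch of one way it could work: each target table is described by a small metadata record, and the HiveQL is generated from that record instead of being hand-written per table. All table names, column names, and the metadata shape here are hypothetical, just to illustrate the idea:

```python
# Sketch of a metadata-driven transform: a metadata record describes the
# source, lookup joins, grouping keys, and aggregates for one target table,
# and the HiveQL INSERT...SELECT is generated from it.
def build_hiveql(meta):
    """Generate an INSERT OVERWRITE statement from a metadata record."""
    select_cols = ", ".join(
        meta["group_by"]
        + [f"{expr} AS {alias}" for alias, expr in meta["aggregates"].items()]
    )
    joins = " ".join(
        f"LEFT JOIN {j['table']} ON {j['on']}" for j in meta.get("lookups", [])
    )
    return (
        f"INSERT OVERWRITE TABLE {meta['target']} "
        f"SELECT {select_cols} "
        f"FROM {meta['source']} {joins} "
        f"GROUP BY {', '.join(meta['group_by'])}"
    )

# Hypothetical example: a daily sales rollup with a product-dimension lookup.
sales_daily = {
    "target": "dw.sales_daily",
    "source": "stg.sales s",
    "lookups": [{"table": "dim.product p", "on": "s.product_id = p.product_id"}],
    "group_by": ["s.sale_date", "p.category"],
    "aggregates": {"total_amount": "SUM(s.amount)", "txn_count": "COUNT(*)"},
}

print(build_hiveql(sales_daily))
```

The generated statement would then be submitted through Hive or Spark SQL. The appeal is that adding a new target table becomes a metadata change rather than new code, which is essentially what the GUI tools give you, but in a form your own developers fully control.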
One driver behind the decision is that I want to build a team with strong technical skills. I have a traditional ETL background (Informatica, DataStage, etc.) and understand the pros and cons. So I don't want to settle for "a large team of low-skill programmers supporting a single tool," and I believe "today's big data developers are a bit more technical than their data warehousing counterparts. And so, they are even less enamored by clunky frameworks, less intimidated of writing a lot of code if necessary."
Your thoughts?