Best way to update data in MDM on hadoop

What is the easiest way to update data in MDM on Hadoop?

Is Hive 0.14 with update support, or updating a DataFrame via Spark, the recommended option?

Accepted Solutions

Re: Best way to update data in MDM on hadoop

It is no wonder this one has sat unanswered for a few days, as it really is a big "it depends" question. That said, please check out my "Mutable Data in Hive's Immutable World" talk at the 2015 Hadoop Summit conference. The video is at https://www.youtube.com/watch?v=EUz6Pu1lBHQ and you can retrieve the presentation deck at http://www.slideshare.net/lestermartin/mutable-data-in-hives-immutable-world.

Again, this is a BIG topic, and the presentation talks through some of the classically simple and novel solutions that have worked for many people for a while. It would be good to compare and contrast the thoughts presented in this talk with those presented by others, as your end solution might be completely different from the approaches I discuss. GOOD LUCK!!

Re: Best way to update data in MDM on hadoop

Hi Lester,

Thanks for sharing your presentation. Our need is a little more stringent: meeting an SLA for keeping the mainframe system and Hadoop in sync would be a challenge.

I am thinking of using a CDC product to read the DB2 logs on the mainframe and then generate messages that update HBase.

What do you think?

Thanks,

Roy

Re: Best way to update data in MDM on hadoop

Yep, that could work. Putting it in HBase could also allow you to maintain multiple versions of each record, too. Good luck and feel free to share more.
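
For anyone following along, here is a rough sketch of the idea discussed above: change events captured from the source system are applied to a keyed store that keeps a few timestamped versions per column. This is only an in-memory illustration in plain Python, not an HBase client and not any CDC product's message format; `ChangeEvent`, `VersionedStore`, and the field names are all hypothetical, and delete handling is deliberately simplified.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC message; real CDC products define their own formats."""
    op: str                     # "upsert" or "delete"
    row_key: str                # key of the master record
    timestamp: int              # commit time taken from the source log
    values: Dict[str, Any] = field(default_factory=dict)  # changed columns


class VersionedStore:
    """In-memory stand-in for a table that keeps several versions per cell."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # row_key -> column -> [(timestamp, value), ...], newest first
        self._cells: Dict[str, Dict[str, List[Tuple[int, Any]]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def apply(self, event: ChangeEvent) -> None:
        """Apply one change event; late-arriving events are slotted in by timestamp."""
        if event.op == "delete":
            # Simplification: drop the whole row instead of writing a tombstone.
            self._cells.pop(event.row_key, None)
            return
        if event.op != "upsert":
            raise ValueError(f"unknown operation: {event.op!r}")
        row = self._cells[event.row_key]
        for column, value in event.values.items():
            versions = row[column]
            versions.append((event.timestamp, value))
            versions.sort(key=lambda v: v[0], reverse=True)
            del versions[self.max_versions:]  # keep only the newest N versions

    def get(self, row_key: str, column: str, as_of: Optional[int] = None) -> Any:
        """Return the newest value, or the newest value at or before `as_of`."""
        versions = self._cells.get(row_key, {}).get(column, [])
        for timestamp, value in versions:
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def history(self, row_key: str, column: str) -> List[Tuple[int, Any]]:
        """Return the retained (timestamp, value) pairs, newest first."""
        return list(self._cells.get(row_key, {}).get(column, []))


store = VersionedStore(max_versions=2)
events = [
    ChangeEvent("upsert", "cust-1", 100, {"city": "Austin"}),
    ChangeEvent("upsert", "cust-1", 200, {"city": "Dallas"}),
    ChangeEvent("upsert", "cust-1", 150, {"city": "Houston"}),  # arrives late
]
for event in events:
    store.apply(event)
```

Because versions are ordered by the source commit timestamp rather than by arrival order, a late message does not overwrite a newer value, which matters when the goal is to keep two systems in sync. How many versions to retain, and how deletes should be represented, would depend on your SLA and audit needs.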