Best way to update data in MDM on hadoop

Contributor

What is the best way to update data in MDM on Hadoop?

Is Hive 0.14 with update support, or updating a DataFrame via Spark, the recommended option?

1 ACCEPTED SOLUTION


It is no wonder this one has sat unanswered for a few days, as it really is a big "it depends" question. That said, please check out my "Mutable Data in Hive's Immutable World" talk from the 2015 Hadoop Summit conference. The video is at https://www.youtube.com/watch?v=EUz6Pu1lBHQ and you can retrieve the presentation deck at http://www.slideshare.net/lestermartin/mutable-data-in-hives-immutable-world.

Again, this is a BIG topic, and the presentation talks through some of the classically simple & novel solutions that have worked for many people for a while. It would be good to compare & contrast the thoughts presented in this talk with those presented by others, as your end solution might be completely different from the approaches I discuss. GOOD LUCK!!


3 REPLIES

Contributor

Hi Lester,

Thanks for sharing your presentation. Our need is a little more stringent: meeting an SLA for keeping the mainframe system and Hadoop in sync would be a challenge.

I am thinking of using a CDC product to read the DB2 logs on the mainframe and then generate messages to update HBase.

What do you think?

Thanks,

Roy


Yep, that could work. Putting it in HBase could also allow you to maintain some version of the record, too. Good luck and feel free to share more.
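
To make the idea a bit more concrete, here is a rough Python sketch of applying CDC-style change messages to a store that keeps a few timestamped versions per row key. It is only an in-memory stand-in for the versioning behaviour, not HBase client code. The message shape (`op`, `key`, `ts`, `after`) and the names `VersionedStore` and `apply_cdc_message` are made up for illustration; a real CDC product will emit its own format.

```python
class VersionedStore:
    """In-memory stand-in for a table that keeps several versions per row.

    Hypothetical helper for illustration; this is not an HBase API.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> list of (timestamp, record), newest first
        self._rows = {}

    def put(self, row_key, record, timestamp):
        versions = self._rows.setdefault(row_key, [])
        versions.append((timestamp, record))
        # Order by the source timestamp, newest first, so a message that
        # arrives late still lands in the right position.
        versions.sort(key=lambda v: v[0], reverse=True)
        # Keep only the newest max_versions entries.
        del versions[self.max_versions:]

    def get(self, row_key, versions=1):
        """Return up to `versions` records for the row, newest first."""
        return [rec for _, rec in self._rows.get(row_key, [])[:versions]]


def apply_cdc_message(store, message):
    """Apply one change message (assumed shape: op, key, ts, after)."""
    op = message["op"]
    if op in ("INSERT", "UPDATE"):
        store.put(message["key"], message["after"], message["ts"])
    elif op == "DELETE":
        # Record a tombstone (None) instead of erasing the history.
        store.put(message["key"], None, message["ts"])
    else:
        raise ValueError(f"unknown operation: {op}")


store = VersionedStore(max_versions=2)
apply_cdc_message(store, {"op": "INSERT", "key": "cust-1", "ts": 100,
                          "after": {"name": "Roy"}})
apply_cdc_message(store, {"op": "UPDATE", "key": "cust-1", "ts": 300,
                          "after": {"name": "Roy C"}})
# A late message with an older timestamp does not overwrite the newest one.
apply_cdc_message(store, {"op": "UPDATE", "key": "cust-1", "ts": 200,
                          "after": {"name": "Roy B"}})
latest = store.get("cust-1")
history = store.get("cust-1", versions=5)
```

Ordering by the timestamp taken from the source log, rather than by arrival time, is the design choice that matters here: it keeps the newest source-side change on top even if messages are delivered out of order.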