Hi, I am new to Falcon. Can anyone help me with metadata management? Should I use Waterline, or does Falcon have a plugin for that?

Rising Star
1 ACCEPTED SOLUTION

@khushi kalra

As Neeraj and Artem pointed out, Apache Atlas is the right tool for managing metadata for Hadoop. Falcon is geared toward managing data pipelines and data workflows, which is a big part of overall data governance but is not metadata management.

In addition to the links and resources provided, here is an Apache Atlas presentation video by the Governance product manager, Andrew Ahn.

https://www.youtube.com/watch?v=LZR4qhKJeSI

7 REPLIES

Master Mentor

@khushi kalra please see this discussion.

Master Mentor

@khushi kalra

Look into Apache Atlas

Master Mentor

@khushi kalra In case you want a demo, see this.

You can use the sandbox to run this.

FYI: Waterline is one of our partners, but I don't think it's open source (link).

Rising Star

So I should use both Atlas and Falcon?

@khushi kalra

The short answer is that it depends on what you are looking for. The Hortonworks platform includes both Apache Atlas and Apache Falcon. Both tools fall under governance, but they have different use cases.

For metadata management with HDP, you should use Apache Atlas. Version 0.5 is the first release of the product, and it gets much slicker with the upcoming release.

Waterline integrates with Atlas. Waterline will give you metadata discovery, but it does not completely integrate with HDP. It runs a MapReduce job that lets you see patterns in the data and identify what kind of data it is. If you then need to take that file metadata and use it in conjunction with Hive for any policy work, that will be done via Atlas.

Atlas is part of the DGI framework. The idea of DGI is to provide a metadata exchange where a community of companies can work on one platform. As Neeraj mentioned, Dataguise is one of them. Collibra, Alation, and others are also part of it.

Now, the question I have for you is: what are you trying to achieve? Governance is a little bit fuzzy in people's minds.

Look at the presentation here

http://hortonworks.com/partners/learn/#dgi

I hope this helps.