Created 02-02-2016 08:30 PM
Created 02-02-2016 11:50 PM
As Neeraj and Artem pointed out, Apache Atlas is the right tool for managing metadata for Hadoop. Falcon is more for managing data pipelines and data workflows, which is a big part of overall data governance but is not metadata management.
In addition to the links and resources provided, here is an Apache Atlas presentation video by Governance product manager Andrew Ahn.
Created 02-02-2016 08:30 PM
@khushi kalra please see this discussion.
Created 02-02-2016 08:30 PM
Look into Apache Atlas
Created 02-02-2016 08:32 PM
Very very good talk https://www.youtube.com/watch?v=uLSPtm2dc0c&ab_channel=Dataguise
Created 02-02-2016 08:35 PM
@khushi kalra In case you want to demo then see this
You can pick sandbox to run this.
FYI: Waterline is one of our partners, but I don't think it's open source.
Created 02-03-2016 02:40 PM
So I should use both Atlas and Falcon?
Created 02-03-2016 06:39 PM
The short answer is that it depends on what you are looking for. The Hortonworks platform has both Apache Atlas and Apache Falcon. Although both tools fall under governance, they have different use cases.
For metadata management with HDP you should use Apache Atlas. Version 0.5 is the first release of the product, and it gets much slicker with the upcoming release.
Waterline integrates with Atlas. Waterline will give you metadata discovery, but it does not completely integrate with HDP. They run a MapReduce job, which lets you see patterns in the data and identify what kind of data it is. If you then need to take that file metadata and use it in conjunction with Hive for any policy work, that will be via Atlas.
Atlas is part of the DGI framework. The idea of DGI is to provide a metadata exchange where a community of companies can work on one platform. As Neeraj mentioned, Dataguise is one of them. Collibra, Alation and others are also there.
Now the question I have for you is: what are you trying to achieve? Governance is a little bit fuzzy in people's minds.
Look at the presentation here
http://hortonworks.com/partners/learn/#dgi
I hope this helps.