Me (standing next to the whiteboard): Yes, and that's why we use the term "Enterprise Ready Data Lake".
Imagine that there are three points:
Point 1 -> You need to prove your identity to get into the lake, and then you need permissions to access the data.
Point 2 -> Once you have been authenticated and authorized, the next demand is to manage the lifecycle of the data, from the moment it is required to its retirement, as an automated process.
Point 3 -> The lifecycle management process needs to be integrated with a governance solution that manages the data about data ("metadata"), data lineage, auditing, and more, to fulfill security and compliance requirements.
Point 1 --> Entry point: You must have strong authentication in place to get into the system, because more users will be coming in to access data as we move away from data silos to a centralized repository. Access must also be easy to manage, i.e. the security solution should have one centralized place to administer (create, define, and manage) security policies. Once users are in and have access, we need to track their actions, and that's auditing. Lastly, data must be encrypted both in motion and at rest.
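To make Point 1 a bit more concrete, here is a minimal sketch of what "one centralized place for policies, plus auditing" could look like. It is only an illustration: the names `Policy`, `PolicyStore`, `add_policy`, and `is_allowed`, as well as the sample user, group, and resource, are all hypothetical and do not come from any particular security product. Authentication and encryption are assumed to be handled elsewhere.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which group may perform which actions on a resource."""
    resource: str
    group: str
    actions: frozenset


@dataclass
class PolicyStore:
    """One central place to create and manage policies, with an audit trail."""
    policies: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def add_policy(self, resource, group, actions):
        self.policies.append(Policy(resource, group, frozenset(actions)))

    def is_allowed(self, user, groups, resource, action):
        # Access is granted only if some policy matches resource, group, and action.
        allowed = any(
            p.resource == resource and p.group in groups and action in p.actions
            for p in self.policies
        )
        # Every decision, allowed or denied, is recorded for auditing.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "resource": resource,
            "action": action,
            "allowed": allowed,
        })
        return allowed


store = PolicyStore()
store.add_policy("sales_db.orders", "analysts", {"read"})
can_read = store.is_allowed("alice", {"analysts"}, "sales_db.orders", "read")
can_write = store.is_allowed("alice", {"analysts"}, "sales_db.orders", "write")
```

The point of the sketch is that the access decision and the audit record come from the same place, so a denied attempt is tracked just like a successful one.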
Point 2 --> Security is in place, so we know that data ingestion is occurring with full security. Now the business wants to manage the lifecycle of the data in one common place: data replication, retention, rules for handling late-arriving data, data mirroring, and a view of the complete data pipeline.
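As a rough illustration of Point 2, the lifecycle rules can be thought of as one declarative policy per dataset. The sketch below assumes a hypothetical `LifecyclePolicy` with a retention period, a late-arrival cut-off, and a list of replication targets; the field names, the helper functions, and the "dr-cluster" target are invented for this example and are not the API of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical per-dataset lifecycle rules kept in one common place."""
    retention: timedelta            # how long a dataset instance is kept
    late_arrival_cutoff: timedelta  # how long after its nominal time late data is still accepted
    replicate_to: tuple = ()        # names of target clusters (assumed)


def is_expired(policy, instance_time, now):
    """True if the dataset instance is older than the retention period."""
    return now - instance_time > policy.retention


def accepts_late_data(policy, instance_time, arrival_time):
    """True if data arriving at arrival_time is still within the late-arrival cut-off."""
    return arrival_time - instance_time <= policy.late_arrival_cutoff


policy = LifecyclePolicy(
    retention=timedelta(days=90),
    late_arrival_cutoff=timedelta(hours=6),
    replicate_to=("dr-cluster",),
)

now = datetime(2024, 6, 1)
expired = is_expired(policy, datetime(2024, 1, 1), now)
late_ok = accepts_late_data(policy, datetime(2024, 5, 31, 0), datetime(2024, 5, 31, 4))
late_rejected = accepts_late_data(policy, datetime(2024, 5, 31, 0), datetime(2024, 5, 31, 8))
```

Keeping these rules in a single policy object is what lets retention, late-data handling, and replication be automated and reviewed together instead of being scattered across individual jobs.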
Point 3 --> Once data lifecycle management is in place, we will be generating more data about data ("metadata"), and there is existing legacy metadata that needs to be exchanged with the Hadoop system. This creates the requirement for a data governance solution, which should provide complete data lineage, metadata exchange, and search functionality.
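For Point 3, lineage and metadata search can be pictured as a small graph plus a lookup. The following is a minimal sketch under assumed data: the dataset names, the `lineage` and `metadata` dictionaries, and the functions `upstream` and `search_by_tag` are hypothetical and only show the idea, not how a real governance tool stores or queries metadata.

```python
def upstream(lineage, dataset):
    """Return every dataset that `dataset` was directly or indirectly derived from."""
    seen = set()
    stack = list(lineage.get(dataset, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen


def search_by_tag(metadata, tag):
    """Return the names of all datasets carrying the given tag, sorted by name."""
    return sorted(name for name, attrs in metadata.items() if tag in attrs.get("tags", ()))


# Assumed lineage: each dataset maps to the datasets it was built from.
lineage = {
    "daily_report": ["clean_orders"],
    "clean_orders": ["raw_orders", "customer_dim"],
}

# Assumed metadata records, e.g. partly imported from a legacy catalog.
metadata = {
    "raw_orders": {"owner": "ingest-team", "tags": ["pii"]},
    "customer_dim": {"owner": "crm-team", "tags": ["pii", "reference"]},
    "daily_report": {"owner": "bi-team", "tags": ["report"]},
}

report_sources = upstream(lineage, "daily_report")
pii_datasets = search_by_tag(metadata, "pii")
```

With both pieces in one place, a compliance question such as "does this report depend on any PII dataset?" becomes a simple intersection of `report_sources` and `pii_datasets`.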
Customer: Yes, this is exactly what we are looking for. All of this must be well integrated, and please provide it as a 100% open source but enterprise-ready solution.