Is it possible to expose dataset sample data (e.g., the first 10 rows) using Atlas?
Labels: Apache Atlas
Created ‎01-07-2019 12:01 PM
Hello,
I am new to Atlas and I am wondering whether there is a way to make Atlas expose, in addition to a dataset's metadata, a sample of its data (the first 10 rows, for example).
This is for data governance: the metadata alone is not always functionally comprehensive, and reading a data sample would make it easier to understand what a dataset contains.
If there is currently no such solution, please help me decide where I should invest my effort: developing it inside Atlas, or handling it with a third-party tool that accesses Hive, Sqoop, or HBase directly to get the sample data.
Thanks in advance.
Created ‎01-09-2019 03:10 PM
There are two ways I'd suggest off the top of my head.
- The simplest might be to add a metadata tag that would contain a small sample in CSV, JSON or HTML format, and populate it via the API or Atlas Kafka topics. For example, you could use HDF NiFi to periodically sample each table in Hive, format the data, and populate the attribute.
- For a more integrated approach, you might consider using the DataPlane DSS data profiler framework to add a "sample" profile that could be stored alongside the other profiler metadata and surfaced in DSS.
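To make the first option a bit more concrete, here is a rough Python sketch of the "format the data and populate the attribute" step. It is only an illustration under several assumptions: the classification (tag) type `data_sample` and its `sample` attribute are hypothetical names that would have to be defined in Atlas beforehand, the rows are assumed to have already been fetched from Hive by whatever client you use (for example with a `SELECT ... LIMIT 10`), and the endpoint shown is the Atlas v2 REST call for adding classifications to an entity as I understand it, so please check it against your Atlas version. The request is only built here, not sent.

```python
import csv
import io
import json
import urllib.request


def rows_to_csv(header, rows, limit=10):
    """Render the header plus at most `limit` rows as a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[:limit])
    return buf.getvalue()


def build_classification(sample_csv, type_name="data_sample"):
    """Build the JSON body for attaching a tag that carries the sample.

    `data_sample` and its `sample` attribute are hypothetical; the
    classification type must already exist in Atlas.
    """
    return [{"typeName": type_name, "attributes": {"sample": sample_csv}}]


def build_request(atlas_url, guid, payload, auth_header):
    """Prepare (but do not send) the POST that attaches the classification."""
    url = f"{atlas_url}/api/atlas/v2/entity/guid/{guid}/classifications"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="POST",
    )
```

You would call `rows_to_csv` on the sampled rows, pass the result to `build_classification`, and send the request from `build_request` with `urllib.request.urlopen` once the host, entity GUID, and credentials are filled in. Keeping the sample small matters here, since the whole CSV is stored as a single attribute value.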
Created ‎01-16-2019 03:24 PM
Thank you for your reply. I think I will go with the first option and create a metadata tag for each dataset.
Created ‎01-16-2019 04:23 PM
Glad that helped! My colleague Greg Keys has a great series here on HCC about extending Atlas that may help you out, too:
