Is it possible to expose dataset sample data (e.g. the first 10 rows) using Atlas?


New Contributor

Hello,

I am new to Atlas and I am wondering whether there is a way to make Atlas expose, in addition to a dataset's metadata, a sample of its data (the first 10 rows, for example).

This is for data governance: the metadata alone is not always descriptive enough from a functional point of view, and reading a data sample would make it easier to understand what a dataset contains.

If there is currently no such solution, please help me decide where I should invest my effort: development inside Atlas itself, or a third-party tool that accesses Hive, Sqoop or HBase directly to fetch the sample data.

Thanks in advance.

1 ACCEPTED SOLUTION

Re: Is it possible to expose dataset sample data (e.g. the first 10 rows) using Atlas?

Rising Star

There are two ways I'd suggest off the top of my head.

  1. The simplest might be to add a metadata tag that contains a small sample in CSV, JSON or HTML format, and populate it via the API or the Atlas Kafka topics. For example, you could use HDF NiFi to periodically sample each table in Hive, format the data, and populate the attribute.
  2. For a more integrated approach, you might consider using the DataPlane DSS data profiler framework to add a "sample" profile that could be stored alongside the other profiler metadata and surfaced in DSS.
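
To make option 1 a bit more concrete, here is a minimal Python sketch of the "format the data and populate the attribute" step. It is only an illustration under several assumptions, not a tested integration: the classification name `sample_data` and its string attribute `sample` are hypothetical and would have to be defined in Atlas beforehand, the rows are assumed to have been fetched from Hive already (for example with a `SELECT ... LIMIT 10`), and authentication is left out. The endpoint path is the Atlas v2 REST call for attaching classifications to an entity by GUID; please check it against the API documentation for your Atlas version.

```python
import csv
import io
import json
import urllib.request

# Hypothetical classification (tag) name; it must already be defined in Atlas
# with a string attribute called "sample".
SAMPLE_TAG = "sample_data"


def rows_to_csv(columns, rows, limit=10):
    """Render at most `limit` rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows[:limit])
    return buf.getvalue()


def build_request(atlas_url, guid, sample_csv):
    """Build (but do not send) the request that attaches the tag to an entity."""
    url = f"{atlas_url}/api/atlas/v2/entity/guid/{guid}/classifications"
    # The body is a list of classifications, each with a type name and attributes.
    body = [{"typeName": SAMPLE_TAG, "attributes": {"sample": sample_csv}}]
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder rows standing in for the result of a Hive query.
    sample = rows_to_csv(["id", "name"], [(1, "alice"), (2, "bob")])
    request = build_request("http://atlas.example.com:21000", "some-entity-guid", sample)
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # Sending it would be urllib.request.urlopen(request), once credentials are added.
```

Building the request separately from sending it makes the payload easy to inspect before anything touches Atlas. For the periodic refresh, keep in mind that the tag will already be attached after the first run, so later runs would presumably need to update the existing classification rather than add it again.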
3 REPLIES

Re: Is it possible to expose dataset sample data (e.g. the first 10 rows) using Atlas?

New Contributor

Thank you for your reply. I think I will go with the first option and create a metadata tag for each dataset.
