Support Questions

aymen_zaiter · ‎01-07-2019

Hello,

I am new to Atlas and I am wondering whether there is a way to make Atlas expose, in addition to a dataset's metadata, a sample of its data (the first 10 rows, for example).

This is for data governance: the metadata alone is not always comprehensive enough from a functional point of view, and reading a data sample would make it easier to explain what a dataset contains.

If there is no such solution, please help me decide where I should invest my effort: development within Atlas itself, or a third-party tool that accesses Hive, Sqoop or HBase directly to get the sample data.

Thanks in advance.

wcbdata · ‎01-09-2019

There are two ways I'd suggest off the top of my head.

The simplest might be to add a metadata tag that would contain a small sample in CSV, JSON or HTML format, and populate it via the API or the Atlas Kafka topics. For example, you could use HDF NiFi to periodically sample each table in Hive, format the data, and populate the attribute.

For a more integrated approach, you might consider using the DataPlane DSS data profiler framework to add a "sample" profile that could be stored alongside the other profiler metadata and surfaced in DSS.
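
To make the first option more concrete, here is a minimal Python sketch of the "format the data and populate the attribute" step. It is only an illustration under several assumptions: the host in `ATLAS_URL`, the classification type `dataset_sample` and its string attribute `sample_csv` are hypothetical names that you would have to define in Atlas yourself, and the endpoint path reflects my understanding of the Atlas v2 REST API, so please check it against the documentation for your version. The rows are assumed to come from something like a `SELECT * ... LIMIT 10` against Hive, which is not shown here.

```python
import csv
import io
import itertools
import json

# All three names below are hypothetical placeholders, not Atlas defaults.
ATLAS_URL = "http://atlas-host:21000"
SAMPLE_TAG = "dataset_sample"   # classification type you would define first
SAMPLE_ATTR = "sample_csv"      # string attribute on that classification


def rows_to_csv(columns, rows, limit=10):
    """Render a header line plus at most `limit` rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in itertools.islice(rows, limit):
        writer.writerow(row)
    return buf.getvalue()


def build_tag_request(entity_guid, sample_csv):
    """Build the URL and JSON body for attaching the sample tag to an entity.

    Assumption: classifications are added with a POST to
    /api/atlas/v2/entity/guid/{guid}/classifications, with a JSON list
    of classification objects as the body.
    """
    url = f"{ATLAS_URL}/api/atlas/v2/entity/guid/{entity_guid}/classifications"
    body = [{"typeName": SAMPLE_TAG, "attributes": {SAMPLE_ATTR: sample_csv}}]
    return url, json.dumps(body)
```

You would then send the body to the returned URL with whatever HTTP client and authentication your cluster uses, or have NiFi do the same thing on a schedule. Keeping the payload construction separate from the HTTP call makes it easy to check what would be written before anything reaches Atlas.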

aymen_zaiter · ‎01-16-2019

Thank you for your reply. I think I will go with the first option and create a metadata tag for each dataset.

wcbdata · ‎01-16-2019

Glad that helped! My colleague Greg Keys has a great series here on HCC about extending Atlas that may help you out, too:

https://community.hortonworks.com/articles/229421/customizing-atlas-part1-model-governance-traceabil...

https://community.hortonworks.com/articles/231988/customizing-atlas-part2-deep-source-metadata-embed...

Cloudera Community

Is it possible to expose dataset sample data (e.g. the first 10 rows) using Atlas?