Can we store additional Apache Atlas metadata in HBase?

Super Collaborator

Attachment: lineage.png

Hi All,

I have downloaded the HDP 2.5 sandbox. This sandbox ships with the latest Apache Atlas version, which is already configured to use HBase as the metadata store for all entities (such as columns, tables, and lineage metadata).

Assume I am already connected to the HBase shell.

Hive tables:

Input: patient_info_raw

Output: patient_validated_dataset

My first questions are:

1) Can we add additional metadata for any entity in HBase storage?

2) How can we see Atlas metadata in HBase?

Problem Statement:

Consider two Hive tables named "patient_info_raw" and "patient_validated_dataset", as shown in the attached diagram. One of these tables (patient_info_raw) has already been created, and its metadata is already present in HBase. My requirement is to link this table to the "patient_validated_dataset" table (that is, show lineage in the Atlas UI) just by inserting metadata into HBase storage. I do not want to execute any Hive query (such as CREATE TABLE AS SELECT ... or CREATE TABLE <tablename>) on the input table (patient_info_raw).

Lineage must be reflected in the Atlas UI just by inserting lineage metadata into the HBase tables to create a link between these two tables.

We have two tables in HBASE

1) ATLAS_ENTITY_AUDIT_EVENTS

2) atlas_titan

Can we do the above task in Apache Atlas? If yes, what are the steps to complete it?

Please keep in mind that I am not going to create the output table by executing a Hive query. Atlas should show the metadata and lineage of the output table in the Atlas UI just through metadata insertion.

1 ACCEPTED SOLUTION


Hi @Manoj Dhake

I've never tried to implement your use case, but this should be possible using the Atlas API. I do not recommend altering data directly in HBase.

You can follow these steps:

  1. Create a Hive Table entity for "patient_info_raw" if it doesn't exist in Atlas. Use the REST API call "POST http://<Atlas_Server:Atlas_Port>/api/atlas/entities", where the body is the table's EntityDefinition structure. More information on the REST API can be found in this guide.
  2. Create a Hive Table entity for "patient_validated_dataset" if it doesn't exist in Atlas. Use the same method as above.
  3. Create the lineage: to do this, you need two DataSet entities and a Process instance. You already have two DataSet entities (your Hive tables), as Hive Table is a subtype of DataSet. For the Process instance, you can use an existing process type or create your own. When you create your process instance, set its inputs and outputs to the GUIDs of your Hive tables. This models the lineage and the link between them.
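The three steps above can be sketched in Python as plain payload builders. This is a minimal sketch, not a confirmed request format: the `jsonClass` names are copied from the curl command posted later in this thread, while the helper names, the negative placeholder ids, the attribute names (`name`, `qualifiedName`, `inputs`, `outputs`), and the GUID strings are assumptions to verify against the type definitions of your Atlas version. A real `hive_table` entity will likely need more attributes (for example a database reference) than shown here.

```python
import json

# jsonClass names as they appear in the curl command posted in this thread.
REFERENCE_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def make_id(identifier, type_name):
    """Build an Id structure: a placeholder id for a new entity or a GUID for an existing one."""
    return {
        "jsonClass": ID_CLASS,
        "id": str(identifier),
        "version": 0,
        "typeName": type_name,
    }


def make_entity(temp_id, type_name, values):
    """Build an entity definition of the given type (hypothetical helper)."""
    return {
        "jsonClass": REFERENCE_CLASS,
        "id": make_id(temp_id, type_name),
        "typeName": type_name,
        "values": values,
        "traitNames": [],
        "traits": {},
    }


def make_lineage_process(temp_id, name, input_guid, output_guid, dataset_type="hive_table"):
    """Build a Process entity whose inputs/outputs point at existing table GUIDs.

    Attribute names are assumptions; check the Process type in your Atlas instance.
    """
    values = {
        "name": name,
        "qualifiedName": name,
        "inputs": [make_id(input_guid, dataset_type)],
        "outputs": [make_id(output_guid, dataset_type)],
    }
    return make_entity(temp_id, "Process", values)


if __name__ == "__main__":
    # The GUIDs below are placeholders; use the GUIDs Atlas returns for your tables.
    process = make_lineage_process(
        "-2", "load_patient_validated_dataset",
        "<guid-of-patient_info_raw>", "<guid-of-patient_validated_dataset>",
    )
    print(json.dumps([process], indent=2))
```

The printed JSON is what would go in the body of the POST call from step 1; the two table GUIDs come from the responses (or lookups) of steps 1 and 2.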

I hope this helps you implement your use case. My advice is to read the Atlas REST API documentation before implementing this: https://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf

Abdelkrim


8 REPLIES

Super Collaborator

Hi Abdelkrim,

Thank you for your reply.

As you suggest, I could use the Atlas API to create the entities and the link between the two tables. Yes, we can do it that way.

But can we interact directly with the HBase database and store metadata there?

Do you have any documentation on how to store data in HBase and what other dependencies are required when storing it?


@Manoj Dhake

As with any tool, modifying the database directly is dangerous and can lead to inconsistency. For instance, some operations need to create or modify several records. If you modify data directly, you can miss one of the steps along the way. Also, Atlas uses an index store (Solr) in addition to the metadata store (HBase). This index must stay up to date with the latest information, which you cannot guarantee when writing to the database directly.

Atlas comes with integration points that have been developed specifically to let you enrich your data governance and customize your management. These integration points are the safe path to implement your logic:

  1. The REST API, as described before
  2. Messaging integration through Kafka (see the same documentation that I previously provided)
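For the Kafka option, a rough sketch of what a notification message could look like is shown below. Everything specific here is an assumption to check against the documentation linked above: the envelope layout (`version` plus `message`), the `ENTITY_CREATE` type string, and the `build_hook_message` helper name are illustrative, and the code only builds the JSON text without talking to Kafka.

```python
import json


def build_hook_message(entities, user, message_type="ENTITY_CREATE"):
    """Serialize an entity-creation notification (assumed envelope layout).

    `entities` is a list of entity definitions in the same JSON structure
    that the REST API accepts.
    """
    envelope = {
        "version": {"version": "1.0.0"},  # assumed message-format version
        "message": {
            "type": message_type,
            "user": user,
            "entities": entities,
        },
    }
    return json.dumps(envelope)


if __name__ == "__main__":
    # A stub entity; a real one would carry the full definition.
    stub_entity = {"typeName": "hive_table", "values": {"name": "patient_validated_dataset"}}
    print(build_hook_message([stub_entity], user="admin"))
```

The resulting string would be published with a Kafka producer to the topic that Atlas listens on (to my knowledge `ATLAS_HOOK` by default, but confirm this in your configuration).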

Super Collaborator

Could you please post a REST API example that creates a Hive table entity and sets the lineage link between two tables?


I don't have a ready example for your use case. Look at the documentation I gave you; it includes an example with HBase. You just need to adapt it to your needs. Hope this helps.

Super Collaborator

Hi Abdelkrim,

I am trying to create an Atlas Hive table entity as described in the REST API document, but I am facing the issue below.

Error:

{"error":"Unable to deserialize json","stackTrace":"java.lang.IllegalArgumentException: Unable to deserialize json\n\tat org.apache.atlas.services.DefaultMetadataService.deserializeClassInstances(DefaultMetadataService.java:313)\n\tat org.apache.atlas.services.DefaultMetadataService.createEntities(DefaultMetadataService.java:278)\n\tat org.apache.atlas.web.resources.EntityResource.submit(EntityResource.java:114)\n\tat sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:497)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)\n\tat com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)\n\tat com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)\n\tat 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)\n\tat com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)\n\tat com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)\n\tat com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)\n\tat org.apache.atlas.web.filters.AuditFilter.doFilter(AuditFilter.java:67)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)\n\tat com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)\n\tat com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)\n\tat com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat java.lang.Thread.run(Thread.java:745)\n"}

I want to create:

Table: abcd

Database: default

Below is the REST API call used to create the Hive table entity:

curl -X POST -H "Content-Type: application/json" -u admin:admin -d '{"definition":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference","id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id","id":"­1466683608564093000","version":0,"typeName":"hive_table"},"typeName":"hive_table","values":{ "tableType":"MANAGED_TABLE","name":"http://<host>/api/atlas/entities
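An "Unable to deserialize json" response suggests the server rejected the request body while parsing it, so one thing worth trying is to check the body locally before sending it. The sketch below is only an illustration: the function names and the example host are hypothetical, and it only covers generic problems (malformed JSON, or non-ASCII characters such as an invisible soft hyphen that can sneak in when copying from a PDF), not Atlas-specific schema errors.

```python
import base64
import json
import urllib.request


def find_suspect_characters(body):
    """Return (index, code point) pairs for non-ASCII characters in the body."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(body) if ord(ch) > 127]


def build_entity_request(base_url, body, user, password):
    """Validate the JSON body locally, then build (but do not send) the POST request."""
    json.loads(body)  # raises ValueError if the body is not well-formed JSON
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/entities",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    body = '[{"typeName": "hive_table"}]'  # stand-in for the real entity definition
    print(find_suspect_characters(body))
    # The host and port below are placeholders for your Atlas server.
    request = build_entity_request("http://atlas.example.com:21000", body, "admin", "admin")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

If `find_suspect_characters` reports anything inside an id or type name, retyping that part of the body by hand is a cheap first fix to try.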

Super Collaborator

Hi Abdelkrim,

I am now able to create Hive table entities and have successfully linked them using the Atlas REST API.

Please follow the steps from the link below:

https://community.hortonworks.com/questions/74875/how-to-create-hive-table-entity-in-apache-atlas-us...
