Created 12-27-2016 11:11 AM
(Attachment: lineage.png)
Hi All,
I have downloaded the HDP 2.5 sandbox. As we know, this sandbox ships with the latest Apache Atlas version, which is already configured to use HBase as the metadata store for all entities (such as columns, tables, and lineage metadata).
Assume I am already connected to the HBase shell.
Hive tables:
Input: patient_info_raw
Output: patient_validated_dataset
My first questions are:
1) Can we add additional metadata for an entity in the HBase storage?
2) How can we view Atlas metadata from HBase?
Problem Statement:
Consider two Hive tables, "patient_info_raw" and "patient_validated_dataset", as shown in the attached diagram. One of them (patient_info_raw) has already been created, and its metadata is already present in HBase. My requirement is to link this table to "patient_validated_dataset" (that is, show lineage between them in the Atlas UI) just by inserting metadata into HBase storage. I do not want to execute any Hive query (such as CREATE TABLE AS SELECT ... or CREATE TABLE <tablename>) on the input table (patient_info_raw).
The lineage must be reflected in the Atlas UI purely by inserting lineage metadata into the HBase tables to create the link between these two tables.
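For context on the lineage question: in Atlas, lineage between two tables is not stored as a direct table-to-table link. It is recorded as a Process entity (for Hive, `hive_process`) whose `inputs` and `outputs` attributes reference DataSet entities such as `hive_table`, and the lineage graph is drawn by traversing those process entities. The toy model below is plain Python, not the Atlas API, and the process name `validate_patients` is hypothetical. It illustrates why a single process that lists `patient_info_raw` as input and `patient_validated_dataset` as output is enough to connect the two tables.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy stand-in for an Atlas Process entity such as hive_process."""
    name: str
    inputs: list = field(default_factory=list)   # input DataSet names
    outputs: list = field(default_factory=list)  # output DataSet names


def lineage_edges(processes):
    """Each process connects every one of its inputs to every one of its outputs."""
    return {(src, p.name, dst)
            for p in processes
            for src in p.inputs
            for dst in p.outputs}


def downstream(table, processes):
    """Collect every DataSet reachable from `table` by following process edges."""
    edges = lineage_edges(processes)
    seen, stack = set(), [table]
    while stack:
        current = stack.pop()
        for src, _, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


if __name__ == "__main__":
    validate = Process("validate_patients",
                       inputs=["patient_info_raw"],
                       outputs=["patient_validated_dataset"])
    print(downstream("patient_info_raw", [validate]))
    # prints {'patient_validated_dataset'}
```

Chaining a second process that reads `patient_validated_dataset` would extend the same graph without touching the first one.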
We have two tables in HBase:
1) ATLAS_ENTITY_AUDIT_EVENTS
2) atlas_titan
Can we do the above task in Apache Atlas? If yes, what are the steps to complete it?
Please keep in mind that I am not going to create the output table by executing a Hive query; Atlas should show the metadata and lineage of the output table in the Atlas UI purely through metadata insertion.
Created 12-27-2016 12:10 PM
Hi @Manoj Dhake
I've never tried to implement your use case, but this should be possible using the Atlas API. I do not recommend altering data directly in HBase.
You can follow these steps:
I hope this helps you implement your use case. My advice is to read about the Atlas REST API before implementing this: https://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf
Abdelkrim
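As a starting point for the Atlas API route, here is a minimal sketch that builds a `hive_table` instance in the v1 JSON serialization used by the curl command later in this thread (`jsonClass`, `id`, `typeName`, `values`). Several details are assumptions to check against your Atlas version: the attribute set, the `<db>.<table>@<cluster>` qualifiedName convention, the cluster name `Sandbox`, using a negative placeholder id for an entity that does not exist yet, and referencing the existing `hive_db` entity by its GUID. `hive_table_entity` and `id_ref` are hypothetical helpers.

```python
import json

REF = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def id_ref(guid, type_name):
    """v1-style reference: an existing GUID, or a placeholder id for a new entity."""
    return {"jsonClass": ID, "id": guid, "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def hive_table_entity(temp_id, db_guid, db_name, table, cluster, owner):
    """Build a minimal hive_table instance (attribute set is an assumption)."""
    return {
        "jsonClass": REF,
        "id": id_ref(temp_id, "hive_table"),
        "typeName": "hive_table",
        "values": {
            "name": table,
            # Assumed convention: <db>.<table>@<cluster>
            "qualifiedName": f"{db_name}.{table}@{cluster}",
            "owner": owner,
            "tableType": "MANAGED_TABLE",
            # Point at the existing hive_db entity by its GUID.
            "db": id_ref(db_guid, "hive_db"),
        },
        "traitNames": [],
        "traits": {},
    }


if __name__ == "__main__":
    table = hive_table_entity("-1", "<guid-of-default-db>", "default",
                              "patient_validated_dataset", "Sandbox", "admin")
    print(json.dumps([table], indent=2))
```

The GUID of the `default` database placeholder has to be looked up in your own Atlas instance; it is not something the sketch can know.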
Created 12-27-2016 02:43 PM
Hi Abdelkrim,
Thank you for your reply.
According to you, I should use the Atlas API to create the entities and the link between the two tables; yes, we can do it that way.
But can we interact directly with the HBase database and store metadata there?
Do you have any documentation on how to store metadata in HBase and what other dependencies are required when storing data?
Created 12-27-2016 02:52 PM
As with any tool, modifying the database directly is dangerous and can lead to inconsistencies. For instance, some operations need to create or modify several pieces of data; if you modify data directly, you can miss one of those steps along the way. Also, Atlas uses an index store (Solr) in addition to the metadata store (HBase). This index must be kept up to date with the latest information, which you cannot guarantee when accessing the database directly.
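The consistency argument can be illustrated with a toy model (plain Python, not Atlas code; the class and method names are hypothetical). The service path updates the metadata store and the search index together, while a direct write to the store leaves the index stale, so the entity exists but cannot be found.

```python
class ToyAtlas:
    """Toy model, not Atlas code: a metadata store plus a search index
    that the service updates together on every write."""

    def __init__(self):
        self.store = {}  # stands in for the HBase-backed graph store
        self.index = {}  # stands in for the Solr index: name -> guid

    def create_via_api(self, guid, name):
        # The service path writes the entity and indexes it in one operation.
        self.store[guid] = {"name": name}
        self.index[name] = guid

    def write_hbase_directly(self, guid, name):
        # Bypassing the service writes the entity but never indexes it.
        self.store[guid] = {"name": name}

    def search(self, name):
        # In this toy model, search only consults the index.
        return self.index.get(name)


if __name__ == "__main__":
    atlas = ToyAtlas()
    atlas.create_via_api("guid-1", "patient_info_raw")
    atlas.write_hbase_directly("guid-2", "patient_validated_dataset")
    print(atlas.search("patient_info_raw"))           # guid-1
    print(atlas.search("patient_validated_dataset"))  # None, although it is stored
```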
Atlas comes with integration points that have been developed specifically to let you enrich your data governance and customize your management. These integration points are the safe path for implementing your logic:
Created 12-27-2016 04:38 PM
Could you please post a REST API example that creates a Hive table entity and sets a lineage link between two tables?
Created 12-27-2016 04:59 PM
I don't have a ready-made example for your use case. Look at the documentation I gave you; you can find an example with HBase there. You just need to adapt it to your needs. Hope this helps.
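Along the same lines, a minimal sketch of the lineage half: a `hive_process` instance whose `inputs` and `outputs` reference the GUIDs of `patient_info_raw` and `patient_validated_dataset`. The attribute names are modeled on the Hive model's `hive_process` type but are assumptions, as are the placeholder id, the `operationType` value, the epoch-millisecond timestamps, and the `queryPlan` placeholder; check the type definition in your Atlas instance for the required attributes. `hive_process_entity` and `id_ref` are hypothetical helpers.

```python
import json

REF = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def id_ref(guid, type_name):
    """v1-style reference to an entity by GUID (or by placeholder id)."""
    return {"jsonClass": ID, "id": guid, "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def hive_process_entity(temp_id, name, input_guid, output_guid,
                        user, query_text, time_ms):
    """Build a hive_process linking one input table to one output table.
    The attribute set is an assumption; verify it against your hive_process type."""
    return {
        "jsonClass": REF,
        "id": id_ref(temp_id, "hive_process"),
        "typeName": "hive_process",
        "values": {
            "name": name,
            "qualifiedName": name,
            # Lineage comes from these two lists of DataSet references.
            "inputs": [id_ref(input_guid, "hive_table")],
            "outputs": [id_ref(output_guid, "hive_table")],
            "userName": user,
            "startTime": time_ms,  # assumed epoch milliseconds
            "endTime": time_ms,
            "operationType": "CREATETABLE_AS_SELECT",  # assumed value
            "queryText": query_text,
            "queryPlan": "{}",  # placeholder
            "queryId": name,
        },
        "traitNames": [],
        "traits": {},
    }


if __name__ == "__main__":
    process = hive_process_entity(
        "-1", "validate_patients", "<guid-of-patient_info_raw>",
        "<guid-of-patient_validated_dataset>", "admin",
        "manually registered lineage", 1482883200000)
    print(json.dumps([process], indent=2))
```

Because no Hive query runs, `queryText` here is only descriptive text; Atlas stores whatever is supplied.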
Created 12-28-2016 07:11 AM
Hi Abdelkrim,
I am trying to create a Hive table entity in Atlas as described in the REST API document, but I am facing the issue below:
Error:
{"error":"Unable to deserialize json","stackTrace":"java.lang.IllegalArgumentException: Unable to deserialize json\n\tat org.apache.atlas.services.DefaultMetadataService.deserializeClassInstances(DefaultMetadataService.java:313)\n\tat org.apache.atlas.services.DefaultMetadataService.createEntities(DefaultMetadataService.java:278)\n\tat org.apache.atlas.web.resources.EntityResource.submit(EntityResource.java:114)\n\tat sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:497)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)\n\tat com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)\n\tat com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)\n\tat 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)\n\tat com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)\n\tat com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)\n\tat com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)\n\tat org.apache.atlas.web.filters.AuditFilter.doFilter(AuditFilter.java:67)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)\n\tat com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)\n\tat com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)\n\tat com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat java.lang.Thread.run(Thread.java:745)\n"}
I want to create:
Table: abcd
Database: default
Below is the REST API call I used to create the hive_table entity:
curl -X POST -H "Content-Type: application/json" -u admin:admin -d '{"definition":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference","id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id","id":"1466683608564093000","version":0,"typeName":"hive_table"},"typeName":"hive_table","values":{ "tableType":"MANAGED_TABLE","name":"http://<host>/api/atlas/entities
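For comparison, the same call can be made from Python's standard library. This minimal sketch only builds the request; `atlas_post_request` is a hypothetical helper, the host and port `sandbox.hortonworks.com:21000` are placeholders, and it assumes the v1 `/api/atlas/entities` endpoint from the command above accepts a JSON array of entity instances sent as-is, rather than wrapped in a `"definition"` object.

```python
import base64
import json
import urllib.request


def atlas_post_request(host, entities, user="admin", password="admin"):
    """Build (without sending) a POST to the v1 entities endpoint.
    Assumption: the body is a JSON array of entity instances."""
    url = f"http://{host}/api/atlas/entities"
    body = json.dumps(entities).encode("utf-8")
    # Equivalent of curl's `-u admin:admin` (HTTP Basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = atlas_post_request("sandbox.hortonworks.com:21000", [])
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to print and inspect the exact JSON body before it reaches the server.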
Created 12-29-2016 05:54 AM
Hi Abdelkrim,
Now I am able to create Hive table entities and have successfully linked those entities using the Atlas REST API.
Please follow the steps at the link below: