Created 12-28-2016 09:30 AM
Hi All,
I have searched everywhere on the internet but cannot find an example of creating a "hive table" entity using the REST API. The problem is that I am confused about how to build the JSON body for the REST API call.
Please send a complete REST API call example, with curl and the JSON body, that creates a hive table entity.
Also, please send an example of creating a lineage link between two datasets in Apache Atlas.
Created on 12-28-2016 09:43 AM - edited 08-18-2019 03:27 AM
A Hive table entity can be created using the /api/atlas/entities REST call.
One such example is:
Step1: JSON for creating table1:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } },{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table1", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table1@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425522", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table1.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425523", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table1", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table1@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } }]
Save the above JSON to a file (sample.json in the curl call below).
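Since the body above is hard to read as a single line, here is a minimal Python sketch of how the same nesting could be assembled. The helper names (make_id, make_reference, build_table_payload) are hypothetical and only for illustration; the jsonClass strings, type names and attribute names are copied from the sample above. It fills in only a subset of the attributes (no storage descriptor, no parameters), and whether Atlas accepts such a reduced set has not been verified here, so treat it as a picture of the structure rather than a replacement for the full sample.

```python
import json

ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"
REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"


def make_id(temp_id, type_name):
    """Build the "id" block; the sample uses negative placeholder ids."""
    return {"jsonClass": ID_CLASS, "id": str(temp_id), "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def make_reference(temp_id, type_name, values):
    """Wrap attribute values in the _Reference envelope used for each entity."""
    return {"jsonClass": REF_CLASS, "id": make_id(temp_id, type_name),
            "typeName": type_name, "values": values,
            "traitNames": [], "traits": {}}


def build_table_payload(db, table, cluster, column_names):
    """Return [hive_db, hive_table], the same order as the sample body."""
    db_ref = make_reference(-1, "hive_db", {
        "name": db, "qualifiedName": f"{db}@{cluster}", "clusterName": cluster})
    table_id = make_id(-2, "hive_table")
    columns = [
        make_reference(-(3 + i), "hive_column", {
            "name": col, "type": "string",
            "qualifiedName": f"{db}.{table}.{col}@{cluster}",
            "table": table_id})  # each column points back at its table by id
        for i, col in enumerate(column_names)
    ]
    table_ref = make_reference(-2, "hive_table", {
        "name": table, "qualifiedName": f"{db}.{table}@{cluster}",
        "tableType": "MANAGED_TABLE", "owner": "hive",
        "db": db_ref, "columns": columns})
    return [db_ref, table_ref]


payload = build_table_payload("default", "table1", "cl1", ["abc"])
text = json.dumps(payload, indent=2)  # this text is what would be saved to the file
```

The point to notice is that the table, its columns and its database are tied together by the placeholder ids and by the qualifiedName pattern db.table@cluster, exactly as in the sample.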
Step2: REST API call to create the hive table entity.
curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities
The above call creates the hive table entity.
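If you would rather make the call from a script than from curl, the following is a rough Python equivalent of the curl call above, using only the standard library. The function names and the host name are assumptions for illustration; the path, headers and admin:admin credentials are taken from the curl command. The request is built separately from sending it so that it can be inspected first.

```python
import base64
import urllib.request


def build_entity_request(base_url, body, user, password):
    """Build (but do not send) a POST request equivalent to the curl call."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/atlas/entities",
        data=body,  # bytes of the JSON file, like curl's -d @sample.json
        headers={
            "Accept": "application/json, text/plain, */*",
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Basic {token}",  # what curl's -u option sends
        },
        method="POST",
    )


def post_entities(request):
    """Send the request and return the HTTP status and the response text."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical host; replace it with the address of your Atlas server.
request = build_entity_request(
    "http://atlas.example.com:21000", b"[]", "admin", "admin")
```

To actually submit, read the saved file as bytes, pass it as body, and call post_entities(request).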
Step3: JSON for creating table2:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } },{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table2", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table2@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425522", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table2.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425523", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table2", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table2@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } }]
Save the above json to a file.
Step4: Repeat Step2 with the Step3 JSON.
Step5: JSON to create lineage between above two hive tables:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425513", "version":0, "typeName":"hive_process", "state":"ACTIVE" }, "typeName":"hive_process", "values":{ "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34", "name":"create table table2 as select * from table1", "startTime":"2016-12-28T09:46:19.003Z", "queryPlan":{ }, "operationType":"CREATETABLE_AS_SELECT", "outputs":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table2", "createTime":"2016-12-28T09:46:30.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425517", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table2@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425514", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table2.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:46:30.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425515", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table2", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table2@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482918390", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } } ], "endTime":"2016-12-28T09:46:31.211Z", "recentQueries":[ "create table table2 as select * from table1" ], "inputs":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table1", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ 
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425521", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table1@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425518", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table1.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425519", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table1", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table1@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, 
"numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } } ], "qualifiedName":"default.table2@cl1:1482918390000", "queryText":"create table table2 as select * from table1", "clusterName":"cl1", "userName":"hive" }, "traitNames":[ ], "traits":{ } }]
Save the above json to a file.
Step6: Repeat Step2 with the Step5 JSON. The curl call is the same as above.
Step7: You should now be able to visualize the lineage between the two entities.
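Before submitting a hive_process body, it can help to confirm which tables it connects. Here is a small Python sketch, assuming the body has the same shape as the Step5 JSON (a list containing a hive_process entity whose values hold the inputs and outputs arrays). The function name and the trimmed sample are illustrative only.

```python
def lineage_edges(entities):
    """Return (input qualifiedName, process name, output qualifiedName) triples."""
    edges = []
    for entity in entities:
        if entity.get("typeName") != "hive_process":
            continue
        values = entity["values"]
        for src in values.get("inputs", []):
            for dst in values.get("outputs", []):
                edges.append((src["values"]["qualifiedName"],
                              values["name"],
                              dst["values"]["qualifiedName"]))
    return edges


# Trimmed stand-in for the Step5 body: only the fields this sketch reads.
process_body = [{
    "typeName": "hive_process",
    "values": {
        "name": "create table table2 as select * from table1",
        "inputs": [{"typeName": "hive_table",
                    "values": {"qualifiedName": "default.table1@cl1"}}],
        "outputs": [{"typeName": "hive_table",
                     "values": {"qualifiedName": "default.table2@cl1"}}],
    },
}]

for src, process, dst in lineage_edges(process_body):
    print(f"{src} --[{process}]--> {dst}")
```

With the real file, load it with json.load and pass the result to lineage_edges.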
Created 12-29-2016 11:36 AM
Hi Ayub,
I am able to set lineage between table1 and table2 successfully, but my requirement is now different.
Suppose I have already created a hive table using a hive query, so its metadata is already present in Atlas, and I want to create lineage between this existing table and one that I am going to create using the REST API.
What changes do I need to make in the JSON file that we use to create the hive_process?
Which property in the JSON file is the one that links table1 and table2?
Created 12-28-2016 12:28 PM
@Manoj Dhake Which HDP version are you using? This JSON should work with the HDP-2.5.x release.
Created 12-29-2016 11:22 AM
@Manoj Dhake Currently the hive process JSON links table1 and table2. To create lineage between table2 and table3, change the table1 references in the JSON to table2 and the table2 references to table3, then submit the JSON.
This should create lineage like table1-->table2-->table3
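One thing to watch when editing the file by hand or by script: if you first change table1 to table2 and then change table2 to table3, every reference ends up as table3. A minimal Python sketch that applies both renames in a single pass is shown here. The function name and the trimmed sample body are assumptions for illustration; timestamps, ids and the queryId in a real file are left untouched by it.

```python
import json
import re


def shift_tables(body_text, mapping):
    """Apply all renames in one pass so a renamed name is not renamed again."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b")
    shifted = pattern.sub(lambda match: mapping[match.group(1)], body_text)
    json.loads(shifted)  # raises ValueError if the edit broke the JSON
    return shifted


# Trimmed stand-in for the hive_process body, not the full Step5 JSON.
process_body = json.dumps([{
    "typeName": "hive_process",
    "values": {
        "name": "create table table2 as select * from table1",
        "inputs": [{"values": {"qualifiedName": "default.table1@cl1"}}],
        "outputs": [{"values": {"qualifiedName": "default.table2@cl1"}}],
        "qualifiedName": "default.table2@cl1:1482918390000",
    },
}])

shifted = json.loads(
    shift_tables(process_body, {"table1": "table2", "table2": "table3"}))
```

After the shift, the inputs refer to table2 and the outputs to table3, which matches the change described above.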
Created 12-28-2016 11:59 AM
Since I was seeing frequent questions on using the REST API to create entities and lineage, I have posted this as an HCC article.
Created 12-30-2016 04:20 AM
I have working atlas api examples here
Created 05-23-2017 08:31 AM
Please first validate your JSON using JSON Formatter and JSON Validator.
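If you prefer to check the file locally, Python's standard json module can do the same job. This is only a sketch; the function name is made up for illustration.

```python
import json


def validate_json(text):
    """Return (True, None) if text parses, else (False, error position and message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


ok, problem = validate_json('[{"typeName": "hive_table",}]')  # trailing comma
if not ok:
    print("Invalid JSON at", problem)
```

Running it on the contents of your JSON file before the curl call catches missing brackets or stray commas early.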