Created 12-28-2016 09:30 AM
Hi All,
I have searched everywhere on the internet but cannot find an example of creating a "hive table" entity using the REST API. The problem is that I am confused about how to build the JSON body for the REST API call.
Please send a complete REST API call example, with curl and the JSON body, to create a hive table entity.
Also, please send an example of creating a lineage link between two datasets in Apache Atlas.
Created on 12-28-2016 09:43 AM - edited 08-18-2019 03:27 AM
A Hive table entity can be created using the /api/atlas/entities REST call (the same path used in the curl command in Step2).
One such example is:
Step1: JSON for creating table1:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } },{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table1", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table1@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425522", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table1.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425523", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table1", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table1@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } }]
Save the above JSON to a file (for example sample.json, the file name used in the curl call in Step2).
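Before posting the file, it can help to confirm that it parses as one JSON document, since a validator error about multiple root elements is reported later in this thread. The following is a minimal Python sketch, not part of Atlas: the helper name check_entity_payload is hypothetical, and the expectation of a single top-level array whose elements carry a "typeName" is taken only from the payloads shown in this answer.

```python
import json


def check_entity_payload(text):
    """Return the typeName of each top-level entity in a payload string.

    Hypothetical helper: raises ValueError if the text is not a single
    JSON array of objects that each have a "typeName" key.
    """
    try:
        # json.loads rejects text that holds more than one root element.
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not a single valid JSON document: {exc}") from exc
    if not isinstance(payload, list):
        raise ValueError("expected one top-level JSON array of entities")
    type_names = []
    for entity in payload:
        if not isinstance(entity, dict) or "typeName" not in entity:
            raise ValueError("each array element should be an object with a typeName")
        type_names.append(entity["typeName"])
    return type_names
```

For the Step1 file this should list hive_db followed by hive_table, assuming the file content was copied without losing the enclosing square brackets.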
Step2: REST API call to create the hive table entity.
curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities
The above call creates the hive table entity.
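If you would rather script the call than use curl, the same request can be sketched with the Python standard library. This is only an illustration under assumptions: build_entity_request is a hypothetical helper, the base URL is a placeholder for your own Atlas host, and the headers and basic authentication simply mirror the curl options above. The function builds the request without sending it.

```python
import base64
import urllib.request


def build_entity_request(base_url, payload_bytes, user, password):
    """Build (but do not send) the POST request that the curl call performs."""
    # curl's -u option sends HTTP basic authentication; reproduce it by hand.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/atlas/entities",
        data=payload_bytes,  # body of the POST, i.e. the saved JSON file
        method="POST",
        headers={
            "Accept": "application/json, text/plain, */*",
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Basic {token}",
        },
    )
```

To send it, read the saved file as bytes, pass them as payload_bytes, and hand the returned object to urllib.request.urlopen.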
Step3: JSON for creating table2:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } },{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table2", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425525", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table2@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425522", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table2.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425523", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table2", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table2@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425524", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } }]
Save the above json to a file.
Step4: Repeat step2 with the step3 JSON file.
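The Step3 payload appears to differ from the Step1 payload only in the table name (name, qualifiedName, column qualifiedName and storage location). Assuming that holds for your copy, the second file can be derived from the first instead of being edited by hand. The sketch below uses a hypothetical helper, rename_in_values, which replaces a substring in every string value of a parsed JSON tree; note that it also changes any other value that happens to contain the old name.

```python
def rename_in_values(node, old, new):
    """Return a copy of a parsed JSON tree with `old` replaced by `new`
    in every string value (keys are left unchanged)."""
    if isinstance(node, dict):
        return {key: rename_in_values(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_in_values(item, old, new) for item in node]
    if isinstance(node, str):
        return node.replace(old, new)
    # Numbers, booleans and null are returned as they are.
    return node


# Small fragment shaped like the table entity in Step1.
table1_fragment = {
    "typeName": "hive_table",
    "values": {
        "name": "table1",
        "qualifiedName": "default.table1@cl1",
        "retention": 0,
        "columns": [{"values": {"qualifiedName": "default.table1.abc@cl1"}}],
    },
}
table2_fragment = rename_in_values(table1_fragment, "table1", "table2")
```

The same function can be used to swap "mycluster" and "cl1" for your own cluster values.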
Step5: JSON to create lineage between above two hive tables:
[{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425513", "version":0, "typeName":"hive_process", "state":"ACTIVE" }, "typeName":"hive_process", "values":{ "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34", "name":"create table table2 as select * from table1", "startTime":"2016-12-28T09:46:19.003Z", "queryPlan":{ }, "operationType":"CREATETABLE_AS_SELECT", "outputs":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table2", "createTime":"2016-12-28T09:46:30.000Z", "temporary":false, "db":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425517", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table2@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425514", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table2.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", 
"version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:46:30.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425515", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table2", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table2@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, "numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425516", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482918390", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } } ], "endTime":"2016-12-28T09:46:31.211Z", "recentQueries":[ "create table table2 as select * from table1" ], "inputs":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" }, "typeName":"hive_table", "values":{ "tableType":"MANAGED_TABLE", "name":"table1", "createTime":"2016-12-28T09:34:53.000Z", "temporary":false, "db":{ 
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425521", "version":0, "typeName":"hive_db", "state":"ACTIVE" }, "typeName":"hive_db", "values":{ "name":"default", "location":"hdfs://mycluster/apps/hive/warehouse", "description":"Default Hive database", "ownerType":2, "qualifiedName":"default@cl1", "owner":"public", "clusterName":"cl1", "parameters":{ } }, "traitNames":[ ], "traits":{ } }, "retention":0, "qualifiedName":"default.table1@cl1", "columns":[ { "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425518", "version":0, "typeName":"hive_column", "state":"ACTIVE" }, "typeName":"hive_column", "values":{ "name":"abc", "qualifiedName":"default.table1.abc@cl1", "owner":"hive", "type":"string", "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } } ], "lastAccessTime":"2016-12-28T09:34:53.000Z", "owner":"hive", "sd":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", "id":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425519", "version":0, "typeName":"hive_storagedesc", "state":"ACTIVE" }, "typeName":"hive_storagedesc", "values":{ "location":"hdfs://mycluster/apps/hive/warehouse/table1", "serdeInfo":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct", "typeName":"hive_serde", "values":{ "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "parameters":{ "serialization.format":"1" } } }, "qualifiedName":"default.table1@cl1_storage", "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "compressed":false, 
"numBuckets":-1, "inputFormat":"org.apache.hadoop.mapred.TextInputFormat", "parameters":{ }, "storedAsSubDirectories":false, "table":{ "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id", "id":"-11893021824425520", "version":0, "typeName":"hive_table", "state":"ACTIVE" } }, "traitNames":[ ], "traits":{ } }, "parameters":{ "rawDataSize":"0", "numFiles":"0", "transient_lastDdlTime":"1482917693", "totalSize":"0", "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}", "numRows":"0" }, "partitionKeys":[ ] }, "traitNames":[ ], "traits":{ } } ], "qualifiedName":"default.table2@cl1:1482918390000", "queryText":"create table table2 as select * from table1", "clusterName":"cl1", "userName":"hive" }, "traitNames":[ ], "traits":{ } }]
Save the above json to a file.
Step6: Repeat step2 with the step5 JSON file. The curl call is the same as above.
Step7: You should now be able to visualize the lineage between the two entities.
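The Step5 payload is a single hive_process entity whose "inputs" and "outputs" lists hold the source and target tables. To check which tables a lineage file links before posting it, something like the following Python sketch can be used. The helper summarize_lineage is hypothetical and relies only on the field names visible in the Step5 JSON; a reference without a "values" block is reported as None.

```python
def summarize_lineage(process_entity):
    """Return the qualifiedName of a process entity and of its inputs and outputs."""
    values = process_entity["values"]

    def qualified_names(key):
        # Each list element is an entity reference; read its qualifiedName if present.
        return [ref.get("values", {}).get("qualifiedName") for ref in values.get(key, [])]

    return {
        "process": values.get("qualifiedName"),
        "inputs": qualified_names("inputs"),
        "outputs": qualified_names("outputs"),
    }


# Fragment shaped like the Step5 payload, reduced to the fields used here.
process_fragment = {
    "typeName": "hive_process",
    "values": {
        "qualifiedName": "default.table2@cl1:1482918390000",
        "inputs": [{"typeName": "hive_table", "values": {"qualifiedName": "default.table1@cl1"}}],
        "outputs": [{"typeName": "hive_table", "values": {"qualifiedName": "default.table2@cl1"}}],
    },
}
summary = summarize_lineage(process_fragment)
```

For the full Step5 file, load it with json.load and pass the first element of the array to the function.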
Created 12-28-2016 10:04 AM
Thank you Ayub,
Is the above JSON structure only for creating a hive table entity?
Suppose my database has already been created and I now just need to create the hive table entity.
Created 12-28-2016 10:55 AM
Hi Ayub,
If I paste the above JSON for creating the hive entity into a JSON validator, I get the error "multiple json root element".
JSON validator URL:
https://jsonformatter.curiousconcept.com/
I think you have sent the wrong JSON structure.
Created 12-28-2016 11:24 AM
@Manoj Dhake I have updated the answer with more details, please check and let me know if it works.
This time I have validated the json structure 🙂
Created 12-28-2016 12:13 PM
Thank you for the reply, Ayub.
I am trying to create the entity using the above JSON. Within the JSON I only replaced "mycluster" and "cl1" with my own cluster values, but I get the error below:
{"error":"For field 'tableName'","stackTrace":"org.apache.atlas.typesystem.types.ValueConversionException$NullConversionException: For field 'tableName'
Created 12-28-2016 02:59 PM
OK, I was using the HDP 2.4 sandbox, so I will try this JSON on HDP 2.5.
Created 12-29-2016 05:50 AM
Thank you Ayub,
I checked your JSON on HDP 2.5 and it works fine there.
Created 12-29-2016 10:35 AM
Hi Ayub,
We have now created two dataset entities and set the lineage between them. My next requirement is as follows.
Suppose I have already created a hive table using a hive query (i.e. patient_info_raw) and its metadata is already present in the Atlas repository. I now want to create lineage between this existing dataset and one that I will create using the POST API (i.e. patient_validated_info).
What changes do I need to make in the JSON file for the lineage data (i.e. in the 3rd step) so that I can see the lineage?
I can create the third table (i.e. hive_entity) using the same JSON file, which is fine, but what about the JSON data for the lineage?
How can I link them as patient_info_raw--->patient_validated_info?
Created 12-29-2016 11:16 AM
Hi Ayub,
We have now created two dataset entities and set the lineage between them.
Suppose I have already created a hive table (i.e. patient_raw_info) and its metadata is already present in Atlas. I now want to create lineage between this existing dataset (i.e. patient_raw_info) and one that I am going to create using your REST API (i.e. patient_validated_dataset). So my questions are:
How can I create a hive_process between the existing dataset and the new one?
What changes do I need to make in the JSON file that we use to create the hive_process (i.e. the lineage)?
I can create the third table (i.e. hive_entity) using the same JSON file, which is fine, but what about the JSON data for the lineage?
How can I link them as
patient_raw_info--->patient_validated_dataset