How to create hive table entity in Apache atlas using REST API?

Hi All,

I have searched everywhere on the internet but could not find an example of creating a "hive table" entity using the REST API. My problem is that I am confused about how to build the JSON body for the REST API call.

Could you please share a complete REST API call example, with curl and the JSON body, that creates a hive table entity?

Also, please share an example of creating a lineage link between two datasets in Apache Atlas.

1 ACCEPTED SOLUTION

@Manoj Dhake

A hive table entity can be created using the /api/atlas/entities REST call.

One such example is:

Step1: JSON for creating table1:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table1",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table1@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table1.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table1",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table1@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file (sample.json in the command below).

Step2: REST API call to create the hive table entity.

curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities

The above call creates the hive table entity.
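
For readers who prefer a script to curl, the same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not part of the original answer: the function name build_entity_request and the host atlas.example.com are hypothetical, and the URL, headers and admin:admin credentials are simply copied from the curl command in step2. The request is built but not sent.

```python
import base64
import json
import urllib.request


def build_entity_request(host, payload, user="admin", password="admin"):
    """Build (but do not send) the POST request that the step2 curl command issues."""
    url = f"http://{host}:21000/api/atlas/entities"
    body = json.dumps(payload).encode("utf-8")
    # curl -u user:password translates to an HTTP Basic Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/json; charset=UTF-8",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical host and a trimmed payload, for illustration only.
# In practice the payload would be json.load() of the file saved in step1.
request = build_entity_request("atlas.example.com", [{"typeName": "hive_db"}])
print(request.get_method(), request.full_url)
# To actually submit it: urllib.request.urlopen(request)
```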

Step3: JSON for creating table2:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table2",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table2@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table2.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table2",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table2@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file.

Step4: Repeat step2 with the step3 JSON.
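
In this thread the step3 JSON differs from the step1 JSON only in the table name (name, qualifiedName, column qualifiedName and storage location). A minimal sketch of deriving one from the other by text replacement follows; the helper rename_table is hypothetical, and the payload is a trimmed stand-in for the real file contents.

```python
import json


def rename_table(payload_text, old_name, new_name):
    """Return a copy of the payload text with every occurrence of old_name replaced."""
    renamed = payload_text.replace(old_name, new_name)
    json.loads(renamed)  # fail early if the result is no longer valid JSON
    return renamed


# Trimmed stand-in for the contents of the step1 file.
table1_text = json.dumps([{
    "typeName": "hive_table",
    "values": {"name": "table1", "qualifiedName": "default.table1@cl1"},
}])

table2 = json.loads(rename_table(table1_text, "table1", "table2"))
print(table2[0]["values"]["qualifiedName"])
```

Plain text replacement is only safe here because the string "table1" appears nowhere else in the payload; a payload with other fields containing that substring would need a more careful edit.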

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file.

Step6: Repeat step2 with the step5 JSON. The curl call is the same as in step2.

Step7: You should be able to visualize the lineage between the two entities.

[Image: 10864-atlas-2016-12-28-16-47-13.png]
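
In the step5 JSON, the hive_process entity lists table1 under "inputs" and table2 under "outputs". As a rough sketch of how to read those links back out of such a payload, the hypothetical helper below pairs every input with every output by qualifiedName; the payload shown is a heavily trimmed stand-in for the real file.

```python
def lineage_edges(process_payload):
    """Return (input qualifiedName, output qualifiedName) pairs for each process."""
    edges = []
    for process in process_payload:
        values = process["values"]
        for source in values["inputs"]:
            for target in values["outputs"]:
                edges.append((
                    source["values"]["qualifiedName"],
                    target["values"]["qualifiedName"],
                ))
    return edges


# Trimmed stand-in for the step5 hive_process JSON.
process_payload = [{
    "typeName": "hive_process",
    "values": {
        "inputs": [{"typeName": "hive_table",
                    "values": {"qualifiedName": "default.table1@cl1"}}],
        "outputs": [{"typeName": "hive_table",
                     "values": {"qualifiedName": "default.table2@cl1"}}],
    },
}]

edges = lineage_edges(process_payload)
print(edges)
```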

15 REPLIES

Hi Ayub,

I was able to set up lineage between table1 and table2 successfully, but now I have a different requirement.

Suppose I have already created a hive table with a hive query, so its metadata is already present in Atlas, and I want to create lineage between that existing table and one that I am going to create using the REST API.

What changes do I need to make in the JSON file that we use to create the hive_process?

Which property in the JSON file is the one that links table1 and table2?

@Manoj Dhake Which HDP version are you using? This JSON should work with the HDP-2.5.x release.

@Manoj Dhake Currently the hive_process JSON links table1 and table2. To create lineage between table2 and table3, change the table1 references in the JSON to table2 and the table2 references to table3, then submit the JSON.

This should create lineage like table1-->table2-->table3
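
If the renaming described above is done with a script rather than by hand, the order of the two replacements matters. The sketch below uses a hypothetical helper, shift_lineage, on a trimmed stand-in for the hive_process JSON. It only rewrites table names; ids, timestamps and the queryId are left untouched and may need adjusting separately.

```python
import json


def shift_lineage(process_text):
    """Rewrite table2 -> table3 and table1 -> table2 in the process JSON text."""
    # table2 must be replaced first. Replacing table1 first would turn every
    # reference into table2, and the second pass would then make them all table3.
    return process_text.replace("table2", "table3").replace("table1", "table2")


# Trimmed stand-in for the hive_process JSON from the accepted answer.
process_text = json.dumps({
    "queryText": "create table table2 as select * from table1",
    "inputs": ["default.table1@cl1"],
    "outputs": ["default.table2@cl1"],
})

shifted = json.loads(shift_lineage(process_text))
print(shifted["queryText"])
```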

As I was seeing frequent questions on using the REST API to create entities and lineage, I have posted this as an HCC article.

https://community.hortonworks.com/content/kbentry/74919/how-to-create-hive-table-and-lineage-using-r...

I have working Atlas API examples here:

https://github.com/sunileman/Atlas-API-Examples

Please first validate your JSON using JSON Formatter and JSON Validator.
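
As one possible way to run that check locally before calling the API, Python's standard json module reports where parsing fails. The helper name validate_json is hypothetical, and the two sample strings are made up for illustration.

```python
import json


def validate_json(text):
    """Return "valid" or a short message pointing at the first parse error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}"
    return "valid"


good = validate_json('[{"typeName": "hive_table"}]')
bad = validate_json('[{"typeName": "hive_table",}]')  # trailing comma
print(good)
print(bad)
```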