How to create hive table entity in Apache atlas using REST API?

Hi All,

I have searched everywhere on the internet but cannot find an example of creating a "hive table" entity using the REST API. My problem is that I am confused about how to build the JSON body for the REST API call.

Could you please share a complete REST API call example, with curl and the JSON body, that creates a hive table entity?

Also, please share an example of creating a lineage link between two datasets in Apache Atlas.

Accepted Solutions

Re: How to create hive table entity in Apache atlas using REST API?

@Manoj Dhake

A Hive table entity can be created using the /api/atlas/entities REST call.

One such example is:

Step1: JSON for creating table1:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table1",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table1@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table1.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table1",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table1@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]
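
Writing this JSON by hand is error-prone, so here is a minimal Python sketch that generates a payload of the same shape. It is not part of any Atlas client: the helper names (make_id, make_ref, build_table_payload) are hypothetical, the small negative ids (-1, -2, ...) are arbitrary placeholders chosen on the assumption that any negative id acts as a temporary id the way the long negative ids above do, and the Hive statistics under "parameters" are left empty.

```python
import json

PREFIX = "org.apache.atlas.typesystem.json.InstanceSerialization$"
TEXT_IN = "org.apache.hadoop.mapred.TextInputFormat"
TEXT_OUT = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
SERDE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"


def make_id(temp_id, type_name):
    # Bare id object, copied field for field from the JSON above.
    return {"jsonClass": PREFIX + "_Id", "id": str(temp_id), "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def make_ref(temp_id, type_name, values):
    # Full entity object: id + typeName + values, with empty traits.
    return {"jsonClass": PREFIX + "_Reference",
            "id": make_id(temp_id, type_name), "typeName": type_name,
            "values": values, "traitNames": [], "traits": {}}


def build_table_payload(db, table, cluster, columns, warehouse, created,
                        db_description=""):
    """Return [hive_db, hive_table]; columns is a list of (name, type) pairs."""
    db_id, table_id, sd_id = -1, -2, -3  # placeholder ids (assumption)
    table_qn = f"{db}.{table}@{cluster}"
    db_ref = make_ref(db_id, "hive_db", {
        "name": db, "location": warehouse, "description": db_description,
        "ownerType": 2, "qualifiedName": f"{db}@{cluster}",
        "owner": "public", "clusterName": cluster, "parameters": {}})
    # Each column points back at its table through a bare id object.
    column_refs = [
        make_ref(-4 - i, "hive_column", {
            "name": name, "qualifiedName": f"{db}.{table}.{name}@{cluster}",
            "owner": "hive", "type": col_type,
            "table": make_id(table_id, "hive_table")})
        for i, (name, col_type) in enumerate(columns)]
    sd_ref = make_ref(sd_id, "hive_storagedesc", {
        "location": f"{warehouse}/{table}",
        "serdeInfo": {"jsonClass": PREFIX + "_Struct", "typeName": "hive_serde",
                      "values": {"serializationLib": SERDE,
                                 "parameters": {"serialization.format": "1"}}},
        "qualifiedName": table_qn + "_storage",
        "outputFormat": TEXT_OUT, "compressed": False, "numBuckets": -1,
        "inputFormat": TEXT_IN, "parameters": {},
        "storedAsSubDirectories": False,
        "table": make_id(table_id, "hive_table")})
    table_ref = make_ref(table_id, "hive_table", {
        "tableType": "MANAGED_TABLE", "name": table, "createTime": created,
        "temporary": False, "db": db_ref, "retention": 0,
        "qualifiedName": table_qn, "columns": column_refs,
        "lastAccessTime": created, "owner": "hive", "sd": sd_ref,
        "parameters": {}, "partitionKeys": []})
    return [db_ref, table_ref]


if __name__ == "__main__":
    demo = build_table_payload(
        "default", "table1", "cl1", [("abc", "string")],
        "hdfs://mycluster/apps/hive/warehouse", "2016-12-28T09:34:53.000Z",
        db_description="Default Hive database")
    print(json.dumps(demo, indent=2))
```

Redirecting the printed output to a file gives a body that can be posted the same way as the hand-written JSON.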

Save the above JSON to a file, for example sample.json (the file name used in the curl call in Step2).

Step2: REST API call to create the hive table entity.

curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities

This call creates the hive table entity.
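
If you would rather script the call than use curl, the same request can be sketched with the Python standard library. This mirrors only what the curl line shows (POST to /api/atlas/entities, the two headers, basic auth with admin:admin). The function names are hypothetical and the host in the usage comment is a placeholder you need to replace, just like <IP_ADDRESS> above.

```python
import base64
import json
import urllib.request


def build_create_request(base_url, payload, user, password):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps(payload).encode("utf-8")
    # Equivalent of curl's -u user:password (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/entities",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json, text/plain, */*",
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": "Basic " + token,
        },
    )


def send(request):
    """Send the request; return the HTTP status and the raw response text."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Usage (placeholder host; needs a reachable Atlas server):
#   with open("sample.json") as f:
#       req = build_create_request("http://atlas-host:21000", json.load(f),
#                                  "admin", "admin")
#   print(send(req))
```

Building the request separately from sending it lets you inspect the URL, headers and body before anything reaches the server.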

Step3: JSON for creating table2:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table2",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table2@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table2.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table2",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table2@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file.
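
Because these files are edited by hand, a quick local check before posting can catch copy-paste mistakes in the ids. The sketch below walks a payload and reports every bare "_Id" object whose id is not defined by a "_Reference" object in the same file. That every such id should resolve inside the file is my assumption, based on how the examples here are built, not a documented Atlas rule, and the function names are hypothetical.

```python
import json


def collect_ids(node, defined, referenced):
    """Walk the payload, sorting ids into 'defined' and 'referenced' sets."""
    if isinstance(node, dict):
        json_class = node.get("jsonClass", "")
        if json_class.endswith("$_Reference"):
            # A full entity defines its own id; only its values nest further.
            defined.add(node["id"]["id"])
            collect_ids(node.get("values", {}), defined, referenced)
            return
        if json_class.endswith("$_Id"):
            # A bare id object points at an entity defined elsewhere.
            referenced.add(node["id"])
            return
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def dangling_ids(payload):
    """Return the sorted ids that are referenced but never defined."""
    defined, referenced = set(), set()
    collect_ids(payload, defined, referenced)
    return sorted(referenced - defined)


def check_file(path):
    # json.load also fails loudly if the file is not valid JSON.
    with open(path, encoding="utf-8") as handle:
        return dangling_ids(json.load(handle))
```

Calling check_file on the saved file should return an empty list when every id lines up.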

Step4: Repeat Step2 with the JSON file from Step3.

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]
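
The lineage JSON has a simple outer shape: one hive_process entity whose "inputs" and "outputs" lists hold the full table objects. Here is a minimal Python sketch of that wrapper, assuming you already have the table objects as dicts (for example loaded from the earlier files). The function name, the parameter names and the default temporary id -100 are hypothetical choices of mine.

```python
import json

PREFIX = "org.apache.atlas.typesystem.json.InstanceSerialization$"


def build_process_payload(inputs, outputs, query, qualified_name, query_id,
                          start, end, cluster, user="hive",
                          operation="CREATETABLE_AS_SELECT", temp_id=-100):
    """Wrap input and output table objects in a hive_process entity."""
    values = {
        "queryId": query_id,
        "name": query,
        "startTime": start,
        "queryPlan": {},
        "operationType": operation,
        "outputs": list(outputs),   # tables written by the query
        "endTime": end,
        "recentQueries": [query],
        "inputs": list(inputs),     # tables read by the query
        "qualifiedName": qualified_name,
        "queryText": query,
        "clusterName": cluster,
        "userName": user,
    }
    process_id = {"jsonClass": PREFIX + "_Id", "id": str(temp_id), "version": 0,
                  "typeName": "hive_process", "state": "ACTIVE"}
    return [{"jsonClass": PREFIX + "_Reference", "id": process_id,
             "typeName": "hive_process", "values": values,
             "traitNames": [], "traits": {}}]


# Usage: table1 and table2 are the hive_table objects from the earlier files.
#   payload = build_process_payload(
#       [table1], [table2], "create table table2 as select * from table1",
#       "default.table2@cl1:1482918390000",
#       "hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
#       "2016-12-28T09:46:19.003Z", "2016-12-28T09:46:31.211Z", "cl1")
#   print(json.dumps(payload, indent=2))
```

The direction of the lineage comes from which list a table is placed in: table1 under inputs and table2 under outputs, as in the JSON above.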

Save the above JSON to a file.

Step6: Repeat Step2 with the JSON file from Step5.

Step7: You should now be able to see the lineage between the two entities.

10864-atlas-2016-12-28-16-47-13.png

The curl call is the same as in Step2.

15 REPLIES

Re: How to create hive table entity in Apache atlas using REST API?

@Manoj Dhake

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.
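Before posting a payload like the one above, it can help to check that its ids line up. In the JSON shown here, each `_Reference` block defines an entity with a negative id, and bare `_Id` blocks (for example the `table` field of a column) point back at one of those ids. The following is a minimal sketch of such a check in Python. The function names and the file name in the usage note are hypothetical, and it assumes the payload follows the structure shown in this thread.

```python
import json

ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"
REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"


def collect_ids(node, defined, referenced):
    """Walk the payload, recording ids that are defined and ids that are only pointed at."""
    if isinstance(node, dict):
        cls = node.get("jsonClass")
        if cls == REF_CLASS:
            # A _Reference carries a full entity; its nested "id" block defines the id.
            defined.add(node["id"]["id"])
            collect_ids(node.get("values", {}), defined, referenced)
        elif cls == ID_CLASS:
            # A bare _Id block is a pointer to an entity.
            referenced.add(node["id"])
        else:
            for value in node.values():
                collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def dangling_ids(payload):
    """Return ids that are pointed at but not defined anywhere in the payload."""
    defined, referenced = set(), set()
    collect_ids(payload, defined, referenced)
    return referenced - defined


def check_file(path):
    """Load a saved JSON payload (hypothetical helper) and report dangling ids."""
    with open(path) as f:
        return dangling_ids(json.load(f))
```

Calling `check_file("lineage.json")` (a hypothetical file name) returns an empty set when every pointer resolves inside the file. An id reported here is not necessarily an error; it only means the payload does not define that entity itself.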

Step6: Repeat step2 with the JSON file from step5.

Step7: You should now be able to visualize the lineage between the two entities.

(Screenshot attachment: 10864-atlas-2016-12-28-16-47-13.png)

The curl call is the same as the one above.
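For anyone who would rather script the POST than use curl, here is a minimal Python sketch that builds the same kind of request from a saved JSON file. It only uses the standard library. The function name is hypothetical, basic authentication is an assumption, and the endpoint URL must be the one used in step2; nothing here is specific to a particular Atlas version.

```python
import base64
import urllib.request


def build_entity_request(url, json_path, user, password):
    """Build (but do not send) a POST request carrying the saved JSON file."""
    with open(json_path, "rb") as f:
        body = f.read()
    # Basic auth is an assumption; adjust if the cluster uses another scheme.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

Passing the returned object to `urllib.request.urlopen(...)` sends it. Building the request separately makes it easy to inspect the URL, headers, and body before anything reaches the server.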

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Thank you Ayub,

Is the above JSON structure only for creating a hive table entity?

Suppose my database has already been created and I just need to create the hive table entity.

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Hi Ayub,

If I paste the above JSON for creating the hive entity into a JSON validator, I get the error "multiple json root element".

JSON validator URL:

https://jsonformatter.curiousconcept.com/

I think you have sent the wrong JSON structure.

Re: How to create hive table entity in Apache atlas using REST API?

@Manoj Dhake I have updated the answer with more details, please check and let me know if it works.

This time I have validated the json structure :)
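The "multiple json root element" problem can also be checked locally before posting. The sketch below, using only Python's standard `json` module, reports whether a string holds exactly one JSON document. The function name is hypothetical and the messages are illustrative, not what any online validator prints.

```python
import json


def check_single_root(text):
    """Return (ok, message) for a string that should hold exactly one JSON document."""
    decoder = json.JSONDecoder()
    stripped = text.lstrip()
    try:
        # raw_decode parses the first JSON value and reports where it ended.
        _, end = decoder.raw_decode(stripped)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg} at line {exc.lineno}"
    if stripped[end:].strip():
        return False, "extra content after the first JSON value (multiple roots)"
    return True, "single JSON root"
```

Two arrays pasted back to back, such as `[{...}] [{...}]`, fail this check, while a single enclosing `[...]` passes.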

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Thank you for the reply Ayub,

I am trying to create the entity using the above JSON. Within the JSON I only replaced "mycluster" and "cl1" with my own cluster values, but I get the error below:

{"error":"For field 'tableName'","stackTrace":"org.apache.atlas.typesystem.types.ValueConversionException$NullConversionException: For field 'tableName'

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

OK, I was using the HDP 2.4 sandbox, so I will try this JSON on HDP 2.5.

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Thank you Ayub,

I checked your JSON on HDP 2.5 and it works fine there.

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Hi Ayub,

We have now created two dataset entities and set the lineage between them. My next requirement is as follows.

Suppose I have already created a hive table with a hive query (i.e. patient_info_raw) and its metadata is already present in the Atlas repository. I now want to create lineage between this existing dataset and one that I will create with the POST API (i.e. patient_validated_info).

What changes do I need to make in the lineage JSON file (i.e. in the 3rd step) so that I can see the lineage?

I can create the third table (i.e. hive_entity) using the same JSON file, and that is fine, but what about the JSON data for the lineage?

How can I link them as patient_info_raw ---> patient_validated_info?

Re: How to create hive table entity in Apache atlas using REST API?

Rising Star

Hi Ayub,

We have now created two dataset entities and set the lineage between them.

Suppose I have already created a hive table (i.e. patient_raw_info) and its metadata is already present in Atlas. I now want to create lineage between this existing dataset (i.e. patient_raw_info) and one that I am going to create using your REST API (i.e. patient_validated_dataset). So my question is:

How can I create a hive_process between the existing dataset and the new one?

What changes do I need to make in the JSON file that we use to create the hive_process (i.e. the lineage)?

I can create the third table (i.e. hive_entity) using the same JSON file, and that is fine, but what about the JSON data for the lineage?

How can I link them as

patient_raw_info ---> patient_validated_dataset