How to create a Hive table entity in Apache Atlas using the REST API?

Hi All,

I have searched everywhere on the internet but cannot find an example of creating a "hive table" entity using the REST API. My problem is that I am confused about how to build the JSON body for the REST API call.

Could you please share a complete REST API call example, with curl and the JSON body, that creates a hive table entity?

Could you also share an example of creating a lineage link between two datasets in Apache Atlas?

1 ACCEPTED SOLUTION

@Manoj Dhake

A Hive table entity can be created with a POST to the /api/atlas/entities REST endpoint.

One such example is:

Step1: JSON for creating table1:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table1",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table1@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table1.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table1",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table1@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file, for example sample.json.
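Before posting the file, it can help to confirm that it parses and that every bare `_Id` reference (for example a column's "table" field) points at an entity defined in the same file. The following is a minimal Python sketch, not part of Atlas itself; the function name `find_dangling_ids` is hypothetical, chosen only for illustration.

```python
import json

ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"
REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"


def _collect(node, defined, referenced):
    """Walk the payload and record entity ids and bare id references."""
    if isinstance(node, dict):
        json_class = node.get("jsonClass")
        if json_class == REF_CLASS:
            # A full entity definition: its own id counts as defined.
            defined.add(node["id"]["id"])
        elif json_class == ID_CLASS:
            # An id object, either an entity's own id or a pointer to another entity.
            referenced.add(node["id"])
        for value in node.values():
            _collect(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            _collect(item, defined, referenced)


def find_dangling_ids(text):
    """Return ids that are referenced but never defined in the JSON text."""
    payload = json.loads(text)  # raises ValueError if the file is not valid JSON
    defined, referenced = set(), set()
    _collect(payload, defined, referenced)
    return sorted(referenced - defined)


# Example use (assumes the payload was saved as sample.json):
# with open("sample.json") as f:
#     print(find_dangling_ids(f.read()))
```

An empty list means every reference resolves inside the file. It says nothing about whether the Atlas server will accept the attribute values.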

Step2: REST API call to create the hive table entity.

curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities

Here sample.json is the file saved in Step1. The call creates the hive table entity together with the database, column, and storage descriptor defined in that file.
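If you would rather script the call than use curl, the same POST can be put together with the Python standard library. This is only a sketch under the same assumptions as the curl command (basic auth, port 21000, the /api/atlas/entities path); `build_create_request` is a hypothetical helper name, and the request is built without being sent so you can inspect it first.

```python
import base64
import urllib.request


def build_create_request(base_url, payload, user, password):
    """Build (but do not send) the POST that the curl command above performs."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/atlas/entities",
        data=payload,  # raw bytes of the JSON file
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": "Basic " + token,
        },
    )


# Example use (replace <IP_ADDRESS> with your Atlas host, as in the curl call):
# with open("sample.json", "rb") as f:
#     req = build_create_request("http://<IP_ADDRESS>:21000", f.read(), "admin", "admin")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```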

Step3: JSON for creating table2:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table2",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table2@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table2.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table2",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table2@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a second file.

Step4: Repeat the curl call from Step2, passing the Step3 JSON file to -d.
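The table2 JSON differs from the table1 JSON only in the table name, the qualifiedName values, and the storage location. The naming pattern visible in the two payloads can be written down as a small Python helper, which may be handy when preparing files for further tables. This is a sketch inferred from the examples in this answer, not an official Atlas rule, and `qualified_names` is a hypothetical name.

```python
def qualified_names(db, table, columns, cluster):
    """Derive qualifiedName values following the pattern used in the example JSON."""
    table_qn = f"{db}.{table}@{cluster}"
    return {
        "db": f"{db}@{cluster}",
        "table": table_qn,
        "storage": f"{table_qn}_storage",
        "columns": {c: f"{db}.{table}.{c}@{cluster}" for c in columns},
    }


names = qualified_names("default", "table2", ["abc"], "cl1")
print(names["table"])           # default.table2@cl1
print(names["columns"]["abc"])  # default.table2.abc@cl1
```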

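Step5: JSON to create lineage between the two hive tables above. The lineage is expressed as a hive_process entity whose "inputs" list contains table1 and whose "outputs" list contains table2: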

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above JSON to a file.

Step6: Repeat the curl call from Step2 with the Step5 JSON file. The curl command is otherwise the same.

Step7: You should now be able to visualize the lineage between the two entities.

[Screenshot: 10864-atlas-2016-12-28-16-47-13.png]
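To double-check what lineage a hive_process payload describes before posting it, you can list its input-to-output pairs. The following Python sketch only reads the "inputs", "outputs", and "qualifiedName" fields that appear in the Step5 JSON; `lineage_edges` is a hypothetical helper name, and the tiny dictionary below stands in for the full payload.

```python
def lineage_edges(process):
    """Return (input, process, output) qualifiedName triples for one hive_process entity."""
    values = process["values"]
    inputs = [e["values"]["qualifiedName"] for e in values.get("inputs", [])]
    outputs = [e["values"]["qualifiedName"] for e in values.get("outputs", [])]
    return [(src, values["qualifiedName"], dst) for src in inputs for dst in outputs]


# Trimmed-down stand-in for the Step5 payload (only the fields the helper reads).
process = {
    "typeName": "hive_process",
    "values": {
        "qualifiedName": "default.table2@cl1:1482918390000",
        "inputs": [{"values": {"qualifiedName": "default.table1@cl1"}}],
        "outputs": [{"values": {"qualifiedName": "default.table2@cl1"}}],
    },
}
for src, proc, dst in lineage_edges(process):
    print(f"{src} -> {dst} (via {proc})")
```

With the real file you would load the JSON array and pass its first element to the helper.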

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.
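To see which lineage links the saved file describes before sending it, the process entities can be read locally. This is a minimal sketch that only inspects the JSON; `lineage_edges` and `label` are hypothetical helper names, and the file name in the usage comment is a placeholder.

```python
import json


def label(ref):
    """Prefer the qualifiedName of an inline entity; fall back to its id."""
    values = ref.get("values")
    if values and "qualifiedName" in values:
        return values["qualifiedName"]
    ident = ref.get("id")
    # A _Reference carries a nested id object, a bare _Id carries a string.
    return ident["id"] if isinstance(ident, dict) else ident


def lineage_edges(process_entities):
    """Return (input, output) pairs for every process entity in the payload."""
    edges = []
    for entity in process_entities:
        values = entity.get("values", {})
        for src in values.get("inputs", []):
            for dst in values.get("outputs", []):
                edges.append((label(src), label(dst)))
    return edges


# Usage (the file name is a placeholder):
# with open("lineage.json") as fh:
#     for src, dst in lineage_edges(json.load(fh)):
#         print(src, "--->", dst)
```

For the step5 JSON this should list a single link from the `qualifiedName` of table1 to that of table2.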

Step6: Repeat step2 with the JSON from step5.

Step7: You should now be able to visualize the lineage between the two entities.

10864-atlas-2016-12-28-16-47-13.png

The curl call is the same as the one above.
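For anyone scripting this instead of using curl, the same POST can be prepared in Python. This is a rough sketch under assumptions: the base URL, port 21000, the `/api/atlas/entities` path and the admin/admin credentials are placeholders for illustration, so use whatever your step2 curl call uses. `build_entity_request` is a hypothetical helper name.

```python
import base64
import urllib.request


def build_entity_request(body, base_url="http://localhost:21000",
                         path="/api/atlas/entities",
                         user="admin", password="admin"):
    """Prepare (but do not send) a POST carrying the saved JSON bytes.

    All defaults are assumptions; replace them with your cluster's values.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


# Usage (file name is a placeholder; urlopen performs the actual call):
# with open("lineage.json", "rb") as fh:
#     request = build_entity_request(fh.read())
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Building the request separately from sending it makes it easy to print the URL and headers first and compare them with the curl call.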

Super Collaborator

Thank you Ayub,

Is the above JSON structure only for creating a hive table entity?

Suppose my database is already created and I now only need to create the hive table entity.

Super Collaborator

Hi Ayub,

When I paste the above JSON for creating the hive entity into a JSON validator, I get the error "multiple json root element".

Json Validator url:

https://jsonformatter.curiousconcept.com/

I think you have sent the wrong JSON structure.
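The same kind of check can be run locally. A validator typically reports multiple roots when several JSON documents are pasted back to back, so the sketch below counts the top-level values in a piece of text using Python's standard `json` module. `count_json_roots` is a hypothetical helper name.

```python
import json


def count_json_roots(text):
    """Count top-level JSON values in text; a valid single document gives 1."""
    decoder = json.JSONDecoder()
    pos, roots = 0, 0
    while pos < len(text):
        # raw_decode expects the value to start exactly at pos, so skip whitespace.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        _, pos = decoder.raw_decode(text, pos)
        roots += 1
    return roots


# Usage (the file name is a placeholder):
# with open("table1.json") as fh:
#     print(count_json_roots(fh.read()))
```

A result greater than 1 suggests the text holds more than one document, in which case each one should be saved and validated separately.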


@Manoj Dhake I have updated the answer with more details, please check and let me know if it works.

This time I have validated the json structure 🙂

Super Collaborator

Thank you for the reply, Ayub.

I am trying to create the entity using the above JSON. Within the JSON I have only replaced "mycluster" and "cl1" with my own cluster values, but I am getting the error below:

{"error":"For field 'tableName'","stackTrace":"org.apache.atlas.typesystem.types.ValueConversionException$NullConversionException: For field 'tableName'

Super Collaborator

OK, I was using the HDP 2.4 sandbox, so I will try this JSON on HDP 2.5.

Super Collaborator

Thank you Ayub,

I checked your JSON on HDP 2.5 and it works fine there.

Super Collaborator

Hi Ayub,

Now that we have created two dataset entities and set the lineage between them, my requirement is as follows.

Suppose I have already created a hive table with a Hive query (i.e. patient_info_raw), and its metadata is already present in the Atlas repository. I now want to create lineage between this existing dataset and one that I will create using the POST API (i.e. patient_validated_info).

What changes do I need to make in the JSON file for the lineage data (i.e. in the 3rd step) so that I can see the lineage?

I can create the third table (i.e. hive_entity) using the same JSON file, which is fine, but what about the JSON data for the lineage?

How can I link them as patient_info_raw ---> patient_validated_info?

Super Collaborator

Hi Ayub,

We have created two dataset entities and set the lineage between them.

Suppose I have already created a hive table (i.e. patient_raw_info) and its metadata is already present in Atlas. I now want to create lineage between this existing dataset (i.e. patient_raw_info) and one that I am going to create using your REST API (i.e. patient_validated_dataset). So my questions are:

How can I create a hive_process between the existing dataset and the other one?

What changes do I need to make in the JSON file that we are using to create the hive_process (i.e. the lineage)?

I can create the third table (i.e. hive_entity) using the same JSON file, which is fine, but what about the JSON data for the lineage?

How can I link them as follows:

patient_raw_info ---> patient_validated_dataset
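One possible direction, offered as an untested sketch and not confirmed anywhere in this thread: the JSON above already uses bare `_Id` objects to point at other entities (for example a column's "table"), so the hive_process "inputs" might be able to point at the existing table the same way, with the table's real GUID in place of a negative placeholder id. Whether Atlas accepts this is an assumption to verify on your cluster. `existing_entity_ref` and `link_existing_input` are hypothetical helper names, and the GUID in the usage comment is a placeholder.

```python
ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def existing_entity_ref(guid, type_name="hive_table"):
    """Build a bare _Id pointer shaped like the ones in the JSON above.

    Assumption: guid is the GUID Atlas already assigned to the existing entity.
    """
    return {
        "jsonClass": ID_CLASS,
        "id": guid,
        "version": 0,
        "typeName": type_name,
        "state": "ACTIVE",
    }


def link_existing_input(process_entity, guid):
    """Replace the inline input table of a hive_process with a pointer."""
    process_entity["values"]["inputs"] = [existing_entity_ref(guid)]
    return process_entity


# Usage (placeholder GUID; look up the real one for patient_raw_info in Atlas):
# import json
# with open("lineage.json") as fh:
#     process = json.load(fh)[0]
# link_existing_input(process, "<guid-of-patient_raw_info>")
# print(json.dumps([process], indent=2))
```

The output table can stay inline in "outputs" as in step5, with its name, qualifiedName and locations changed to match patient_validated_dataset.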