Support Questions

Find answers, ask questions, and share your expertise
Announcements
Welcome to the upgraded Community! Read this blog to see What’s New!

How to create hive table entity in Apache atlas using REST API?

avatar
Expert Contributor

Hi All,

I searched everywhere on internet but I don't find anywhere example to create "hive table" entity using REST API.Here the problem is that,I am very much confused on creating json body for REST api call.

Please send complete REST API call example with curl and json body to create hive table entity?

and also please send example to create lineage link between two datasets in Apache atlas?

1 ACCEPTED SOLUTION

avatar

@Manoj Dhake

Hive table entity can be created using /atlas/api/entites REST call.

One such example is:

Step1: JSON for creating table1:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table1",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table1@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table1.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table1",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table1@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step2: REST API call to create the hive table entity.

curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities

The above will help in creating a hive table entity.

Step3: JSON for creating table2:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table2",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table2@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table2.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table2",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table2@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step4: Repeat step2 with step3 json

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step6: Repeat step2 with step5 json

Step7: You should be able to visualize the lineage between two entities.

10864-atlas-2016-12-28-16-47-13.png

The curl call will be same as the above.

View solution in original post

15 REPLIES 15

avatar

@Manoj Dhake

Hive table entity can be created using /atlas/api/entites REST call.

One such example is:

Step1: JSON for creating table1:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table1",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table1@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table1.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table1",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table1@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step2: REST API call to create the hive table entity.

curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities

The above will help in creating a hive table entity.

Step3: JSON for creating table2:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425525",
    "version":0,
    "typeName":"hive_db",
    "state":"ACTIVE"
  },
  "typeName":"hive_db",
  "values":{
    "name":"default",
    "location":"hdfs://mycluster/apps/hive/warehouse",
    "description":"Default Hive database",
    "ownerType":2,
    "qualifiedName":"default@cl1",
    "owner":"public",
    "clusterName":"cl1",
    "parameters":{
 
 
    }
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
},{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425524",
    "version":0,
    "typeName":"hive_table",
    "state":"ACTIVE"
  },
  "typeName":"hive_table",
  "values":{
    "tableType":"MANAGED_TABLE",
    "name":"table2",
    "createTime":"2016-12-28T09:34:53.000Z",
    "temporary":false,
    "db":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425525",
        "version":0,
        "typeName":"hive_db",
        "state":"ACTIVE"
      },
      "typeName":"hive_db",
      "values":{
        "name":"default",
        "location":"hdfs://mycluster/apps/hive/warehouse",
        "description":"Default Hive database",
        "ownerType":2,
        "qualifiedName":"default@cl1",
        "owner":"public",
        "clusterName":"cl1",
        "parameters":{
 
 
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "retention":0,
    "qualifiedName":"default.table2@cl1",
    "columns":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425522",
          "version":0,
          "typeName":"hive_column",
          "state":"ACTIVE"
        },
        "typeName":"hive_column",
        "values":{
          "name":"abc",
          "qualifiedName":"default.table2.abc@cl1",
          "owner":"hive",
          "type":"string",
          "table":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
            "id":"-11893021824425524",
            "version":0,
            "typeName":"hive_table",
            "state":"ACTIVE"
          }
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "lastAccessTime":"2016-12-28T09:34:53.000Z",
    "owner":"hive",
    "sd":{
      "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
      "id":{
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
        "id":"-11893021824425523",
        "version":0,
        "typeName":"hive_storagedesc",
        "state":"ACTIVE"
      },
      "typeName":"hive_storagedesc",
      "values":{
        "location":"hdfs://mycluster/apps/hive/warehouse/table2",
        "serdeInfo":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
          "typeName":"hive_serde",
          "values":{
            "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters":{
              "serialization.format":"1"
            }
          }
        },
        "qualifiedName":"default.table2@cl1_storage",
        "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "compressed":false,
        "numBuckets":-1,
        "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
        "parameters":{
 
 
        },
        "storedAsSubDirectories":false,
        "table":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425524",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        }
      },
      "traitNames":[
 
 
      ],
      "traits":{
 
 
      }
    },
    "parameters":{
      "rawDataSize":"0",
      "numFiles":"0",
      "transient_lastDdlTime":"1482917693",
      "totalSize":"0",
      "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
      "numRows":"0"
    },
    "partitionKeys":[
 
 
    ]
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step4: Repeat step2 with step3 json

Step5: JSON to create lineage between above two hive tables:

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table2 as select * from table1",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table2 as select * from table1"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table1",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table1@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table1.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table1",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table1@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table2@cl1:1482918390000",
    "queryText":"create table table2 as select * from table1",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

Save the above json to a file.

Step6: Repeat step2 with step5 json

Step7: You should be able to visualize the lineage between two entities.

10864-atlas-2016-12-28-16-47-13.png

The curl call will be same as the above.

avatar
Expert Contributor

Thank you Ayub,

Is above json structure is only for creating hive table entity?

Consider my database is already created and now I just need to create hive table entity

avatar
Expert Contributor

Hi Ayub,

If we paste the above json data for creating hive entity in json validator there I am getting error as "multiple json root element".

Json Validator url:

https://jsonformatter.curiousconcept.com/

I think you have sent wrong json structure.

avatar

@Manoj Dhake I have updated the answer with more details, please check and let me know if it works.

This time I have validated the json structure 🙂

avatar
Expert Contributor

Thank you for reply Ayub,

I am trying to create entity using above json and within json I just have changed "mycluster" and "cl1" with my own cluster values but getting below error:

{"error":"For field 'tableName'","stackTrace":"org.apache.atlas.typesystem.types.ValueConversionException$NullConversionException: For field 'tableName'

avatar
Expert Contributor

Ok i was using hdp2.4 hdp sandbox ,so i will try this json on HDP 2.5

avatar
Expert Contributor

Thank you Ayub,

I checked your json on HDP 2.5 and it's working fine their.

avatar
Expert Contributor

Hi Ayub,

As we have created two dataset entities and set the lineage between them also,now my requirement is like ,

Consider I have already created hive table using hive query(i.e. patient_info_raw), it's metadata is also present in atlas repository and now I want to create lineage between this existing dataset and the one which I will create by using POST api (i.e. patient_validated_info).

so what changes I need to make in json file of lineage data (i.e. in 3rd step)? so that I can see the lineage

I can create third table(i.e. hive_entity) by using same json file that is fine but what about json data for lineage?

How can I link them from patient_info_raw--->patient_validated_info.

avatar
Expert Contributor

Hi Ayub,

As we have created two dataset entities and set the lineage between them also.

Consider I have already created hive table(i.e .patient_raw_info) and it's metadata is also present in atlas and now I want to create lineage between already exist dataset(i.e. patient_raw_info) and the one which I will going to create by using your REST API (i.e. patient_validated_dataset) so my question is

How can I create hive_process between already exist dataset and the other one?

what changes I need to make in json file which we are using to create hive_process (i.e. lineage) ?

I can create third table(i.e. hive_entity) by using same json file that is fine but what about json data for lineage?

How can I link them from,

patient_raw_info--->patient_validated_dataset

avatar
Expert Contributor

Hi Ayub,

I am able to set lineage between table1 and table2 successfully but now my requirement like,

Consider,I already have created hive table using hive query, it's metadata is also present in altas and I want to link or create lineage between this already created table and the one which i will going to create using REST API,to do this

what changes I need to make in json file which we are using to create hive_process?

which one is that property, you have set in json file because of it we can link table1 and table2?

avatar

@Manoj Dhake Which HDP version are you using? This JSON would work with HDP-2.5.x release.

avatar

@Manoj Dhake Currently the hive process json links table1 and table2. For creating lineage between table2 and table3: in the json change table1 references to table2 and table2 references to table3 and submit the json.

This should create lineage like table1-->table2-->table3

avatar

As I was seeing frequent questions on REST API usage to create entity and lineage I have posted it as an HCC article.

https://community.hortonworks.com/content/kbentry/74919/how-to-create-hive-table-and-lineage-using-r...

avatar
Super Guru

I have working atlas api examples here

https://github.com/sunileman/Atlas-API-Examples

avatar
New Contributor

Please first validate your JSON using JSON Formatter and JSON Validator.

Labels