Issue about Atlas lineage?

avatar
Super Collaborator

Hi guys,

I am able to create lineage (i.e. a hive_process) between two datasets in Apache Atlas. I referred to the link below to complete this task.

Link:

https://community.hortonworks.com/questions/74875/how-to-create-hive-table-entity-in-apache-atlas-us...

I am able to set lineage between table1 and table2 successfully, but now my requirement is as follows.

Suppose I have already created a Hive table using a Hive query, and its metadata is also present in Atlas. I want to create lineage between this existing table and one that I am going to create using the REST API. To do this:

What changes do I need to make in the JSON file that we use to create the hive_process?

Which property in the JSON file is the one that links table1 and table2?

1 ACCEPTED SOLUTION

avatar

@Manoj Dhake

As an extension to what was answered here, just create another table named table3 and submit the JSON below using the /api/atlas/entities REST API.

[{
  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
  "id":{
    "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
    "id":"-11893021824425513",
    "version":0,
    "typeName":"hive_process",
    "state":"ACTIVE"
  },
  "typeName":"hive_process",
  "values":{
    "queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
    "name":"create table table3 as select * from table2",
    "startTime":"2016-12-28T09:46:19.003Z",
    "queryPlan":{
 
 
    },
    "operationType":"CREATETABLE_AS_SELECT",
    "outputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425516",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table3",
          "createTime":"2016-12-28T09:46:30.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425517",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table3@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425514",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table3.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425516",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:46:30.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425515",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table3",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table3@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425516",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482918390",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "endTime":"2016-12-28T09:46:31.211Z",
    "recentQueries":[
      "create table table3 as select * from table2"
    ],
    "inputs":[
      {
        "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
        "id":{
          "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id":"-11893021824425520",
          "version":0,
          "typeName":"hive_table",
          "state":"ACTIVE"
        },
        "typeName":"hive_table",
        "values":{
          "tableType":"MANAGED_TABLE",
          "name":"table2",
          "createTime":"2016-12-28T09:34:53.000Z",
          "temporary":false,
          "db":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425521",
              "version":0,
              "typeName":"hive_db",
              "state":"ACTIVE"
            },
            "typeName":"hive_db",
            "values":{
              "name":"default",
              "location":"hdfs://mycluster/apps/hive/warehouse",
              "description":"Default Hive database",
              "ownerType":2,
              "qualifiedName":"default@cl1",
              "owner":"public",
              "clusterName":"cl1",
              "parameters":{
 
 
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "retention":0,
          "qualifiedName":"default.table2@cl1",
          "columns":[
            {
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
              "id":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425518",
                "version":0,
                "typeName":"hive_column",
                "state":"ACTIVE"
              },
              "typeName":"hive_column",
              "values":{
                "name":"abc",
                "qualifiedName":"default.table2.abc@cl1",
                "owner":"hive",
                "type":"string",
                "table":{
                  "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                  "id":"-11893021824425520",
                  "version":0,
                  "typeName":"hive_table",
                  "state":"ACTIVE"
                }
              },
              "traitNames":[
 
 
              ],
              "traits":{
 
 
              }
            }
          ],
          "lastAccessTime":"2016-12-28T09:34:53.000Z",
          "owner":"hive",
          "sd":{
            "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
            "id":{
              "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
              "id":"-11893021824425519",
              "version":0,
              "typeName":"hive_storagedesc",
              "state":"ACTIVE"
            },
            "typeName":"hive_storagedesc",
            "values":{
              "location":"hdfs://mycluster/apps/hive/warehouse/table2",
              "serdeInfo":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
                "typeName":"hive_serde",
                "values":{
                  "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "parameters":{
                    "serialization.format":"1"
                  }
                }
              },
              "qualifiedName":"default.table2@cl1_storage",
              "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "compressed":false,
              "numBuckets":-1,
              "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
              "parameters":{
 
 
              },
              "storedAsSubDirectories":false,
              "table":{
                "jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
                "id":"-11893021824425520",
                "version":0,
                "typeName":"hive_table",
                "state":"ACTIVE"
              }
            },
            "traitNames":[
 
 
            ],
            "traits":{
 
 
            }
          },
          "parameters":{
            "rawDataSize":"0",
            "numFiles":"0",
            "transient_lastDdlTime":"1482917693",
            "totalSize":"0",
            "COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
            "numRows":"0"
          },
          "partitionKeys":[
 
 
          ]
        },
        "traitNames":[
 
 
        ],
        "traits":{
 
 
        }
      }
    ],
    "qualifiedName":"default.table3@cl1:1482918390000",
    "queryText":"create table table3 as select * from table2",
    "clusterName":"cl1",
    "userName":"hive"
  },
  "traitNames":[
 
 
  ],
  "traits":{
 
 
  }
}]

You have to change multiple properties. Basically, the "inputs" JSON block describes the entity (a Hive table, say table2) that acts as the input to the process, and the "outputs" JSON block describes the entity (a Hive table, say table3) that acts as its output. Hope this helps.
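To make the role of the two blocks concrete, here is a minimal Python sketch that reads a hive_process entity shaped like the payload above and lists which table feeds which. The helper name `lineage_edges` and the trimmed `process` dictionary are illustrative assumptions, not part of Atlas; only the key names ("values", "inputs", "outputs", "qualifiedName") come from the JSON in this post.

```python
def lineage_edges(process):
    """Return (input qualifiedName, output qualifiedName) pairs for one hive_process entity."""
    values = process["values"]
    inputs = [entity["values"]["qualifiedName"] for entity in values.get("inputs", [])]
    outputs = [entity["values"]["qualifiedName"] for entity in values.get("outputs", [])]
    # Every input table is treated as a source for every output table.
    return [(src, dst) for src in inputs for dst in outputs]


# Trimmed stand-in for the payload above: only the fields the helper reads.
process = {
    "typeName": "hive_process",
    "values": {
        "inputs": [
            {"typeName": "hive_table", "values": {"qualifiedName": "default.table2@cl1"}}
        ],
        "outputs": [
            {"typeName": "hive_table", "values": {"qualifiedName": "default.table3@cl1"}}
        ],
    },
}

for src, dst in lineage_edges(process):
    print(src, "->", dst)
```

Running a check like this on a payload before submitting it is a quick way to confirm that the qualified names in "inputs" and "outputs" are the tables you intend to link.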

9 REPLIES 9

avatar
Super Collaborator

Good to see you again, Ayub.

Could you please post what changes you made in the above JSON?

Did you change a GUID somewhere to link the datasets, or something else?

avatar

Nope, the GUIDs here are just large negative numbers. Entities (Hive tables, processes) are identified by their qualified names, and when the JSON is saved to the backend datastore, it is stored with the actual GUIDs of the entities (Hive tables and Hive process). I am attaching diff.txt, a diff of the two process JSONs; it should give you the list of changes. Let me know if you have any questions.
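A comparable diff can be produced locally from any two payloads. The sketch below is only an illustration using Python's standard `json` and `difflib` modules; the function name `payload_diff` and the two heavily trimmed payloads are hypothetical, and the "old" query text is an assumed example of the earlier table1-to-table2 process, not the actual file from that thread.

```python
import difflib
import json


def payload_diff(old, new):
    """Return unified-diff lines between two JSON-like payloads."""
    # Sorting keys and fixing the indent makes the two dumps comparable line by line.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile="old", tofile="new", lineterm="")
    )


# Hypothetical, heavily trimmed process payloads.
old = {"values": {"queryText": "create table table2 as select * from table1"}}
new = {"values": {"queryText": "create table table3 as select * from table2"}}

diff = payload_diff(old, new)
print("\n".join(diff))
```

On full payloads, the changed lines would typically include names, qualified names, locations, and timestamps, which matches the "multiple properties" mentioned in the answer.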

avatar
Super Collaborator

Thank you, Ayub.

It's working fine for me.

I have one question.

I have a table, table5, with the columns id, name, and age. After inserting the table5 metadata into Atlas, I see only the name column, repeated. I do not get any metadata for the id and age columns.

Please find the table5 JSON below and let me know if there is any mistake in it.

I have also attached an Atlas screenshot showing the output I am getting.

atlas-snapshot.png

[
{
"traits":{

},
"traitNames":[

],
"values":{
"ownerType":2,
"owner":"root",
"qualifiedName":"default@Sandbox",
"clusterName":"Sandbox",
"name":"default",
"description":"emr hive database",
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/\/warehouse",
"parameters":{

}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_db",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_db",
"id":"-11893021824425525",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{

},
"traitNames":[

],
"values":{
"owner":"root",
"temporary":false,
"lastAccessTime":"2017-01-03T11:02:53.000Z",
"qualifiedName":"default.table5@Sandbox",
"columns":[
{
"traits":{

},
"traitNames":[

],
"values":{
"owner":"root",
"qualifiedName":"default.table5.name@Sandbox",
"name":"name",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{

},
"traitNames":[

],
"values":{
"owner":"root",
"qualifiedName":"default.table5.id@Sandbox",
"name":"id",
"type":"int",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
},
{
"traits":{

},
"traitNames":[

],
"values":{
"owner":"root",
"qualifiedName":"default.table5.age@Sandbox",
"name":"age",
"type":"int",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_column",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_column",
"id":"-11893021824425522",
"state":"ACTIVE",
"version":0
}
}
],
"tableType":"MANAGED_TABLE",
"sd":{
"traits":{

},
"traitNames":[

],
"values":{
"qualifiedName":"default.table5@Sandbox_storage",
"storedAsSubDirectories":false,
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/warehouse\/table5",
"compressed":false,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"parameters":{

},
"serdeInfo":{
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde"
},
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
},
"numBuckets":-1
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_storagedesc",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_storagedesc",
"id":"-11893021824425523",
"state":"ACTIVE",
"version":0
}
},
"createTime":"2017-01-03T11:02:53.000Z",
"name":"table5",
"partitionKeys":[

],
"parameters":{
"totalSize":"0",
"rawDataSize":"0",
"numRows":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numFiles":"0",
"transient_lastDdlTime":"1482917693"
},
"db":{
"traits":{

},
"traitNames":[

],
"values":{
"ownerType":2,
"owner":"root",
"qualifiedName":"default@Sandbox",
"clusterName":"Sandbox",
"name":"default",
"description":"emr hive database",
"location":"hdfs:\/\/sandbox.hortonworks.com:8020\/apps\/hive\/\/warehouse",
"parameters":{

}
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_db",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_db",
"id":"-11893021824425525",
"state":"ACTIVE",
"version":0
}
},
"retention":0
},
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"typeName":"hive_table",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"typeName":"hive_table",
"id":"-11893021824425524",
"state":"ACTIVE",
"version":0
}
}
]

avatar
Super Collaborator

Hi Ayub,

The above issue is solved. There was a mistake in the JSON.

When a table has multiple columns, each column must be given a different random ID (a negative long number); the same ID must not be reused for all the columns. Only then can Apache Atlas tell the columns apart. Otherwise, the same name is shown for all columns in the Apache Atlas UI.

To make this work, I provided a different random ID for each column.

Please find attached the correct JSON file:
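As a rough illustration of the mistake described above, the following Python sketch walks a payload and reports any placeholder ID that is claimed by more than one qualified name. The function names and the trimmed column dictionaries are assumptions made for this example; the shared ID and the qualified names are taken from the table5 JSON earlier in the thread, while the replacement IDs are arbitrary.

```python
from collections import defaultdict


def _collect(node, seen):
    """Walk a JSON-like structure and record placeholder id -> qualifiedNames."""
    if isinstance(node, dict):
        id_block = node.get("id")
        values = node.get("values")
        # An entity has a nested "id" object and a "values" object with a qualifiedName.
        if isinstance(id_block, dict) and isinstance(values, dict):
            name = values.get("qualifiedName")
            if name is not None and "id" in id_block:
                seen[id_block["id"]].add(name)
        for child in node.values():
            _collect(child, seen)
    elif isinstance(node, list):
        for child in node:
            _collect(child, seen)


def shared_placeholder_ids(payload):
    """Return the ids that are used by more than one distinct qualifiedName."""
    seen = defaultdict(set)
    _collect(payload, seen)
    return {pid: names for pid, names in seen.items() if len(names) > 1}


def column(placeholder_id, qualified_name):
    # Trimmed column entity: only the fields the checker reads.
    return {"id": {"id": placeholder_id}, "values": {"qualifiedName": qualified_name}}


names = [
    "default.table5.name@Sandbox",
    "default.table5.id@Sandbox",
    "default.table5.age@Sandbox",
]

# Same id on every column, as in the payload that showed "name" three times.
broken = [{"values": {"columns": [column("-11893021824425522", n) for n in names]}}]

# One distinct id per column (the specific numbers are arbitrary, hypothetical values).
fixed = [{"values": {"columns": [column(str(-101 - i), n) for i, n in enumerate(names)]}}]

print(shared_placeholder_ids(broken))
print(shared_placeholder_ids(fixed))
```

An entity that legitimately appears twice with the same ID and the same qualified name, such as the hive_db block in the table5 payload, is not reported, because only IDs shared by different qualified names are flagged.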

avatar

@Manoj Dhake

Great! I was about to share the same info. Thanks for sharing the details.

avatar
Super Collaborator

Hi Ayub,

We have two tables, table4 and table5, each with the columns Id, Name, and Age.

We inserted the entity metadata and lineage metadata of both tables into Atlas and were able to see the schema and the lineage graph in Atlas.

After that, I deleted the entity metadata of table5 and then reinserted it.

Next, I inserted the same lineage metadata (the earlier lineage JSON) for both tables into Atlas. However, I am not able to see the lineage graph of the two tables. I get the response below from the Atlas server.

{"requestId":"qtp662559856-30620 - 26dbb640-9629-4c29-b209-32331e52962e","entities":{}}

Please find the lineage JSON metadata below and let me know what mistake I have made.

lineagejson.txt

So, after deleting the Hive table entity and reinserting the same metadata (i.e. the Hive table entity JSON), we are unable to see the lineage in Atlas. What could be the reason for this?

What mistake am I making in the lineage JSON the second time that prevents the lineage from appearing?

Do I need to change the value of the process ID or the process name in the JSON?

avatar

@Manoj Dhake

Could you please post this as a separate question? It might help other community members as well.
