Support Questions
Find answers, ask questions, and share your expertise

Can I add a new column to hive_table in Apache Atlas?

Explorer

In Atlas, I want to add a new column (e.g., Data Custodian) to hive_table.

Is it possible?

If it is possible, please explain.

Thanks in advance!



Expert Contributor
@Satya Nittala

I hope you have enabled the Atlas Hive hook settings. If yes, all updates to the Hive table are captured by Atlas. When a column is added in Hive, you can find the newly created hive_column entity in Atlas.

Explorer

Thanks for your quick response @Sharmadha Sainath. I want a new key in the Atlas UI for type=hive_table, not a user-created table column.

For example, hive_table already has fields such as aliases, columns, comment, createTime, db, description, and lastAccessTime; I want to add one new field like those.

Expert Contributor

@Satya Nittala

hive_table is a type, and the fields you mentioned (aliases, columns, comment, createTime, db, etc.) are attributes of hive_table.

The type can be updated using PUT (http://atlas.apache.org/api/v2/resource_TypesREST.html#resource_TypesREST_updateAtlasTypeDefs_PUT).

This requires fetching the type definition and updating it with the new attribute.

For example, the following GET REST call fetches the hive_table type definition:

http://atlashost:21000/api/atlas/v2/types/entitydef/name/hive_table

After fetching the type definition, a new attribute definition can be added to the attributeDefs array as:

{
    "name": "new_attribute",
    "typeName": "string",
    "isOptional": true,
    "cardinality": "SINGLE",
    "valuesMinCount": 0,
    "valuesMaxCount": 1,
    "isUnique": false,
    "isIndexable": false
}

name: the name of the new attribute

typeName: the data type of the attribute

isOptional: whether the entity can be created without providing a value for the attribute. (Note: updating a type with a new mandatory attribute is not allowed; while updating, set isOptional to true.)

and the updated JSON can be PUT to

http://atlashost:21000/api/atlas/v2/types/typedefs

For example, in the attached text file I have added a new attribute definition. The GUID of hive_table has to be modified based on your Atlas instance.
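As a rough sketch of the fetch-and-update flow described above (the host/port assume a default non-SSL Atlas instance, and the helper name is mine, not part of the Atlas API):

```python
# Assumed default non-SSL Atlas endpoint; adjust for your instance.
ATLAS = "http://atlashost:21000/api/atlas/v2"

def add_optional_attribute(entity_def, attr_name, type_name="string"):
    """Append an optional attribute definition to a fetched entity type def (a dict)."""
    entity_def["attributeDefs"].append({
        "name": attr_name,
        "typeName": type_name,
        "isOptional": True,   # a type update may only add optional attributes
        "cardinality": "SINGLE",
        "valuesMinCount": 0,
        "valuesMaxCount": 1,
        "isUnique": False,
        "isIndexable": False,
    })
    return entity_def

# The network round-trip (runs only against a live Atlas instance):
# 1. GET  {ATLAS}/types/entitydef/name/hive_table   -> entity_def
# 2. add_optional_attribute(entity_def, "data_custodian")
# 3. PUT  {ATLAS}/types/typedefs  with body {"entityDefs": [entity_def]}
```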

Please let me know if you are stuck somewhere in this procedure.

One question:

hive_table is a defined type in Atlas. It has all the attributes required for maintaining Hive metadata. May I know why you want to update it? What is the new attribute you want to add? Could you please explain the use case behind it?

Explorer

@Sharmadha Sainath, thanks for your quick response.
I need to maintain additional metadata for these hive_table entities. This information is given by the business; it is business metadata such as Data Custodian, Data Owner, and PI information. These attributes are not available in the type hive_table in Atlas, so I want to create new attributes on the hive_table type and move this information into them.

Expert Contributor

@Satya Nittala

For this requirement, please look at Classification. You can create a classification (tag) with attributes. For example, create a tag named PI with the required attributes (expiry date, etc.) and associate it with the hive_table entity.

Attributes like columns, comment, aliases, createTime, and db are specific to the Hive model. Information like Data Custodian, Data Owner, and PI information is not available in Hive, so it is not advisable to add such attributes to the Hive model in Atlas. But you may very well classify data based on tags, which is the recommended way.

Once the table is associated with a tag, you can query for the tag using the search APIs, and it will list all the entities associated with the tag.

For example:

1. Create a tag named PI with an attribute expiry date of type date.

2. Associate the tag PI with the hive_table entity, providing a date value for expiry date.

3. Now you can query for the tag PI with a particular expiry date.

Please let me know if you need some more information on this.
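As a sketch of step 3, the basic search API can retrieve all entities carrying the tag (host and endpoint assume a default Atlas install):

```python
import json

# Basic-search request body: all hive_table entities carrying the PI tag.
# POST it to http://atlashost:21000/api/atlas/v2/search/basic
search_request = {
    "typeName": "hive_table",
    "classification": "PI",
    "excludeDeletedEntities": True,
    "limit": 25,
}
body = json.dumps(search_request)
```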


Explorer

@Sharmadha Sainath
We have to create tags manually in the UI. I get updated metadata every day, so I cannot modify tags manually every day. If some attributes already exist, I will make a job with a curl command to overwrite them daily with the updated data.

Can we create tags from the back end?

Expert Contributor

@Satya Nittala

Yes. POST the JSON body attached in the file to

http://localhost:21000/api/atlas/v2/types/typedefs?type=classification

In the tag definition, name is the name of the tag, and attributeDefs is a JSON array of attribute definitions. I have added an expiry_date attribute of type date in the example.

Once the tag is created, it can be associated with the hive_table entity by POSTing the attached tag-association.txt to

http://localhost:21000/api/atlas/v2/entity/bulk/classification

In tag-association.txt, "name" is the name of the tag, attribute values can be provided in "attributes", and entityGuids is the list of GUIDs of all the entities the tag should be associated with. In this array, you can provide the hive_table GUID.
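A sketch of both payloads (the GUID is a placeholder to fill in from your instance, and the date value format here is an assumption; Atlas also accepts epoch milliseconds):

```python
# Tag (classification) definition:
# POST to /api/atlas/v2/types/typedefs?type=classification
tag_def = {
    "classificationDefs": [{
        "name": "PI",
        "attributeDefs": [{
            "name": "expiry_date",
            "typeName": "date",
            "isOptional": True,
            "cardinality": "SINGLE",
            "valuesMinCount": 0,
            "valuesMaxCount": 1,
            "isUnique": False,
            "isIndexable": False,
        }],
    }],
}

# Tag association: POST to /api/atlas/v2/entity/bulk/classification
association = {
    "classification": {
        "typeName": "PI",
        "attributes": {"expiry_date": "2025-12-31T00:00:00.000Z"},
    },
    "entityGuids": ["<hive_table-guid>"],  # placeholder: GUID from your instance
}
```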

Explorer
@Sharmadha Sainath

Can you please tell me an easy way to execute a JSON script from the edge node?

I tried the curl command below:

curl -vX POST -u admin:admin http://Localhost:2100/api/atlas/v2/types/entitydef/name/hive_table.json -d user/testplace.json --header "Content-Type: application/json"

It shows the error below:

< HTTP/1.1 100 Continue
< HTTP/1.1 500 Internal Server Error
< Set-Cookie: ATLASSESSIONID=1m6u83lz9dwylwfy2b702hfj9;Path=/;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< X-Frame-Options: DENY
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Server: Jetty(8.1.19.v20160209)
* HTTP error before end of send, stop sending

Please help me execute the JSON script from the edge node.

Thanks in advance!

Expert Contributor

@Satya Nittala

1. Please use the correct port. By default, Atlas in a non-SSL environment is configured to use 21000.

2. curl requires "@" when providing files. Example: -d @user/testplace.json

3. To update a type, PUT (not POST; to create types use POST, to update types use PUT) the JSON to

http://atlashost:21000/api/atlas/v2/types/typedefs

4. As already mentioned, a classification/tag best suits your requirement. It is highly recommended to use tags instead of updating types.
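Putting the three fixes together, the command could look like this (host, credentials, and file path are placeholders carried over from your attempt, not verified values):

```shell
# PUT the updated typedefs JSON; note port 21000 and the "@" file prefix.
curl -v -X PUT -u admin:admin \
  -H "Content-Type: application/json" \
  -d @user/testplace.json \
  http://atlashost:21000/api/atlas/v2/types/typedefs
```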