03-28-2018 03:57 AM
Currently there is no Atlas hook for HBase, HDFS, or Kafka. For these components, you must manually create entities in Atlas. You can then associate tags with these entities and control access using Ranger tag-based policies.
On the Atlas web UI Search page, click the "create new entity" link at the top of the page. In the Create Entity pop-up, select an entity type and enter the required information for the new entity; click All to display both required and non-required fields. Click Create to create the new entity. The entity is then created and returned in the search results for the applicable entity type, and you can associate tags with it and control access to it with Ranger tag-based policies. These steps illustrate manually creating an HDFS_PATH entity only; the same approach extends to Kafka and HBase entities.
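If you prefer automation over the UI, the same entity can also be created through the REST API. Below is a minimal sketch for an hdfs_path entity, reusing the /api/atlas/entities endpoint and JSON style from my REST API article later in this feed; the path, cluster name, file name, and admin credentials are placeholders for illustration.
[{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-1",
"version":0,
"typeName":"hdfs_path",
"state":"ACTIVE"
},
"typeName":"hdfs_path",
"values":{
"name":"/data/mydir",
"path":"hdfs://mycluster/data/mydir",
"qualifiedName":"hdfs://mycluster/data/mydir@cl1",
"clusterName":"cl1"
},
"traitNames":[
],
"traits":{
}
}]
Save the JSON to a file (for example, hdfs_path.json) and post it:
curl -v -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @hdfs_path.json http://<ATLAS_HOST>:21000/api/atlas/entities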
05-23-2017 10:29 AM
You can delete the Solr core if you no longer want to store the indexes.
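For example, a sketch using the Solr CoreAdmin API (host, port, and core name are placeholders; deleteIndex=true also removes the index files from disk):
curl "http://<SOLR_HOST>:8983/solr/admin/cores?action=UNLOAD&core=<CORE_NAME>&deleteIndex=true"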
02-14-2017 06:38 PM
Recently, I have observed a lot of questions on HCC about how to reset Atlas and whether there is a way to delete the registered types in Atlas, so I am sharing this article with the community to clarify these queries. To give some context: when Atlas is managed by Ambari, it uses HBase as its default datastore, and it stores all its metadata in two HBase tables:
'atlas_titan': stores all the metadata from the various sources.
'ATLAS_ENTITY_AUDIT_EVENTS': stores the audit information of the entities in Atlas.
These two table names can be changed using the properties "atlas.graph.storage.hbase.table" and "atlas.audit.hbase.tablename" respectively in atlas-application.properties.
Now, coming back to the actual question: how do you wipe out the metadata from Atlas? Follow the steps below.
Step1: Stop Atlas via Ambari.
Step2: In the HBase shell, disable the table: disable 'atlas_titan'
Step3: In the HBase shell, drop the table: drop 'atlas_titan'
Step4: Start Atlas via Ambari.
The same steps can be repeated for the 'ATLAS_ENTITY_AUDIT_EVENTS' table if there is a requirement to wipe out the audit data as well. These steps should reset Atlas and start it as if it were a fresh installation. Let me know if there are any queries. Thanks.
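For reference, the whole cleanup in a single HBase shell session might look like this (a sketch; run it while Atlas is stopped, and skip the audit table if you want to keep the audit data):
hbase shell
disable 'atlas_titan'
drop 'atlas_titan'
disable 'ATLAS_ENTITY_AUDIT_EVENTS'
drop 'ATLAS_ENTITY_AUDIT_EVENTS'
exit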
12-28-2016 11:38 AM
Problem: Of late, there have been many HCC questions on how to create Hive tables and lineage using the REST APIs in Atlas. This article acts as a step-by-step guide to creating Hive tables and lineage using the REST API. Solution: As part of the solution to this FAQ, I will create two Hive tables and the lineage (CTAS) between them. I have tested these steps on the HDP-2.5 release, so make sure you have HDP version >= 2.5. Step1: JSON for creating table1: [{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425525",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
},
"typeName":"hive_table",
"values":{
"tableType":"MANAGED_TABLE",
"name":"table1",
"createTime":"2016-12-28T09:34:53.000Z",
"temporary":false,
"db":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425525",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},
"retention":0,
"qualifiedName":"default.table1@cl1",
"columns":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425522",
"version":0,
"typeName":"hive_column",
"state":"ACTIVE"
},
"typeName":"hive_column",
"values":{
"name":"abc",
"qualifiedName":"default.table1.abc@cl1",
"owner":"hive",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
}
],
"lastAccessTime":"2016-12-28T09:34:53.000Z",
"owner":"hive",
"sd":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425523",
"version":0,
"typeName":"hive_storagedesc",
"state":"ACTIVE"
},
"typeName":"hive_storagedesc",
"values":{
"location":"hdfs://mycluster/apps/hive/warehouse/table1",
"serdeInfo":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde",
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
}
},
"qualifiedName":"default.table1@cl1_storage",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"parameters":{
},
"storedAsSubDirectories":false,
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
},
"parameters":{
"rawDataSize":"0",
"numFiles":"0",
"transient_lastDdlTime":"1482917693",
"totalSize":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numRows":"0"
},
"partitionKeys":[
]
},
"traitNames":[
],
"traits":{
}
}]
Save the above JSON to a file (here, sample.json). Step2: Make the REST API call to create the hive_table entity:
curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @sample.json http://<IP_ADDRESS>:21000/api/atlas/entities
This creates the hive_table entity in Atlas. Step3: JSON for creating table2: [{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425525",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
},
"typeName":"hive_table",
"values":{
"tableType":"MANAGED_TABLE",
"name":"table2",
"createTime":"2016-12-28T09:34:53.000Z",
"temporary":false,
"db":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425525",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},
"retention":0,
"qualifiedName":"default.table2@cl1",
"columns":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425522",
"version":0,
"typeName":"hive_column",
"state":"ACTIVE"
},
"typeName":"hive_column",
"values":{
"name":"abc",
"qualifiedName":"default.table2.abc@cl1",
"owner":"hive",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
}
],
"lastAccessTime":"2016-12-28T09:34:53.000Z",
"owner":"hive",
"sd":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425523",
"version":0,
"typeName":"hive_storagedesc",
"state":"ACTIVE"
},
"typeName":"hive_storagedesc",
"values":{
"location":"hdfs://mycluster/apps/hive/warehouse/table2",
"serdeInfo":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde",
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
}
},
"qualifiedName":"default.table2@cl1_storage",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"parameters":{
},
"storedAsSubDirectories":false,
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425524",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
},
"parameters":{
"rawDataSize":"0",
"numFiles":"0",
"transient_lastDdlTime":"1482917693",
"totalSize":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numRows":"0"
},
"partitionKeys":[
]
},
"traitNames":[
],
"traits":{
}
}]
Save the above JSON to a second file. Step4: Repeat Step2 with the Step3 JSON, as shown in the example below.
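For example, if the Step3 JSON was saved as table2.json (a hypothetical file name), the call uses the same endpoint as Step2:
curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json; charset=UTF-8' -u admin:admin -d @table2.json http://<IP_ADDRESS>:21000/api/atlas/entities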
Step5: JSON to create the lineage (a hive_process entity) between the above two Hive tables: [{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425513",
"version":0,
"typeName":"hive_process",
"state":"ACTIVE"
},
"typeName":"hive_process",
"values":{
"queryId":"hive_20161228094619_81b13647-4f7f-4f1b-9c08-0f64eb8dbb34",
"name":"create table table2 as select * from table1",
"startTime":"2016-12-28T09:46:19.003Z",
"queryPlan":{
},
"operationType":"CREATETABLE_AS_SELECT",
"outputs":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425516",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
},
"typeName":"hive_table",
"values":{
"tableType":"MANAGED_TABLE",
"name":"table2",
"createTime":"2016-12-28T09:46:30.000Z",
"temporary":false,
"db":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425517",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},
"retention":0,
"qualifiedName":"default.table2@cl1",
"columns":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425514",
"version":0,
"typeName":"hive_column",
"state":"ACTIVE"
},
"typeName":"hive_column",
"values":{
"name":"abc",
"qualifiedName":"default.table2.abc@cl1",
"owner":"hive",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425516",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
}
],
"lastAccessTime":"2016-12-28T09:46:30.000Z",
"owner":"hive",
"sd":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425515",
"version":0,
"typeName":"hive_storagedesc",
"state":"ACTIVE"
},
"typeName":"hive_storagedesc",
"values":{
"location":"hdfs://mycluster/apps/hive/warehouse/table2",
"serdeInfo":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde",
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
}
},
"qualifiedName":"default.table2@cl1_storage",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"parameters":{
},
"storedAsSubDirectories":false,
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425516",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
},
"parameters":{
"rawDataSize":"0",
"numFiles":"0",
"transient_lastDdlTime":"1482918390",
"totalSize":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numRows":"0"
},
"partitionKeys":[
]
},
"traitNames":[
],
"traits":{
}
}
],
"endTime":"2016-12-28T09:46:31.211Z",
"recentQueries":[
"create table table2 as select * from table1"
],
"inputs":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425520",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
},
"typeName":"hive_table",
"values":{
"tableType":"MANAGED_TABLE",
"name":"table1",
"createTime":"2016-12-28T09:34:53.000Z",
"temporary":false,
"db":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425521",
"version":0,
"typeName":"hive_db",
"state":"ACTIVE"
},
"typeName":"hive_db",
"values":{
"name":"default",
"location":"hdfs://mycluster/apps/hive/warehouse",
"description":"Default Hive database",
"ownerType":2,
"qualifiedName":"default@cl1",
"owner":"public",
"clusterName":"cl1",
"parameters":{
}
},
"traitNames":[
],
"traits":{
}
},
"retention":0,
"qualifiedName":"default.table1@cl1",
"columns":[
{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425518",
"version":0,
"typeName":"hive_column",
"state":"ACTIVE"
},
"typeName":"hive_column",
"values":{
"name":"abc",
"qualifiedName":"default.table1.abc@cl1",
"owner":"hive",
"type":"string",
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425520",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
}
],
"lastAccessTime":"2016-12-28T09:34:53.000Z",
"owner":"hive",
"sd":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425519",
"version":0,
"typeName":"hive_storagedesc",
"state":"ACTIVE"
},
"typeName":"hive_storagedesc",
"values":{
"location":"hdfs://mycluster/apps/hive/warehouse/table1",
"serdeInfo":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName":"hive_serde",
"values":{
"serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"parameters":{
"serialization.format":"1"
}
}
},
"qualifiedName":"default.table1@cl1_storage",
"outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"compressed":false,
"numBuckets":-1,
"inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
"parameters":{
},
"storedAsSubDirectories":false,
"table":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-11893021824425520",
"version":0,
"typeName":"hive_table",
"state":"ACTIVE"
}
},
"traitNames":[
],
"traits":{
}
},
"parameters":{
"rawDataSize":"0",
"numFiles":"0",
"transient_lastDdlTime":"1482917693",
"totalSize":"0",
"COLUMN_STATS_ACCURATE":"{\"BASIC_STATS\":\"true\"}",
"numRows":"0"
},
"partitionKeys":[
]
},
"traitNames":[
],
"traits":{
}
}
],
"qualifiedName":"default.table2@cl1:1482918390000",
"queryText":"create table table2 as select * from table1",
"clusterName":"cl1",
"userName":"hive"
},
"traitNames":[
],
"traits":{
}
}]
Save the above JSON to a file. Step6: Repeat Step2 with the Step5 JSON. Step7: On the Atlas UI, the lineage between the two tables can now be seen (a REST-based spot-check is sketched after the references). Hope this clarifies the queries on creating Hive tables using the REST API. Please let me know in the comments if there are any queries; I will be more than happy to help.
References:
REST API help: http://atlas.incubator.apache.org/api/rest.html
Usage guide: http://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf
Atlas project page: http://atlas.incubator.apache.org/
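Besides the UI, the lineage can be spot-checked over REST. A sketch using the DataSet lineage endpoint of the Atlas version shipped with HDP-2.5 (the path may differ in other versions; host and credentials are placeholders):
curl -u admin:admin 'http://<IP_ADDRESS>:21000/api/atlas/lineage/hive/table/default.table2@cl1/inputs/graph'
This returns the input lineage graph of table2, which should now include table1 and the hive_process created above.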
10-13-2016 02:35 AM
This article assumes that you have an HDP-2.5 cluster with Atlas and Hive enabled, and that Atlas is up and running on that cluster. Please refer to this documentation link for deploying a cluster with Atlas enabled. Atlas provides a script/tool to import metadata from Hive for all Hive entities like tables, databases, views, columns, etc. This tool requires the Hadoop and Hive classpath jars; to make them available to the script (see the sketch below):
Make sure that the environment variable HADOOP_CLASSPATH is set, OR that HADOOP_HOME points to the root directory of your Hadoop installation.
Set the HIVE_HOME env variable to the root of the Hive installation.
Set the env variable HIVE_CONF_DIR to the Hive configuration directory.
Copy <atlas-conf>/atlas-application.properties to the Hive conf directory.
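A sketch of this setup with typical HDP paths (the paths below are assumptions based on a standard HDP layout; adjust them to your installation):
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HIVE_HOME=/usr/hdp/current/hive-client
export HIVE_CONF_DIR=/etc/hive/conf
cp /usr/hdp/current/atlas-server/conf/atlas-application.properties /etc/hive/conf/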
Once the above steps are successfully completed, we are ready to run the script. Usage: <atlas package>/hook-bin/import-hive.sh
When you run the above command, you should see messages like the below on the console. [root@atlas-blueprint-test-1 ~]# /usr/hdp/current/atlas-server/hook-bin/import-hive.sh
Using Hive configuration directory [/etc/hive/conf]
Log file for import is /usr/hdp/current/atlas-server/logs/import-hive.log
2016-10-13 01:57:18,676 INFO - [main:] ~ Looking for atlas-application.properties in classpath (ApplicationProperties:73)
2016-10-13 01:57:18,701 INFO - [main:] ~ Loading atlas-application.properties from file:/etc/hive/2.5.0.0-1245/0/atlas-application.properties (ApplicationProperties:86)
2016-10-13 01:57:18,922 DEBUG - [main:] ~ Configuration loaded: (ApplicationProperties:99)
2016-10-13 01:57:18,923 DEBUG - [main:] ~ atlas.authentication.method.kerberos = False (ApplicationProperties:102)
2016-10-13 01:57:18,972 DEBUG - [main:] ~ atlas.cluster.name = atlasBP (ApplicationProperties:102)
2016-10-13 01:57:18,972 DEBUG - [main:] ~ atlas.hook.hive.keepAliveTime = 10 (ApplicationProperties:102)
2016-10-13 01:57:18,972 DEBUG - [main:] ~ atlas.hook.hive.maxThreads = 5 (ApplicationProperties:102)
2016-10-13 01:57:18,973 DEBUG - [main:] ~ atlas.hook.hive.minThreads = 5 (ApplicationProperties:102)
2016-10-13 01:57:18,973 DEBUG - [main:] ~ atlas.hook.hive.numRetries = 3 (ApplicationProperties:102)
2016-10-13 01:57:18,973 DEBUG - [main:] ~ atlas.hook.hive.queueSize = 1000 (ApplicationProperties:102)
2016-10-13 01:57:18,973 DEBUG - [main:] ~ atlas.hook.hive.synchronous = false (ApplicationProperties:102)
2016-10-13 01:57:18,974 DEBUG - [main:] ~ atlas.kafka.bootstrap.servers = atlas-blueprint-test-1.openstacklocal:6667 (ApplicationProperties:102)
2016-10-13 01:57:18,974 DEBUG - [main:] ~ atlas.kafka.hook.group.id = atlas (ApplicationProperties:102)
2016-10-13 01:57:18,974 DEBUG - [main:] ~ atlas.kafka.zookeeper.connect = [atlas-blueprint-test-1.openstacklocal:2181, atlas-blueprint-test-2.openstacklocal:2181] (ApplicationProperties:102)
2016-10-13 01:57:18,974 DEBUG - [main:] ~ atlas.kafka.zookeeper.connection.timeout.ms = 200 (ApplicationProperties:102)
2016-10-13 01:57:18,974 DEBUG - [main:] ~ atlas.kafka.zookeeper.session.timeout.ms = 400 (ApplicationProperties:102)
2016-10-13 01:57:18,981 DEBUG - [main:] ~ atlas.kafka.zookeeper.sync.time.ms = 20 (ApplicationProperties:102)
2016-10-13 01:57:18,981 DEBUG - [main:] ~ atlas.notification.create.topics = True (ApplicationProperties:102)
2016-10-13 01:57:18,982 DEBUG - [main:] ~ atlas.notification.replicas = 1 (ApplicationProperties:102)
2016-10-13 01:57:18,982 DEBUG - [main:] ~ atlas.notification.topics = [ATLAS_HOOK, ATLAS_ENTITIES] (ApplicationProperties:102)
2016-10-13 01:57:18,982 DEBUG - [main:] ~ atlas.rest.address = http://atlas-blueprint-test-1.openstacklocal:21000 (ApplicationProperties:102)
2016-10-13 01:57:18,993 DEBUG - [main:] ~ ==> InMemoryJAASConfiguration.init() (InMemoryJAASConfiguration:168)
2016-10-13 01:57:18,998 DEBUG - [main:] ~ ==> InMemoryJAASConfiguration.init() (InMemoryJAASConfiguration:181)
2016-10-13 01:57:19,043 DEBUG - [main:] ~ ==> InMemoryJAASConfiguration.initialize() (InMemoryJAASConfiguration:220)
2016-10-13 01:57:19,045 DEBUG - [main:] ~ <== InMemoryJAASConfiguration.initialize() (InMemoryJAASConfiguration:347)
2016-10-13 01:57:19,045 DEBUG - [main:] ~ <== InMemoryJAASConfiguration.init() (InMemoryJAASConfiguration:190)
2016-10-13 01:57:19,046 DEBUG - [main:] ~ <== InMemoryJAASConfiguration.init() (InMemoryJAASConfiguration:177)
.
.
.
.
.
2016-10-13 01:58:09,251 DEBUG - [main:] ~ Using resource http://atlas-blueprint-test-1.openstacklocal:21000/api/atlas/entities/1e78f7ed-c8d4-4c11-9bfa-da08be7c6b60 for 0 times (AtlasClient:784)
2016-10-13 01:58:10,700 DEBUG - [main:] ~ API http://atlas-blueprint-test-1.openstacklocal:21000/api/atlas/entities/1e78f7ed-c8d4-4c11-9bfa-da08be7c6b60 returned status 200 (AtlasClient:1191)
2016-10-13 01:58:10,703 DEBUG - [main:] ~ Getting reference for process default.timesheets_test@atlasBP:1474621469000 (HiveMetaStoreBridge:346)
2016-10-13 01:58:10,703 DEBUG - [main:] ~ Using resource http://atlas-blueprint-test-1.openstacklocal:21000/api/atlas/entities?type=hive_process&property=qualifiedName&value=default.timesheets_test@atlasBP:1474621469000 for 0 times (AtlasClient:784)
2016-10-13 01:58:10,893 DEBUG - [main:] ~ API http://atlas-blueprint-test-1.openstacklocal:21000/api/atlas/entities?type=hive_process&property=qualifiedName&value=default.timesheets_test@atlasBP:1474621469000 returned status 200 (AtlasClient:1191)
2016-10-13 01:58:10,898 INFO - [main:] ~ Process {Id='(type: hive_process, id: 28f5a31a-4812-497e-925b-21bfe59ba68a)', traits=[], values={outputs=[(type: DataSet, id: 1e78f7ed-c8d4-4c11-9bfa-da08be7c6b60)], owner=null, queryGraph=null, recentQueries=[create external table timesheets_test (emp_id int, location string, ts_date string, hours int, revenue double, revenue_per_hr double) row format delimited fields terminated by ',' location 'hdfs://atlas-blueprint-test-1.openstacklocal:8020/user/hive/timesheets'], inputs=[(type: DataSet, id: c259d3a8-5684-4808-9f22-972a2e3e2dd0)], qualifiedName=default.timesheets_test@atlasBP:1474621469000, description=null, userName=hive, queryId=hive_20160923090429_25b5b333-bba5-427f-8ee1-6b743cbcf533, clusterName=atlasBP, name=create external table timesheets_test (emp_id int, location string, ts_date string, hours int, revenue double, revenue_per_hr double) row format delimited fields terminated by ',' location 'hdfs://atlas-blueprint-test-1.openstacklocal:8020/user/hive/timesheets', queryText=create external table timesheets_test (emp_id int, location string, ts_date string, hours int, revenue double, revenue_per_hr double) row format delimited fields terminated by ',' location 'hdfs://atlas-blueprint-test-1.openstacklocal:8020/user/hive/timesheets', startTime=2016-09-23T09:04:29.069Z, queryPlan={}, operationType=CREATETABLE, endTime=2016-09-23T09:04:30.319Z}} is already registered (HiveMetaStoreBridge:305)
2016-10-13 01:58:10,898 INFO - [main:] ~ Successfully imported all 22 tables from default (HiveMetaStoreBridge:261)
Hive Data Model imported successfully!!!
The below messages from the console log show how many tables were imported and whether the import was successful:
2016-10-13 01:58:10,898 INFO - [main:] ~ Successfully imported all 22 tables from default (HiveMetaStoreBridge:261)
Hive Data Model imported successfully!!!
Now we can verify the imported tables on the Atlas UI; it should reflect all 22 tables imported above. The logs for the import script are in <atlas package>/logs/import-hive.log.
Running the import script on a kerberized cluster: The above works as-is on a simple cluster, but on a kerberized cluster you need to provide additional details to run the command:
<atlas package>/hook-bin/import-hive.sh -Dsun.security.jgss.debug=true -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=[krb5.conf location] -Djava.security.auth.login.config=[jaas.conf location]
krb5.conf is typically found at /etc/krb5.conf. For details about jaas.conf, see the Atlas security documentation.
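Besides the UI, a quick way to spot-check the import is to list the hive_table entities over REST (a sketch reusing the entities endpoint visible in the log above; host and credentials are placeholders):
curl -u admin:admin 'http://<ATLAS_HOST>:21000/api/atlas/entities?type=hive_table'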
09-26-2016 01:13 PM
This short post concentrates on solving the most common issue found while publishing metadata to the Kafka topic for the Atlas server on a secure (kerberized) cluster. Issue: With the AtlasHook configured for Hive/Storm/Falcon, you may see the below stack trace in the logs of the corresponding component. This means the AtlasHook is not able to publish metadata to Kafka for Atlas consumption. The reason for this failure could be one of the following:
The Kafka topic to which the hook is trying to publish does not exist, OR the Kafka topic does not have proper access control lists (ACLs) configured for the user.
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
at org.apache.atlas.kafka.KafkaNotification.createProducer(KafkaNotification.java:312)
at org.apache.atlas.kafka.KafkaNotification.sendInternal(KafkaNotification.java:220)
at org.apache.atlas.notification.AbstractNotification.send(AbstractNotification.java:84)
at org.apache.atlas.hook.AtlasHook.notifyEntitiesInternal(AtlasHook.java:126)
at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:111)
at org.apache.atlas.hook.AtlasHook.notifyEntities(AtlasHook.java:157)
at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:274)
at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:81)
at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:86)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:71)
at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:83)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:277)
... 15 more
Caused by: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:940)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:69)
at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:110)
at org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:46)
at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:68)
at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:78)
... 18 more
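Before walking through the resolution below, a quick way to confirm whether the topics exist at all is to list them (a sketch; KAFKA_HOME and ZK_ENDPOINT are described at the end of this post):
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK_ENDPOINT --list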
Resolution: Below are the steps required in secure environments to set up the Kafka topics used by Atlas:
Step1: Log in with the Kafka service user identity.
Step2: Create the Kafka topics ATLAS_HOOK and ATLAS_ENTITIES with the following commands:
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK_ENDPOINT --topic ATLAS_HOOK --create --partitions 1 --replication-factor $KAFKA_REPL_FACTOR
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK_ENDPOINT --topic ATLAS_ENTITIES --create --partitions 1 --replication-factor $KAFKA_REPL_FACTOR
Step3: Set up ACLs on these topics with the following commands:
$KAFKA_HOME/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=$ZK_ENDPOINT --add --topic ATLAS_HOOK --allow-principal User:* --producer
$KAFKA_HOME/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=$ZK_ENDPOINT --add --topic ATLAS_HOOK --allow-principal User:$ATLAS_USER --consumer --group atlas
$KAFKA_HOME/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=$ZK_ENDPOINT --add --topic ATLAS_ENTITIES --allow-principal User:$ATLAS_USER --producer
$KAFKA_HOME/bin/kafka-acls.sh --authorizer-properties zookeeper.connect=$ZK_ENDPOINT --add --topic ATLAS_ENTITIES --allow-principal User:$RANGER_USER --consumer --group ranger_entities_consumer
Step4: If Ranger authorization is enabled for Kafka, Ranger policies should be set up for the following accesses:
topic: ATLAS_HOOK; { group=public; permission=publish }; { user=$ATLAS_USER; permission=consume }
topic: ATLAS_ENTITIES; { user=$ATLAS_USER; permission=publish }; { user=$RANGER_USER; permission=consume }
Also check that the atlas-application.properties file under the hook component's configuration directory (Storm/Hive/Falcon; typically /etc/storm/conf for Storm) has the right keytab and principal information. Below are the two properties you should look for:
atlas.jaas.KafkaClient.option.principal=<component_principal>
atlas.jaas.KafkaClient.option.keyTab=<component_keytab_path>
For example:
atlas.jaas.KafkaClient.option.principal=storm-cl1/_HOST@EXAMPLE.COM
atlas.jaas.KafkaClient.option.keyTab=/etc/security/keytabs/storm.headless.keytab
In the commands above:
KAFKA_HOME is typically /usr/hdp/current/kafka-broker
ZK_ENDPOINT should be set to the Zookeeper URL for Kafka
KAFKA_REPL_FACTOR should be set to the value of the Atlas configuration 'atlas.notification.replicas'
ATLAS_USER should be the kerberos identity of the Atlas server, typically 'atlas'
RANGER_USER should be the kerberos identity of the Ranger Tagsync process, typically 'rangertagsync'
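For illustration, the topic-creation step with concrete values substituted might look like the following (the hostnames and replication factor are assumptions; use your own cluster's values):
export KAFKA_HOME=/usr/hdp/current/kafka-broker
export ZK_ENDPOINT=zk-host1.example.com:2181,zk-host2.example.com:2181,zk-host3.example.com:2181
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK_ENDPOINT --topic ATLAS_HOOK --create --partitions 1 --replication-factor 1
$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK_ENDPOINT --topic ATLAS_ENTITIES --create --partitions 1 --replication-factor 1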
09-24-2016 12:20 PM
This article assumes that you have a cluster with more than one node (which is a requirement for enabling HA on Atlas). Also make sure that Atlas is up and running on that cluster. Please refer to this documentation link for deploying a cluster with Atlas enabled. Prerequisites for the High Availability feature in Atlas: The following prerequisites must be met for setting up the High Availability feature.
Ensure that you install Apache Zookeeper on a cluster of machines (a minimum of 3 servers is recommended for production).
Select 2 or more physical machines to run the Atlas Web Service instances on. These machines define what we refer to as a 'server ensemble' for Atlas.
Step1: Verify from the Ambari UI that Atlas is up and running.
Step2: Stop Atlas using Ambari.
Step3: Navigate to the page of a host in Ambari where the Atlas service is not installed and add one more Atlas server.
Step4: If the Infra Solr client is not installed on the host where we are trying to install another instance of Atlas, Ambari displays a pop-up window prompting for it. Add an Infra Solr client instance on the same host.
Step5: After successfully adding the Infra Solr client, add the Atlas server instance by following Step3.
Step6: Start the Atlas service.
Step7: Verify from the Ambari UI that both Atlas instances are up and running.
Step8: Check which instance of Atlas is active and which one is passive: an HTTP GET request against each Atlas instance reports its status as "Active" or "Passive" (see the example request after this post).
Step9: Atlas is now running in HA (Active-Passive) mode. With this, you should be able to access the Atlas UI (the link can be pulled from the Ambari quick links).
For more information on Atlas High Availability, please refer to http://atlas.incubator.apache.org/HighAvailability.html.
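A sketch of the Step8 status check (the admin status endpoint is described on the Atlas High Availability page linked above; the hostname is a placeholder):
curl http://<atlas_host>:21000/api/atlas/admin/status
The response contains a Status field that reads ACTIVE on the active instance and PASSIVE on the passive one.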