Member since: 11-30-2016
Posts: 85
Kudos Received: 6
Solutions: 0
08-14-2018
11:57 AM
1 Kudo
Hey @J Koppole, I connected to Hive using the JayDeBeApi Python package. To establish the JDBC connection, download the Hive uber JDBC jar created by @Tim Veil. The connection steps are described on the JayDeBeApi page; I am pasting a sample code snippet below:
import jaydebeapi

conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
                          "<hive_jdbc_url>",
                          ["<username>", "<password>"],
                          "/path/to/hive-jdbc-uber-<version>.jar")
curs = conn.cursor()
curs.execute("select * from some_table")
curs.close()
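One small note on the snippet above: the select never reads its rows back. Since JayDeBeApi exposes a standard DB-API cursor, something like
rows = curs.fetchall()
print(rows)
can be added before curs.close() to fetch the result set.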
... View more
08-09-2018
01:28 PM
I am trying to install Apache Atlas on a plain EC2 instance by following the steps on the Apache Atlas website. I have attached the logs (logs.zip).
... View more
Labels:
08-09-2018
01:23 PM
I am trying to install Apache Atlas 1.0.0 on an AWS EC2 instance (Amazon Linux 2) with embedded HBase and Solr. I can reach the REST APIs, but only from the EC2 instance itself, not from my local machine, and the web UI (http://localhost:21000) is not working at all. I have followed the installation steps from the Apache Atlas website. For reference I have attached the application.log file and atlas-application.properties (logs.zip). Please note that the firewall is disabled and I have allowed all traffic in the inbound and outbound rules. Let me know what I am doing wrong. Thank you in advance, Subash
... View more
Labels:
03-16-2018
11:50 AM
Hi, I tried both the bulk and the single-entity REST endpoints (v1 and v2) to delete entities in Atlas. Both work for the first 3-4 entities, but for the remaining entities I get this error: {"error":"Failed to notify for change UPDATE"}. Please note that the Atlas UI becomes almost unresponsive while the delete commands are running. These are the calls I am using:
delete_guid = requests.delete(hostname + '/api/atlas/entities?guid=' + str(guid['$id$']['id']), headers=headers, auth=(username, password))
get_json = requests.delete(hostname + '/api/atlas/v2/entities/guids?' + guid_list, headers=headers, auth=(username, password))
get_json_new = requests.delete(hostname + '/api/atlas/v2/entity/bulk?' + guid_list, headers=headers, auth=(username, password))
We are using Atlas 0.8.0. Thank you, Subash
... View more
Labels:
02-19-2018
08:56 AM
I am looking for something like this: http://localhost:21000/api/atlas/entities?type=type_name. This REST endpoint was available in Atlas REST API v1; I am not sure how to get the same result with Atlas REST API v2 (a short sketch of how I use the v1 call from Python is below). Thank you, Subash
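For context, this is roughly how I use the v1 endpoint today from Python (hostname, credentials, and the type name are placeholders):
import requests

# v1 call referenced above; hostname, credentials, and type name are placeholders
hostname = 'http://localhost:21000'
resp = requests.get(hostname + '/api/atlas/entities',
                    params={'type': 'hive_table'},
                    auth=('admin', 'admin'))
print(resp.json())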
... View more
02-01-2018
01:33 PM
In Atlas 0.6 and 0.7, this REST API call worked perfectly to fetch the list of all tables in one Hive database: http://localhost:21000/api/atlas/discovery/search/dsl?query=hive_table+where+db.name%3D%22default%22&limit=1000. Say 100 tables exist in "default" and 200 tables in "default_raw". In that scenario, Atlas 0.6 and 0.7 return 100 tables, but in Atlas 0.8 I am getting 300 tables, while ideally I should get only the 100. Let me know if the DSL search endpoints have changed.
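For reference, a minimal sketch of how I issue this DSL query from Python (hostname and credentials are placeholders):
import requests

# Same DSL query as the URL above; hostname and credentials are placeholders.
hostname = 'http://localhost:21000'
query = 'hive_table where db.name="default"'
resp = requests.get(hostname + '/api/atlas/discovery/search/dsl',
                    params={'query': query, 'limit': 1000},
                    auth=('admin', 'admin'))
tables = resp.json().get('results', [])
# Atlas 0.6/0.7 return the 100 tables of "default" here; 0.8 returns all 300.
print(len(tables))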
... View more
Labels:
01-15-2018
10:35 AM
Hi, as per the docs, I am trying the REST endpoint to fetch all entities from my browser. The endpoint is http://localhost:21000/api/atlas/v2/entity/bulk. I have tried multiple combinations of filters but have had no luck so far. Let me know what I am missing here, and please suggest a good document to refer to; a rough sketch of the call I am attempting is below. Thank you, Subash
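This is roughly the same call from Python (a sketch only; the guid values are placeholders for whatever filter parameters the endpoint expects, and the credentials are placeholders too):
import requests

# Sketch only: endpoint taken from the docs; guid filter values and credentials are placeholders.
hostname = 'http://localhost:21000'
resp = requests.get(hostname + '/api/atlas/v2/entity/bulk',
                    params={'guid': ['guid-1', 'guid-2']},
                    auth=('admin', 'admin'))
print(resp.status_code)
print(resp.json())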
... View more
Labels:
11-15-2017
12:23 PM
Hi, I would like to connect to Hive via Knox using a JDBC connection in Python. Please note that my cluster is secured, SSL is enabled, and we use LDAP to log in to Hive via Knox. I am able to establish the connection to Hive using DbViz and the Hive JDBC driver. I have gone through a lot of links but have not found any good examples or recommendations for connecting to Hive from Python; a sketch of the kind of connection I am after is below. Thank you in advance, Subash
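For context, this is the kind of connection I have in mind (a sketch only, assuming the JayDeBeApi package, the Hive uber JDBC jar, and a typical Knox gateway JDBC URL; the host, gateway path, truststore, and LDAP credentials are placeholders and depend on the Knox topology):
import jaydebeapi

# Sketch: Knox host, gateway path, truststore, and LDAP credentials are placeholders.
url = ('jdbc:hive2://knox-host:8443/;ssl=true;'
       'sslTrustStore=/path/to/truststore.jks;trustStorePassword=changeit;'
       'transportMode=http;httpPath=gateway/default/hive')
conn = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', url,
                          ['ldap_user', 'ldap_password'],
                          '/path/to/hive-jdbc-uber-<version>.jar')
curs = conn.cursor()
curs.execute('select * from some_table limit 10')
print(curs.fetchall())
curs.close()
conn.close()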
... View more
Labels:
11-14-2017
02:07 PM
Thank you @Parag Redij
... View more
10-30-2017
12:04 PM
Hey @Nixon Rodrigues, the Hive hook generally captures changes occurring in Hive, but it doesn't capture anything when we run a "select". Let's say I have a BI tool and one of its reports runs "select *" on some Hive table every time we refresh; or consider this example: I fetch data from Hive into Spark using the JDBC driver, and after doing some transformations on the RDD in Spark, we write the transformed data back into Hive. The Atlas Hive hook doesn't have the capability to capture the Spark changes. Say I have to build a Spark hook for Atlas, how can I write it? I gave the example of Apache Airflow because they have designed their Hive hooks in Python, and all my Atlas REST API calls are written in Python, which is why I want to build the hook in Python. I am trying to build end-to-end lineage in Atlas and want to capture all changes occurring in Hive. The changes can be: 1. a BI tool fetching data out of Hive, 2. Apache Spark integration with Hive. Let me know if we can use the existing Hive hook and, if so, how to use it with other services such as Spark, DbViz, etc. Thank you, Subash
... View more
10-30-2017
10:58 AM
Hi, to capture real-time events occurring in Hive, I am thinking of writing a Hive hook. As I am not well versed in Java, can I use Python to build a Hive hook? If not, how can I implement or customize the Atlas Hive hook written in Java? https://github.com/apache/atlas/blob/master/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java Does the Apache Airflow Hive hook work in a similar fashion? https://pythonhosted.org/airflow/_modules/hive_hooks.html Thank you in advance, Subash
... View more
Labels:
10-14-2017
02:14 PM
Thank you @Ashutosh Mestry.
... View more
10-12-2017
05:24 PM
Hi, I am trying to build a script to capture lineage in Atlas when data moves out of Hive. In order to capture the HQL statements executed in Hive (MR/Tez) by other applications, I am looking for an API endpoint that exposes real-time events occurring in Hive. Thank you, Subash
... View more
Labels:
10-06-2017
09:05 AM
Hey @Shashant, you can try the Ranger audit logs, especially for Hive. Ranger captures all the column and table names. A few customizations in the code will help you get the full Hive query. Subash
... View more
10-05-2017
01:17 PM
Which URL needs to be passed to the get_user_query(job_tracker_url) function?
... View more
08-22-2017
02:47 PM
Hi, I used to run this API call (Atlas 0.6) to fetch the schema details from Atlas. My whole purpose was to identify the tagged and untagged columns from the huge JSON output. In Atlas 0.6, tagged columns got one extra key-value pair whose key was '$traits$'. In Atlas 0.7 I am not getting any such marker for tagged versus untagged columns. The REST endpoint is:
atlas_hostname + '/api/atlas/discovery/search/dsl?query=hive_table+where+db.name%3D%22' + database + '%22&limit=10000'
The whole code is here:
import requests

atlas_hostname = 'hostname:21000'
database = 'mydb'
atlas_user_name = 'myuser'
atlas_password = 'mypassword'
headers = {'Accept': 'application/json, text/plain, */*', 'Content-Type': 'application/json; charset=UTF-8'}

def atlas_get_request(atlas_hostname, database, atlas_user_name, atlas_password):
    try:
        list_of_table_json = requests.get(
            atlas_hostname + '/api/atlas/discovery/search/dsl?query=hive_table+where+db.name%3D%22' + database + '%22&limit=10000',
            headers=headers, auth=(atlas_user_name, atlas_password))
        dict_of_tables = list_of_table_json.json()
        result = dict_of_tables['results']
        return result
    except requests.exceptions.HTTPError as err:
        print (err)
        return ("invalid Credentials or UI is unresponsive")
I invoke the function atlas_get_request to fetch the schema details:
result = atlas_get_request(atlas_hostname, database, atlas_user_name, atlas_password)
From the huge JSON response saved in "result", I used to filter the column key-value pairs; a tagged column carried an extra key-value pair, and that extra key was "$traits$" in Atlas 0.6. Now, in Atlas 0.7, the dict for each and every column has only 9 key-value pairs and no key that carries the tag details. The keys of the JSON dict (for both tagged and untagged columns) are: [u'comment', u'$id$', u'qualifiedName', u'description', u'$typeName$', u'owner', u'table', u'type', u'name']. I am using Apache Atlas 0.7. Let me know how to overcome this issue; a small sketch of the filtering step I am describing is below. Thank you in advance, Subash
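For clarity, the filtering step looks roughly like this (a sketch; it assumes each table row in "result" has a 'columns' list, as in the 0.6 response, and that tagged columns carry the '$traits$' key):
tagged, untagged = [], []
for table in result:
    # the 'columns' layout is an assumption based on the 0.6 response
    for column in table.get('columns', []):
        if '$traits$' in column:
            tagged.append(column['name'])
        else:
            untagged.append(column['name'])
print(tagged)
print(untagged)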
... View more
Labels:
08-10-2017
09:45 AM
Thank you @Bryan Bende, I will try Storm.
... View more
08-09-2017
01:05 PM
Hi, can we merge two different JSONs coming from two different flow files using any NiFi processor? For example:
Input 1:
{
  "Table": "myTable",
  "Database": "Database",
  "Column": "Column1"
}
Input 2:
{
  "Table": "myTable",
  "Database": "Database",
  "Column": "Column2"
}
The challenge is how to merge the two JSONs above when their Table and Database values match, combining the content of both into one JSON dictionary (a plain-Python sketch of this merge logic follows the expected output).
Expected output:
{
  "Table": "myTable",
  "Database": "Database",
  "Column": [
    "Column2",
    "Column1"
  ]
}
Thank you in advance, Subash
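The merge logic I am after, expressed in plain Python just to illustrate the matching condition (not a NiFi processor):
import json

def merge_by_table(a, b):
    # Merge two schema snippets when Table and Database match,
    # collecting both Column values into a single list.
    if a['Table'] == b['Table'] and a['Database'] == b['Database']:
        return {'Table': a['Table'],
                'Database': a['Database'],
                'Column': [b['Column'], a['Column']]}
    return None

input1 = {'Table': 'myTable', 'Database': 'Database', 'Column': 'Column1'}
input2 = {'Table': 'myTable', 'Database': 'Database', 'Column': 'Column2'}
print(json.dumps(merge_by_table(input1, input2), indent=2))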
... View more
Labels:
08-09-2017
10:38 AM
NiFi is cool. In my Python script I was passing JSON as input, transforming it, and then invoking an HTTP POST request. I resolved my JSON issue by using NiFi's built-in processors (EvaluateJsonPath --> SplitJson --> InvokeHTTP). 100 lines of code got reduced to just 3 processors 🙂
... View more
08-07-2017
04:03 PM
Basically my JSON file contains schema info. For example, if a table has 300 columns, the JSON input holds the attributes of each column, such as column name, data type, etc. Yes, the JSON input comes as the content of the flow file; I am extracting the JSON into an attribute, and even though I have increased the attribute length to 10k, the JSON input is still getting truncated.
... View more
08-07-2017
10:01 AM
Hi, I am calling a Python script and passing a huge JSON as input. Even though I have increased the attribute length limit in ExecuteStreamCommand, my JSON input is getting truncated. How can I overcome this? Thank you in advance, Subash
... View more
Labels:
07-25-2017
11:14 AM
I have seen that we can import and export policies in bulk in Ranger; can we do the same sort of operation to delete policies using a REST endpoint? Say I have a thousand masking policies in Apache Ranger and I want to delete all of them. How can this be achieved? A rough sketch of the flow I am hoping for is below. Thank you in advance, Subash
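Something along these lines is what I am hoping for (a sketch only, assuming Ranger's public v2 REST API; the host, service name, and credentials are placeholders):
import requests

ranger = 'http://ranger-host:6080'
auth = ('admin', 'admin')

# Sketch: list the policies of one service, then delete them one by one.
# Assumes the public v2 endpoints; serviceName and credentials are placeholders.
policies = requests.get(ranger + '/service/public/v2/api/policy',
                        params={'serviceName': 'my_hive_service'},
                        auth=auth).json()
for policy in policies:
    requests.delete(ranger + '/service/public/v2/api/policy/' + str(policy['id']),
                    auth=auth)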
... View more
Labels:
07-11-2017
10:01 AM
I am using HDP 2.6.1
... View more
07-11-2017
09:58 AM
@Sarath Subramanian, Yes i can see the "__Process.outputs" for the CTAS column. Attaching output below : {"requestId":"pool-2-thread-4 - 4718810e-6302-48cb-9b75-d6d665242731","query":"g.V(\"__guid\",\"90b6074b-532e-4562-8b37-42a8636926ac\").inE.toList()","queryType":"gremlin","results":[{"__modifiedBy":"admin","__state":"ACTIVE","__createdBy":"admin","inVertex":"163856480","__modificationTimestamp":"1499686956258","id":"2pkg7r-2pk0jc-opp1-2pk0gw","label":"__hive_column.table","outVertex":"163856568","__timestamp":"1499685457722"},{"__modifiedBy":"admin","__state":"ACTIVE","__createdBy":"admin","inVertex":"163856480","__modificationTimestamp":"1499686956258","id":"2pkw1e-2pk0pc-opp1-2pk0gw","label":"__hive_column.table","outVertex":"163856784","__timestamp":"1499685457722"},{"__modifiedBy":"admin","__state":"ACTIVE","__createdBy":"admin","inVertex":"163856480","__modificationTimestamp":"1499686956258","id":"2pl8oi-2pkdcg-osut-2pk0gw","label":"__hive_storagedesc.table","outVertex":"163873168","__timestamp":"1499685457722"},{"__modifiedBy":"admin","__state":"ACTIVE","__createdBy":"admin","inVertex":"163856480","__modificationTimestamp":"1499686956258","id":"2pmiia-2pkpzk-1a80l-2pk0gw","label":"__Process.outputs","outVertex":"163889552","__timestamp":"1499685462156"}],"count":4}
Let me know where I am going wrong.
... View more
07-10-2017
12:51 PM
Hey @Geoffrey Shelton Okot, yes, the Hive hook is enabled and I can see the lineage of hive_table. Only the hive_column_lineage is not showing up.
... View more
07-10-2017
11:28 AM
I am trying to see column-level lineage in Apache Atlas 0.8. To generate the lineage I created a table using a CTAS statement over a subset of columns of another table. As per this link, I should ideally see the lineage, but in my Atlas the lineage is not populating. Thank you, Subash
... View more
Labels:
07-05-2017
11:54 AM
Recently I installed HDP 2.6.1 on a 3-node cluster to test the new features of Apache Atlas and Ranger. When I run import-hive.sh I get this error:
Exception in thread "main" org.apache.atlas.hook.AtlasHookException: HiveMetaStoreBridge.main() failed.
        at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:650)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
HBase, Kafka, and Solr are all running fine. After a lot of configuration changes in Atlas and the other services I still have no success. Later on I tried the requests module in Python: "get" requests work well, but "post" gives the same issue. A few questions I have: 1. I am able to create tags and associate a tag with the default database, so the "admin" user should have the privileges to create, delete, and update entities. 2. The same should work through the REST API, so why is only "get" working and not "post"? Thank you, Subash
... View more
06-29-2017
09:04 AM
I have tried maybe 100 of times with some tweaks but every time i am facing the same error. u'Unable to deserialize json \\"{\\\\\\"structTypes\\\\\\": [], \\\\\\"traitTypes\\\\\\": [], \\\\\\"classTypes\\\\\\": [{\\\\\\"hierarchicalMetaTypeName\\\\\\": \\\\\\"org.apache.atlas.typesystem.types.ClassType\\\\\\", \\\\\\"superTypes\\\\\\": [], \\\\\\"attributeDefinitions\\\\\\": [{\\\\\\"name\\\\\\": \\\\\\"multiplicityRequired\\\\\\", \\\\\\"reverseAttributeName\\\\\\": \\\\\\"\\\\\\", \\\\\\"multiplicity\\\\\\": \\\\\\"required\\\\\\", \\\\\\"dataTypeName\\\\\\": \\\\\\"string\\\\\\", \\\\\\"isUnique\\\\\\": \\\\\\"true\\\\\\", \\\\\\"isIndexable\\\\\\": \\\\\\"true\\\\\\", \\\\\\"isComposite\\\\\\": \\\\\\"true\\\\\\"}], \\\\\\"typeDescription\\\\\\": \\\\\\"\\\\\\", \\\\\\"typeName\\\\\\": \\\\\\"test\\\\\\"}], \\\\\\"enumTypes\\\\\\": []}\\"'} I am following this post, here i guess post request is successful. Let me know where i am doing wrong. https://community.hortonworks.com/articles/124/atlas-api-tips-create-trait-type-example.html curl Command : curl -u atlas_user:password -H 'Content-Type: application/json; charset=UTF-8' -X POST -d '{"enumTypes": [],"structTypes": [],"traitTypes": [{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName": "EXPIRES_ON", "attributeDefinitions":[{ "name": "expiry_date", "dataTypeName": "date", "multiplicity": "required","isComposite": false,"isUnique": false,"isIndexable": true,"reverseAttributeName": null}]}],"classTypes": []}' http://hostname:21000/api/atlas/types
I am attaching my Python code as well for reference:
import requests
import json
hostname = 'hostname:21000'
headers = {'Content-Type': 'application/json; charset=UTF-8'}
atlas_user_name = 'username'
atlas_password = 'password'
create_type = {
    "enumTypes": [],
    "structTypes": [],
    "traitTypes": [
        {
            "superTypes": [],
            "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.TraitType",
            "typeName": "EXPIRES_ON",
            "attributeDefinitions": [
                {
                    "name": "expiry_date",
                    "dataTypeName": "date",
                    "multiplicity": "required",
                    "isComposite": "false",
                    "isUnique": "false",
                    "isIndexable": "true",
                    "reverseAttributeName": ""
                }
            ]
        }
    ],
    "classTypes": []
}
type_create_post = requests.post(hostname+ '/api/atlas/types' , headers=headers,data = json.dumps(create_type),auth =(atlas_user_name,atlas_password))
print type_create_post.json()
entity_create_post = requests.post(hostname+ '/api/atlas/types' , headers=headers,data = json.dumps(create_entity),auth =(atlas_user_name,atlas_password))
print entity_create_post.json()
... View more
Labels:
06-28-2017
01:28 PM
Hey Eyad, Ty. I will try to implement the same.
... View more
06-28-2017
11:24 AM
Hey @Smart Data, thank you. Can you please let me know how to create a type and entity for a NiFi data flow?
... View more