Member since: 02-24-2018
Posts: 66
Kudos Received: 1
Solutions: 0
08-03-2018 08:13 PM · 1 Kudo
Entity: A representation of a real-world element within Atlas. Atlas captures the aspects of the element that are relevant from a metadata perspective.

Relationship: How entities are related to each other. A relationship enforces aspects like lifetime and containment. There are different types of relationships:

Composition: If one entity is deleted, the other is deleted as well. E.g. table and columns: if the table is deleted, all of its columns are deleted too.

Aggregation: If one entity is deleted, the other can continue to exist. E.g. database and table: if a table within a database is deleted, the database continues to exist.

Relationships enable sound modeling of the data.

Classification: A broad categorization of entities. Entities that are related in some way from a business perspective are classified with the same classification. E.g. sensitive information may reside in several tables across several databases in a data warehouse; a classification like 'Sensitive' can be applied to all of those tables (see the sketch below).
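To make that concrete, here is a minimal sketch of applying a 'Sensitive' classification through the Atlas v2 REST API. The host, port, credentials, and GUID are placeholders I'm assuming, not values from this thread:

# placeholders: <atlas-host>, <table-guid>, admin:admin credentials (assumed defaults)
curl -X POST -u admin:admin -H "Content-Type: application/json" 'http://<atlas-host>:21000/api/atlas/v2/entity/guid/<table-guid>/classifications' -d '[{"typeName": "Sensitive"}]'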
Hope this helps!
06-19-2018 03:00 PM
@Felix Albani Thanks for the quick response. I need to delete around 100 records, each one having a unique identity number. Is there any way to delete them in a single shot? The process below takes too much time across millions of records:

curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:101</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:1058262</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:74965103</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:1895604</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:1023135</query></delete>'
...
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:18465498</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:1878999</query></delete>'
curl 'http://localhost:8080/solr/<core-name>/update' -H "Content-type: text/xml" --data-binary '<delete><query>dataid:222100</query></delete>'
06-14-2018 03:11 AM
@Satya Nittala As the data is not in a consistent format, I use a two-step method to populate the field values correctly.

Step 1: Create a temporary table to store the intermediate data. Split the data on the = character:

hive> select split('hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt=20170303',"=")[0] location, split('hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt=20170303',"=")[1] partitionfields;
+-----------------------------------------------------------------+------------------+--+
| location | partitionfields |
+-----------------------------------------------------------------+------------------+--+
| hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt | 20170303 |
+-----------------------------------------------------------------+------------------+--+
We are still missing load_dt in the partitionfields column data.

Step 2: Select from the temporary table and insert into the final table:

hive> select regexp_extract("hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt","(.*)\\/",1) location, concat_ws("=",reverse(split(reverse('hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt'), '/')[0]),"partitionfields") partitionfields;
+---------------------------------------------------------+--------------------------+--+
| location | partitionfields |
+---------------------------------------------------------+--------------------------+--+
| hdfs://Test/lob/ebia/publish/gss_attribute_details_pub | load_dt=partitionfields |
+---------------------------------------------------------+--------------------------+--+
In Step 2, I first extract only the data before the last /, because the data after the last / needs to go to the partitionfields column. Then, using concat_ws, I join the last path segment (obtained by reversing the string, splitting on /, taking the value at position [0], and reversing it back) with the partitionfields value.

Alternatively, two regexp_extract calls can prepare the final column data directly:

hive> select regexp_extract("hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt","(.*)\\/",1) location, concat_ws("=",regexp_extract('hdfs://Test/lob/ebia/publish/gss_attribute_details_pub/load_dt', '(.*)\/(.*)',2),"partitionfields") partitionfields;
+---------------------------------------------------------+--------------------------+--+
| location | partitionfields |
+---------------------------------------------------------+--------------------------+--+
| hdfs://Test/lob/ebia/publish/gss_attribute_details_pub | load_dt=partitionfields |
+---------------------------------------------------------+--------------------------+--+

I hope this works correctly for your case.
05-31-2018 04:04 AM
Please take a look at these models that we recently added.
03-15-2018 08:13 PM · 2 Kudos
The taxonomy is still in tech preview. Have you switched it on using the documentation link below? Do you see a 'Taxonomy' tab in the UI?

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-governance/content/atlas_enabling_taxonomy_technical_preview.html

Just a word of caution: in a large production environment, switching the taxonomy on can cause Atlas performance issues. We switched it off until it comes out of tech preview.

Here is some info on how to query terms via curl; I can't find anything about terms in the latest REST API documentation:

https://atlas.apache.org/0.7.1-incubating/AtlasTechnicalUserGuide.pdf

POST http://<atlasserverhost:port>/api/atlas/v1/taxonomies/Catalog/terms/{term_name}
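Creating a term might then look roughly like the sketch below. Hedged heavily: the term name 'PII' and the JSON body are illustrative assumptions based on the pattern in the guide above, not verified against this Atlas version:

# illustrative only: 'PII' and the request body are assumed, per the linked guide's pattern
curl -X POST -u admin:admin -H "Content-Type: application/json" 'http://<atlasserverhost:port>/api/atlas/v1/taxonomies/Catalog/terms/PII' -d '{"description": "Personally identifiable information"}'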
I'm sure someone who knows more than I do will come along soon!
03-20-2018 10:38 AM
@Shu, thank you, it's working!
03-20-2018 07:28 AM
@Ashutosh Mestry, it was a syntax mistake; it's working fine now. Thanks!
03-16-2018 07:30 AM
@Laura Ngo, I tried with POST as well; same issue, the change is not reflected.
03-07-2018 08:35 AM
@Satya Nittala
1. Please use the correct port. By default, Atlas in a non-SSL environment is configured to use 21000.
2. curl requires "@" for providing files. Example: -d @user/testplace.json
3. To update a type, "PUT" (not POST) the JSON to http://atlashost:21000/api/atlas/v2/types/typedefs. Use POST to create types and PUT to update them (see the sketch below).
4. As already mentioned, a classification/tag best suits your requirement. It is highly recommended to use tags instead of updating types.
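Putting points 1-3 together, an update call would look roughly like this. A sketch only: typedefs.json is a hypothetical local file holding your typedef JSON, and admin:admin stands in for your credentials:

# typedefs.json is a hypothetical file containing the updated typedef JSON
curl -X PUT -u admin:admin -H "Content-Type: application/json" -d @typedefs.json http://atlashost:21000/api/atlas/v2/types/typedefs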
03-01-2018 12:08 PM
Thank you @Sharmadha Sainath, it is working fine 🙂