Member since: 11-22-2016
Posts: 83
Kudos Received: 23
Solutions: 13

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1446 | 08-03-2018 08:13 PM |
| | 1234 | 06-02-2018 05:24 PM |
| | 603 | 05-31-2018 07:54 PM |
| | 1162 | 02-08-2018 12:38 AM |
| | 724 | 02-07-2018 11:38 PM |
01-14-2020
01:17 PM
Hi, sorry for the delay in replying to your message. You are right: from Atlas 1.0 onwards we use JanusGraph.

"Can you tell me how to just upgrade the version, by using the latest download of Atlas 2.0?" I am not sure what you are looking for; it would be good if you could elaborate on this. From our experience, upgrading the JanusGraph version was as easy as updating the version in pom.xml and then running our test suite. There were a few glitches we discovered in a few versions, but those were something we could resolve quickly.

"Also, how do I configure Cassandra as the DB for Atlas 2.0?" I have not tried this personally, but my thinking is that the steps are similar to those for using HBase as the backend. The configuration would be to ensure that JanusGraph gets the requisite properties for its initialization. Please see here for details. There was also a community effort in the past to make Cassandra work with Atlas.

Hope this helps. ~ ashutosh
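P.S. I have not verified this setup myself, but as a rough sketch, the Cassandra-specific JanusGraph properties would go into atlas-application.properties with the atlas.graph. prefix (the backend name assumes JanusGraph's cql adapter; the hostname and keyspace below are placeholders):

atlas.graph.storage.backend=cql
atlas.graph.storage.hostname=<cassandra-host>
atlas.graph.storage.cql.keyspace=atlas_janus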
01-06-2020
11:04 AM
Kindly look at this updated link: http://atlas.apache.org/#/Migration-0.8-to-1.0 Please let me know if you have more questions.
09-24-2018
04:28 PM
1 Kudo
@Maxim Neaga It is safe to ignore the error related to __AtlasUserProfile. It's a false positive.
08-03-2018
08:13 PM
1 Kudo
Entity: A representation of a real-world element within Atlas. Atlas captures the aspects of the element that are relevant from a metadata perspective.

Relationship: How entities are related to each other. A relationship enforces aspects like lifetime and containment. There are different types of relationships:
- Composition: If one is deleted, the other is deleted as well. E.g. table and columns: if the table is deleted, all of its columns are deleted too.
- Aggregation: If one is deleted, the other can continue to exist. E.g. database and table: if a table within a database is deleted, the database continues to exist.
Relationships help with sound modeling of data.

Classification: A broad categorization of entities. Entities that are related in some way from a business perspective are classified with the same classification. E.g. sensitive information will reside in several tables in several databases in a data warehouse; a classification like 'Sensitive' can be applied to those tables.
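For illustration, a 'Sensitive' classification could be created through the type APIs and then attached to a table entity; the GUID below is a placeholder:

curl -X POST -u admin:admin -H 'Content-Type: application/json' -d '{"classificationDefs":[{"name":"Sensitive","superTypes":[],"attributeDefs":[]}]}' "http://localhost:21000/api/atlas/v2/types/typedefs"
curl -X POST -u admin:admin -H 'Content-Type: application/json' -d '[{"typeName":"Sensitive"}]' "http://localhost:21000/api/atlas/v2/entity/guid/<table-guid>/classifications"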
Hope this helps!
07-02-2018
05:22 PM
Can you please post the logs from /var/log/atlas/application.log? If this file is empty, please see the contents of *.err; start-up errors caused by resource constraints are logged in the .err file.
06-19-2018
11:16 PM
Think of Atlas as a data repository that stores data in the form you specify. Given this, it is possible to define types using model JSONs or using the type REST calls. Thus, custom types are types you define that are different from what is available to you out of the box. To define your own types, you can use the existing models as guidance and proceed from there. The v2 API and data models have simplified the process. Hope this helps.
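For example, a minimal custom entity type can be posted to the type APIs like this (the type name is illustrative; deriving from DataSet brings in the standard name and qualifiedName attributes):

curl -X POST -u admin:admin -H 'Content-Type: application/json' -d '{"entityDefs":[{"name":"my_custom_type","superTypes":["DataSet"],"attributeDefs":[]}]}' "http://localhost:21000/api/atlas/v2/types/typedefs"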
06-19-2018
11:11 PM
Please take a look at Migrating data from Apache Atlas 0.8 to Apache Atlas 1.0. Please let us know if you have questions. Hope this helps.
06-14-2018
03:59 AM
The Taxonomy feature is not supported anymore. It has been replaced by Glossary, which is part of the recently released v1.0. Please try that; I think you will find it useful.
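For instance, assuming default credentials, the glossaries defined on a server can be listed via the v2 REST API:

curl -X GET -u admin:admin "http://localhost:21000/api/atlas/v2/glossary"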
06-02-2018
05:24 PM
Your question is valid. In most cases the ways of 'knowing' the existence of an entity are redundant. In general, a GUID is assigned to an entity when it is created, and it remains unchanged through the lifetime of the entity. In the case of entity creation via hooks, the incoming entities will not have GUIDs, as they are yet to be created; however, the qualified name is available, since it is a required attribute for entities. This is what is used to detect the existence of entities. The qualified name can change in this scenario: entities are moved across clusters as part of synchronization using the Export & Import APIs. The GUID remains the same, but the qualified name can change to reflect the new location of the entity. E.g. a database entity within a cluster named cl1 will have the qualified name db@cl1; importing this entity into a cluster named cl2 should change the qualified name to db@cl2. This way the imported entities reflect their new home, whereas the GUID makes it possible to know that the same entity exists across clusters. Please take a look here at transforms. Hope this helps.
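As a sketch of what such a transform looks like (following the replace syntax from the import options documentation; cl1 and cl2 are the cluster names from the example above), the import request's transforms option would carry something like:

{"hive_db": {"qualifiedName": ["replace:@cl1:@cl2"]}}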
05-31-2018
07:54 PM
Your observation is valid. The version number relates to the model (i.e. the schema, whose definition is present in addons/models) of the entity, and not to the version of the entity itself. Also, as of now, we don't have logic for dealing with version changes to the schema. In short, the version field is not of much consequence right now.
05-31-2018
04:04 AM
Please take a look at these models that we recently added.
05-31-2018
04:02 AM
Can you please tell me: What version of Atlas are you using? Is Atlas' HBase hook installed? If yes, do you see any exceptions in the logs?
04-17-2018
10:14 PM
Atlas uses TitanDB (JanusGraph in 1.0) as its underlying database, which supports many different back ends. Table names: prior to v1 it is atlas_titan; for v1 it is atlas_janus. You are right in noting that in the Hadoop ecosystem, HBase is used for data storage with Solr for index storage. An HDFS hook is tricky from many aspects, most importantly the volume of data that could potentially be generated by it. Note that Atlas does not store the data itself; in the case of an HDFS hook, only the meta information would be stored, like directory/file name, size, creation date, and so on. Please take a look at the models defined here. Scalability is indeed something that needs to be addressed before this can be usable. I have a few ideas on this, but no concrete implementation.
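For reference, the storage table name comes from JanusGraph's storage properties, which Atlas forwards with the atlas.graph. prefix; a minimal sketch assuming the HBase backend:

atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=atlas_janus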
03-20-2018
08:19 PM
1 Kudo
@Satya Nittala To answer your question, you could use a query like:
Department where __modificationTimestamp > "2018-03-19T00:00:00.00Z" select __guid
where the date field is yesterday's date (today - 1). There isn't a keyword like 'today'. Hope this helps.
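The same query can also be issued over REST via the v2 DSL search endpoint; curl's --data-urlencode takes care of encoding the query string:

curl -G -u admin:admin "http://localhost:21000/api/atlas/v2/search/dsl" --data-urlencode 'query=Department where __modificationTimestamp > "2018-03-19T00:00:00.00Z" select __guid'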
03-18-2018
06:25 PM
Do you see any exceptions in the logs?
02-24-2018
05:52 PM
I understand your concerns. Right now there isn't a way to parallelize the NotificationHookConsumer. I have been experimenting with processing messages in parallel, but so far I don't have a working solution. Sorry for not being able to help.
02-24-2018
05:43 PM
Right now there isn't a way. This could be addressed in a couple of ways if you are willing to write some code:
1. Create an offline utility that converts the Excel file (assuming it can be converted to CSV) to the AtlasEntity format and generates a ZIP consumable by the import process, then perform an import (see the sketch below).
2. Use the Hive hook to convert it to the AtlasEntity format and then publish it to Atlas' Kafka topic (ATLAS_HOOK), which is consumed by Atlas.
The 2nd approach, once functional, is more seamless than the 1st one. Hope this helps.
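For the first approach, the generated ZIP can be handed to the Import API; this follows the curl form in the Atlas import documentation (the file names are placeholders):

curl -g -X POST -u admin:admin -H 'Content-Type: multipart/form-data' -F request=@importOptions.json -F data=@entities.zip "http://localhost:21000/api/atlas/admin/import"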
02-20-2018
04:06 AM
Introduction

This post enumerates the steps necessary to set up an Atlas development environment using IntelliJ on Mac and Windows. This setup uses BerkeleyDB as the backend and embedded Solr as the index engine. Setups with other backend and index engine variations are similar but involve additional steps.

Prerequisites

These should be present on your machine before you begin:
- Git, for cloning the repository. The Git Shell command is useful if you are switching between Mac and Windows.
- Maven, for performing command-line builds.
- IntelliJ Community Edition or higher.
- BerkeleyDB, as the backend.

Code Base Setup

Download the code base from its GitHub location. Clone it under c:\work\Apache\atlas on Windows and ~/Apache/atlas on Mac. Change directory to that location and initiate a build (using mvn clean install package).
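Condensed, the clone-and-build sequence looks like this (the repository URL is the Apache mirror on GitHub):

git clone https://github.com/apache/atlas.git
cd atlas
mvn clean install package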
Deploy Directory Setup

Create a directory, say Deploy (say c:\work on Windows or ~/work on Mac), with the structure below:
- conf: Copy atlas-application.properties, users-credentials.properties, policy-store.txt, atlas-log4j.xml, and atlas-env.sh here. Use the contents of the attached ZIP.
- data: During runtime, the backend database will create its files here. This may be a location to check.
- data/solr: Copy the contents of c:\work\Apache\atlas\repository\src\test\resources\solr to c:\work\deploy\data\ on Windows (from ~/Apache/atlas/repository/src/test/resources/solr to ~/Deploy/data on Mac).
- libext: Copy the BerkeleyDB JAR here, say je-5.0.73.jar.
- logs: Logs will be created here.
- models: Copy the contents from c:\work\Apache\atlas\addons\models (or ~/Apache/atlas/addons/models on Mac).
- webapp (optional): Deploy the contents of atlas.war here if you are developing on the client side (UI).
- bin (optional): Empty for now.
When done, your directory layout should look like this:

WinUtils (for Windows only)

Install WinUtils (link below). Copy WinUtils.exe from C:\Program Files (x86)\WinUtil\WinUtil.exe to C:\Users\ashut\.m2\repository\org\apache\hadoop\bin\WinUtils.exe.

IntelliJ: 'Atlas - Local' Configuration

From IntelliJ's Run/Edit Configurations menu option, create a new configuration and call it 'Atlas - Local'. Details are:
- Type: Application
- Main class: org.apache.atlas.Atlas
- VM options (these should reflect the location of the deploy directory created in the step above):
-Datlas.home=C:\work\deploy\ -Datlas.conf=C:\work\deploy\conf -Datlas.data=C:\work\deploy\data -Datlas.log.dir=C:\work\deploy\logs -Dembedded.solr.directory=C:\work\deploy\data
(See screenshot Profile-2.)
- Program arguments: --port 31000. This is needed so that the Atlas run from IntelliJ does not clash with another version running on the development VM.
- Working directory: Set this to the webapp location of your code base (in my case, c:\work\Apache\atlas on Windows and ~/Apache/atlas/ on Mac).
- Use classpath of module: atlas-webapp
See the screenshots below.

Debug Run

Within IntelliJ:
1. Set the newly created configuration as active.
2. From View/Tool Windows/Maven Project, enable the Maven Projects side pane. From the Profiles, select berkeley-elasticsearch, graph-provider-default, and graph-provider-janus.
3. Use Run/Debug - 'Atlas - Local' from the menu.
4. Check if the server is up by accessing: http://localhost:31000/

Screen Shots

Atlas - Local Profile:
Attachments

conf-directory.zip: Contents of the configuration directory.

References

- How to install Maven on Windows
- WinUtils download

Credits

Thanks to Apoorv Naik (@anaik) for the investigation, for coming up with the setup steps, and for helping me with the many setups.
02-09-2018
05:17 PM
Thanks for letting me know. I am glad your problem is solved.
02-08-2018
01:00 AM
The DSL end-points have not changed. I tried it on my local dev environment and I got correct results. I have 6 hive_db and 100+ hive_table. Is there anything I am missing?
02-08-2018
12:41 AM
Since Atlas is a meta store, it works off of the schema and not the data. Hence, it is not possible to tag rows. Hope this helps!
02-08-2018
12:38 AM
1 Kudo
(@Sarath Subramanian Thanks for your help on this!) The version number is primarily used for patch updates to existing data types. On master, note the patches directory under addons/models/1000-Hadoop/patches. AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs (repository/store/bootstrap/AtlasTypeDefStoreInitializer.java) scans this directory and updates type definitions based on the version. In short, it is used for maintenance of types. Beyond this, the version is NOT used internally for validation, etc.
02-08-2018
12:30 AM
Can you please try setting this in atlas-application.properties (via Ambari):
atlas.use.index.query.to.find.entity.by.unique.attributes=true
This has shown significant improvement in one of the environments. I would suggest trying this in a pre-production environment first and verifying the results before updating production.
02-07-2018
11:43 PM
Atlas Taxonomy has been known to have performance problems. For now, it has been disabled by default on branch-0.8 (see commit). While there is a plan to re-introduce this feature, there is no firm ETA for it.
02-07-2018
11:38 PM
Thank you for your patience. Attached is a sample from one of the internal environments. The attached ZIP (tag0.zip) has 2 files:
- tag0.json: The top-level tag containing 1000+ sub-types.
- tag0_207.json: One of the child tags that has no sub-types of its own but has tag0 as its parent.
You can use the type-creation APIs to add these types to the Atlas server. The bulk APIs can be used to add multiple type definitions at the same time. Note that the bulk API takes AtlasTypesDef as input; ensure that the posted JSON is in that format.
curl -X POST -u admin:admin -H 'Content-Type: application/json' -d @tags.json http://localhost:21000/api/atlas/v2/types/typedefs
Hope this helps.
02-07-2018
11:14 PM
By default the user name and password are the same (both are admin). I am not familiar with HDP on AWS. One other thing you could potentially try is using curl commands to delete the existing classifications. Retrieve the classification definition first using:
curl -X GET -u admin:admin -H 'Content-Type: application/json' "http://localhost:21000/api/atlas/v2/types/classificationdef/name/PII" > pii.json
Now, this will work only if you don't have any entities associated with that classification. You will also need to massage the contents you got from the previous step; I have attached a sample (pii-colljson.zip).
curl -X DELETE -u admin:admin -H 'Content-Type: application/json' -d @pii-coll.json "http://localhost:21000/api/atlas/v2/types/typedefs"
02-06-2018
05:11 PM
The existing quick start is unforgiving, in the sense that it stops (throws an exception) if a type being created by quick start already exists in the database. If you don't have any data in the database, I would recommend truncating the database and then running quick start. Using the HBase shell (hbase shell), you could use:
truncate 'ATLAS_ENTITY_AUDIT_EVENTS'
truncate 'atlas_titan'
Hope this helps.
01-17-2018
05:04 PM
@Rajan Gupta As of now we don't have an API for accessing the web UI or components of it. Please let me know if you need help in accessing lineage information via the REST APIs; these are available and are used by the existing web UI. Hope this helps!
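For reference, lineage for a given entity can be fetched with the v2 REST API (the GUID is a placeholder):

curl -X GET -u admin:admin "http://localhost:21000/api/atlas/v2/lineage/<entity-guid>"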
01-16-2018
05:41 AM
1 Kudo
Can you tell me what exact error you are getting? To run the sample, unzip the contents of the attached ZIP. Here's a complete sample:
curl -X POST -u admin:admin -H 'Content-Type: application/json' -d @type_def.json "http://localhost:21000/api/atlas/v2/types/typedefs"
curl -X POST -u admin:admin -H 'Content-Type: application/json' -d @entity_def.json "http://localhost:21000/api/atlas/v2/entity"
Navigate to this URL to see the entity in the web UI: http://localhost:21000/#!/search/searchResult?type=test_type&searchType=dsl&dslChecked=true
Use this curl call to fetch the entity, replacing the GUID below with the entity GUID you see in the UI:
curl -X GET -u admin:admin -H 'Content-Type: application/json' "http://localhost:21000/api/atlas/v2/entity/bulk/?guid=d7858640-f681-4ed9-a4b5-cb4abe680483"
Hope this helps.
01-09-2018
05:34 PM
@Rajan Gupta As of now there isn't a way to export to CSV. However, we have this as a requirement to be addressed sometime this year. It is possible, though, to do an export whose output is formatted as JSON in the AtlasEntity format. I can think of the following ways to accomplish what you are trying to do:
1. Use entity export and then post-process the output. Export allows you to set the starting entity.
2. Use the Search APIs to get hold of the process entities and post-process the output.
Both of the approaches above need some effort to get to CSV format. Hope this helps.
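As a sketch of the first approach, the export call follows the form in the Atlas export documentation (the request body here is illustrative; see the documentation for the full set of options):

curl -X POST -u admin:admin -H 'Content-Type: application/json' -d '{"itemsToExport":[{"typeName":"hive_db","uniqueAttributes":{"qualifiedName":"db@cl1"}}],"options":{"fetchType":"full"}}' "http://localhost:21000/api/atlas/admin/export" -o export.zip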