Member since: 11-22-2016
Posts: 83
Kudos Received: 23
Solutions: 13
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1301 | 08-03-2018 08:13 PM
| 1065 | 06-02-2018 05:24 PM
| 496 | 05-31-2018 07:54 PM
| 1010 | 02-08-2018 12:38 AM
| 588 | 02-07-2018 11:38 PM
01-07-2018
07:11 PM
Thank you for the insight on the permissions. The atlas_titan table is the repository for all of Atlas' data. All operations performed via the Atlas web server end up interacting with this table and the ATLAS_ENTITY_AUDIT_EVENTS table. Is there any specific concern around granting permissions? Hope this helps.
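For reference, granting the Atlas service user access to these tables from the HBase shell looks roughly like this (a sketch; the user name 'atlas' is an example, adjust for your cluster):
grant 'atlas', 'RWXCA', 'atlas_titan'
grant 'atlas', 'RWXCA', 'ATLAS_ENTITY_AUDIT_EVENTS'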
01-07-2018
07:03 PM
1 Kudo
@pbarna I worked on this as part of the Atlas project. I realized that the solution is not as simple as it looks, given the numerous dependencies involved. Can you please tell me the versions of Titan and Hadoop you are using? I attempted a similar exercise using Titan 0.5.4 and Hadoop 2.6.3. My problem was to initiate the Titan index repair job. This facility is built into the Titan API, and it uses MapReduce to run the repair. With some help, I realized that adding properties to yarn-site.xml and hbase-site.xml actually helps. When you update the properties in these files, be sure to use <final>true</final> so that your settings override the defaults and take effect. Example:
<property>
  <name>mapreduce.local.map.tasks.maximum</name>
  <value>10</value>
  <final>true</final>
</property>
For various reasons I ended up writing a groovy script to achieve this. I can get into details if you are interested. My script is here. Please feel free to reach out if you think this was useful. Thanks @Nixon Rodrigues for letting me know about this question.
12-19-2017
05:09 PM
Yes, it is possible to run Atlas without Ranger. I use this setup quite often in my local development environment.
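Roughly, the only change needed is to switch the authorizer away from the Ranger plugin (a sketch; assumes ATLAS_HOME points at your install and that your Atlas version supports the built-in "simple" file-based authorizer):
echo atlas.authorizer.impl=simple >> ${ATLAS_HOME}/conf/atlas-application.properties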
12-04-2017
06:05 PM
@Arsalan Siddiqi Thanks for the excellent question. Your observations are valid. While Atlas does help with meeting compliance requirements, it is only part of the solution. To use a traffic analogy, Atlas is the map (hence the name) and does not deal with the cars on the road (the traffic). To complete the picture, there needs to be some monitoring of what data gets ingested into the system and whether all of it conforms to the norms that have been set up. Please take a look at this presentation from Data Summit 2017. It explains how a system can be set up that helps with governance (the realm of Atlas) and also helps with spotting errors within the data itself. To summarize: to spot errors in the flow of data itself, you would need some other mechanism; Atlas will not help you in that respect.

About your 2nd question: Atlas consumes notifications from Kafka by spawning a single thread and processing one notification at a time (see NotificationHookConsumer.java & AtlasKafkaConsumer.java). In systems with high throughput, the notifications will be queued in Kafka and you will see a lag in their consumption. Kafka guarantees durability of messages, and Atlas ensures that it consumes every message. If messages are dropped for some reason, you would see that in Atlas' logs. We also test Atlas in high-availability scenarios.

Also, to address the notification message question, I would urge you to use the Atlas V2 client APIs (both on master and branch-0.8). Kafka does not mandate any message format, since all it understands is bytes, so that should not be a determining criterion for choosing the client API version. I know this is a lot of text; I hope it helps. Please feel free to reach out if you need clarifications.
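If you want to see whether Atlas is falling behind, one way is to look at the consumer lag on the hook topic (a rough sketch; assumes Kafka's CLI tools are on the PATH and the default consumer group name "atlas"; adjust the broker address for your cluster):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group atlas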
12-01-2017
06:47 PM
@Sankaranarayanan S We committed a fix yesterday that makes the steps above unnecessary. I would urge you to attempt a build again. With this fix, you can use the default graph (JanusGraph):
mvn clean install -DskipTests -Pdist,embedded-hbase-solr
Once the build is done, I uncompress the tar produced in the distro directory and then run bin/atlas_start.py. The initial startup takes about 5 minutes because Solr has to initialize its indexes. Subsequent starts are quicker. Hope this helps.
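For reference, roughly what I run after the build finishes (a sketch; the archive name depends on your version and the target directory is just an example):
mkdir -p /tmp/atlas-bin
tar xfz distro/target/apache-atlas-*bin.tar.gz --strip-components 1 -C /tmp/atlas-bin
/tmp/atlas-bin/bin/atlas_start.py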
11-28-2017
05:19 PM
@Sankaranarayanan S I spent the day yesterday trying to address this. Eventually we got to a point where this configuration works; it just needs a few manual steps. Below are the contents of the script that we discussed on the mailing list. It would be great if you could try it and let us know if this works for you. One build parameter we are currently using is titan0 instead of JanusGraph.
ATLAS_SOURCE_DIR=/tmp/atlas-source
ATLAS_HOME=/tmp/atlas-bin
# Clone Apache Atlas sources
mkdir -p ${ATLAS_SOURCE_DIR}
cd ${ATLAS_SOURCE_DIR}
git clone https://github.com/apache/atlas.git -b master
# build Apache Atlas
cd atlas
mvn clean -DskipTests -DGRAPH_PROVIDER=titan0 install -Pdist,embedded-hbase-solr,graph-provider-titan0
# Install Apache Atlas
mkdir -p ${ATLAS_HOME}
tar xfz distro/target/apache-atlas-*bin.tar.gz --strip-components 1 -C ${ATLAS_HOME}
# Setup environment and configuration
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
export PATH=${PATH}:/tmp/atlas-bin/bin
echo atlas.graphdb.backend=org.apache.atlas.repository.graphdb.titan0.Titan0GraphDatabase >> ${ATLAS_HOME}/conf/atlas-application.properties
# Start Apache Atlas
atlas_start.py
# Access Apache Atlas at http://localhost:21000
11-27-2017
05:30 PM
Can you please use branch-0.8 instead of master? What version of Maven are you using? What configuration are you attempting? (Since you are copying je*.jar, I assume it is the BerkeleyDB and Elasticsearch combination; can you please confirm?) Please build using this command line: mvn clean package -Pdist,berkeley-elasticsearch. This should create a properties file with the correct configuration. I have attached the properties file: atlas-application-berkeleyproperties.zip. I have a deployment directory structure that holds the properties file and libext, and I use these command line arguments to pass that information to Atlas: -Datlas.home=./deploy/ -Datlas.conf=./deploy/conf -Datlas.data=./deploy/data -Datlas.log.dir=./deploy/logs
11-25-2017
09:47 PM
That's strange. Would it be possible to attach the log? I am using data from the sandbox VM and I am able to run the DSL queries just fine. I will get back to you on your question about the basic query; I need to confirm a few things with someone from my team.
11-25-2017
09:43 PM
Thanks for reaching out. If you are building master, can I suggest that you try branch-0.8? Master has been updated to work with the next version of Hadoop and should become more stable in the coming weeks. Also, it would be great if you could attach the entire log. We use these settings for Solr: atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=localhost.localdomain:2181/infra-solr
If the entire startup sequence runs correctly, you should see this line in your log: 2017-10-30 17:14:04,173 INFO - [main:] ~ Started SelectChannelConnector@0.0.0.0:21000 (AbstractConnector:338) Hope this helps.
11-17-2017
07:57 PM
1 Kudo
@David Miller A DSL query should help you with this. Each hive_table has its hive_db name among its properties, so this DSL query should do it: hive_table where (db.name like '*_final' or db.name like '*_temp') About filtering out deleted entities in DSL, there isn't a way to do it yet; we are in the process of improving DSL. As for the documentation, I agree that it needs improvement, but there is no firm ETA on that yet. Given the current state, my suggestion would be to use the basic query as much as possible. Hope this helps.
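If you prefer to run the query over REST instead of the UI, something along these lines should work (a sketch; assumes Atlas 0.8+ with the V2 search API and default credentials):
curl -G -u admin:admin --data-urlencode "query=hive_table where (db.name like '*_final' or db.name like '*_temp')" "http://localhost:21000/api/atlas/v2/search/dsl"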
11-15-2017
05:11 PM
@Saurabh Failed messages accumulate when Atlas is not able to consume messages from Kafka. These messages are generated by hooks such as the Hive hook, which produce notifications about changes taking place within those systems; the notifications are placed on a Kafka topic and eventually consumed by Atlas. To get to the root of the problem, it would help if you could tell us more about: the scenario you are trying out, what functionality is adversely impacted because of this, and your environment details. Would it be possible to attach a portion of the logs?
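To see whether notifications are actually landing on the topic, something like this can help (a sketch; assumes Kafka's CLI tools are available and the default ATLAS_HOOK topic name):
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_HOOK --from-beginning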
11-15-2017
04:54 PM
@Amanda Hua Thanks for opening the ticket. Based on my research, the errors in the logs seem to be a known issue with embedded mode. Please take a look at this. As a resolution, I would suggest the following options: use an external HBase and Solr 5, or, if you are looking to use Atlas for experimentation, try the BerkeleyDB and Elasticsearch combination. Use the attached properties file (rename it to atlas-application.properties and drop it in the conf directory): atlas-application-berkeleyproperties.zip. Also try downloading the HDP sandbox if you wish to investigate the features of Atlas and test out the integrations with Ranger and Hive.
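For reference, the BerkeleyDB + Elasticsearch setup boils down to properties along these lines (illustrative values based on the defaults shipped with the berkeley-elasticsearch build profile; paths are examples):
atlas.graph.storage.backend=berkeleyje
atlas.graph.storage.directory=${sys:atlas.home}/data/berkeley
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true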
11-13-2017
11:50 PM
@Arsalan Siddiqi Your observations are accurate. In fact, there is an initiative in progress to address this; I just don't have an ETA on when it will get done.
11-06-2017
06:16 PM
@Arsalan Siddiqi Can you please let us know roughly how much data you have?
11-02-2017
12:52 AM
1 Kudo
Can you tell me if you see this line in the logs? jetty-8.1.19.v20160209 (Server:272) From the log it appears that the classpaths of the old version and the new version may be getting mixed up, which is causing Jetty not to serve the pages correctly. Is it possible to try this: stop Atlas via Ambari, ensure that /usr/hdp/current/atlas-server/server/webapp/atlas.war exists, remove the directory /usr/hdp/current/atlas-server/server/webapp/atlas, and start Atlas via Ambari.
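Roughly, on the Atlas host, between stopping and starting Atlas in Ambari (a sketch; paths are from the HDP layout mentioned above):
ls /usr/hdp/current/atlas-server/server/webapp/atlas.war
rm -rf /usr/hdp/current/atlas-server/server/webapp/atlas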
10-31-2017
07:56 PM
Attached is a small sample of lineage in action. Use the following command to import the attached ZIP. You should see lineage for the only table present in the database. curl -g -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F data=@./Stocks-2.zip "http://localhost:21000/api/atlas/admin/import" Please extract the ZIP to see the entity definition for hive_process. stocks-2.zip
10-31-2017
06:48 PM
You seem to be using v1 APIs. What version of Atlas are you working on? Please look at this link for V2 API usage. This has some attached JSONs and CURL calls that help with table creation. Hope this helps.
10-25-2017
04:17 AM
Unfortunately, I don't think we have one place with all the troubleshooting tips. Atlas & Ranger integrate via Kafka. I would look at the Kafka topic to see if messages are getting published by Atlas. Also, check the Ranger logs to see if there are any errors. Have you looked at this tutorial? It would help if you could describe your environment.
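One quick check on the Kafka side (a sketch; assumes Kafka's CLI tools are available and the default ATLAS_ENTITIES topic, which is the one Ranger tagsync reads):
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_ENTITIES --from-beginning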
10-23-2017
02:30 AM
@Arsalan Siddiqi I agree with @Vadim Vaks's suggestion on structuring the types. I will try out your JSONs today and see if I can get better insight into the lineage behavior with this setup.
10-13-2017
04:28 PM
@subash sharma Have you taken a look at the Hive hook code? I don't know if it will be sufficient for all your needs, but it should give you a good idea of how to approach the problem. Hope this helps.
10-09-2017
04:42 PM
If you could let us know these details, it would help in getting towards a solution: What are the deployment environment details? How was the build done? Information about Maven parameters would help. Is it an SSL or non-SSL environment? Is Kerberos used? When atlas_start.py is executed, do you see a new process getting created? If yes, what are the classpath details?
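For the last point, a quick way to check whether the process came up and what classpath it was given (a sketch; assumes a Linux host):
ps -ef | grep -i '[a]tlas'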
10-05-2017
05:49 PM
That is correct. The ZIP file has the output and the TXT is the input.
10-04-2017
08:38 PM
I copied the JSON contents from create-hive-table-entity.txt into a file entity-create.json. I then had to make one change: the hive_db guid present in it had to be replaced with the guid of an already existing database, so I replaced 90a7d3af-873a-4c10-a815-069f2d47d490 with 53ce4850-803e-457f-9f41-dfd01a761d9c. I used this curl command: curl -k -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "https://localhost:21443/api/atlas/v2/entity" -d @../docs/entity-create.json With that, I was able to create the entities. Note that my instance of Atlas has SSL enabled. Can you please attempt this and let me know the results?
10-04-2017
08:18 PM
Personally, I have not tried this, but it is possible; AtlasByteType exists.
10-03-2017
10:05 PM
Can you please let me know the configuration of your cluster? How was the build deployed? Was it something you built and then copied over or was it pre-built?
10-03-2017
08:51 PM
Let me attempt to answer all your questions:
1. Think of composite as composition in the object-oriented design world. An entity is part of the dependent entity: the lifetime of the child (contained entity) is determined by its parent, and the child cannot exist without the parent.
2. I don't think there is a place where all of this is defined. We should definitely improve our documentation. Here's a link to the code where this is defined; it is not ideal, but better than nothing.
3. Relationships are a new concept. The implementation is only in master right now, and it is backward compatible. Here's what the concept is: so far (before relationships) we would define associations between two entities as containment, simply by referencing one entity from another (see the hive model in branch-0.8). Relationships allow you to capture this more comprehensively by modeling the association separately. This will be apparent if you simply compare the hive model in master and branch-0.8.
4. For defining entities with composition, you could use AtlasEntity.AtlasEntityWithExtInfo: pass the parent entity to the constructor and define the referenced entities as AtlasObjectIds. The creation API will take care of resolving the references. I am attaching a JSON for reference, along with the curl call for entity creation; a rough sketch also follows below. Also see these object diagrams.
Hope this helps. @Sarath Subramanian Thanks for your help in drafting this reply. hive-table-sample.zip create-hive-table-entity.txt
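To give a rough idea of how an AtlasObjectId reference looks on the wire, here is a minimal, hypothetical V2 create call where a table points at an existing hive_db by guid (the guid, names and attribute set are placeholders, not the attached sample; a real hive_table needs more attributes):
curl -u admin:admin -X POST -H "Content-Type: application/json" "http://localhost:21000/api/atlas/v2/entity" -d '{
  "entity": {
    "typeName": "hive_table",
    "attributes": {
      "name": "stocks",
      "qualifiedName": "default.stocks@cluster1",
      "db": { "typeName": "hive_db", "guid": "<guid-of-existing-hive_db>" }
    }
  }
}'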
09-29-2017
02:27 PM
@Arsalan Siddiqi Your observation about the JSON is accurate. The JSON you see in the sample is in the old format. We now use the new format, referred to as V2. The V2 format is easy to understand, as it is a JSON representation of the Java class, and it is much easier to code against than the earlier approach. I am attaching the atlas-application.properties file that I use for development in IntelliJ: atlas-applicationproperties.zip. Hope this helps.
09-27-2017
10:33 PM
@Arsalan Siddiqi Thanks for reaching out. If you could clarify a few items below, it would help: What is the purpose of the hook? Hooks are one of the ways to get data into Atlas; they are used in cases where the producer of data has a well-defined mechanism for sending notifications about its data, and Atlas leverages that. In your case, does your producer have this in place? If your producer does not have a good notification mechanism in place, you could consider writing a small application that enumerates the data and then uses Atlas' REST APIs to push it into Atlas. We use IntelliJ for development; there are a few setup steps needed if you want to use integrated debugging via IntelliJ, so let me know if that is the case. Attached are logs for a successful Atlas startup: applicationlog.zip
08-15-2017
05:40 PM
Is there a specific exception that is logged over and over again? If yes, can you please let me know what the exception is?
07-28-2017
02:26 PM
@Arsalan Siddiqi From the exceptions it appears that Atlas is having difficulty connecting to Kafka. From the web UI exception, it appears to be a build problem. Can you please confirm which profile you used for the build: is it BerkeleyDB & Elasticsearch, or HBase & Solr?