Member since: 10-17-2016
Posts: 93
Kudos Received: 10
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4765 | 09-28-2017 04:38 PM |
|  | 7163 | 08-24-2017 06:12 PM |
|  | 1845 | 07-03-2017 12:20 PM |
04-04-2024 10:47 PM
Hello, is there any resolution for this error? I am facing the same issue with my Atlas installation.
10-07-2022 08:42 PM
Hi, I have been following your instructions. If I want to do the same thing with PySpark, would the code be similar to this?
12-29-2019 08:42 PM
@ask_bill_brooks Sorry, I am not seeing the "Accept as Solution" option on my screen. Thanks.
12-04-2017 06:05 PM
@Arsalan Siddiqi Thanks for the excellent question. Your observations are valid. While Atlas does help with meeting compliance requirements, it is only part of the solution. To use a traffic analogy, Atlas is the map (hence the name); it does not deal with the cars on the road (the traffic). To complete the picture, there needs to be some monitoring of what data gets ingested into the system and whether all of that data conforms to the norms you have set up. Please take a look at this presentation from Data Summit 2017. It explains how a system can be set up that helps with governance (the realm of Atlas) and also helps with spotting errors within the data itself. To summarize: to spot errors in the flow of data itself, you would need some other mechanism; Atlas will not help you in that respect.

About your second question: Atlas consumes notifications from Kafka by spawning a single thread and processing one notification at a time (see NotificationHookConsumer.java and AtlasKafkaConsumer.java). In systems with high throughput, notifications will queue up in Kafka and you will see a lag in their consumption. Kafka guarantees durability of messages, and Atlas ensures that it consumes every message Kafka delivers; if messages are dropped for some reason, you will see that in Atlas's logs. We also test Atlas in high-availability scenarios.

Also, to address the notification-message question: I would urge you to use the Atlas V2 client APIs (both on master and branch-0.8). Kafka does not mandate any message format, since all it understands is bytes, so that should not be the determining criterion for choosing the client API version.

I know this is a lot of text; I hope it helps. Please feel free to reach out if you need clarifications.
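To make the consumption model concrete, here is a minimal sketch (my own illustration, not Atlas's actual code) of a single-threaded consumer that handles one notification at a time. ATLAS_HOOK is the topic Atlas hooks publish to; the group id and the processNotification placeholder are made up for the example:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleThreadedHookConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "atlas");   // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // One consumer on one thread: notifications are processed strictly in
        // order, one at a time, so a burst of producers shows up as consumer lag.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ATLAS_HOOK"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    processNotification(record.value()); // placeholder for the real handling
                }
            }
        }
    }

    private static void processNotification(String message) {
        System.out.println("notification: " + message);
    }
}
```

Because there is only the one polling thread, durability comes from Kafka retaining the messages, not from the consumer keeping up.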
11-13-2017 11:50 PM
@Arsalan Siddiqi Your observations are accurate. In fact, there is an initiative in progress to address exactly this; I just don't have an ETA on when it will get done.
10-23-2017 02:30 AM
@Arsalan Siddiqi I agree with @Vadim Vaks's suggestion on structuring the types. I will try out your JSONs today and see if I can get better insight into the lineage behavior with that setup.
10-05-2017 05:49 PM
That is correct: the ZIP file contains the output, and the TXT file is the input.
09-29-2017 02:27 PM
@Arsalan Siddiqi Your observation about the JSON is accurate. The JSON you see in the sample is represented in the old format; we now use the new format, referred to as V2. The V2 format is easy to understand because it is a JSON representation of the Java class, which makes it much easier to code against than the earlier approach. I am attaching the atlas-application.properties file that I use for development in IntelliJ. atlas-applicationproperties.zip Hope this helps.
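To illustrate what "a JSON representation of the Java class" means in practice, here is a minimal sketch using the V2 client: you build an AtlasEntity object and the client serializes it to the V2 JSON for you. The URL and admin/admin credentials are the usual local defaults, and the type and attribute values are made up for the example:

```java
import org.apache.atlas.AtlasClientV2;
import org.apache.atlas.model.instance.AtlasEntity;
import org.apache.atlas.model.instance.EntityMutationResponse;

public class V2EntityExample {
    public static void main(String[] args) throws Exception {
        // Local Atlas instance with default credentials (adjust as needed).
        AtlasClientV2 client = new AtlasClientV2(
                new String[]{"http://localhost:21000"},
                new String[]{"admin", "admin"});

        // The V2 wire format is simply this object serialized to JSON.
        AtlasEntity table = new AtlasEntity("hive_table");
        table.setAttribute("name", "sample_table");                        // made-up values
        table.setAttribute("qualifiedName", "default.sample_table@cluster1");

        EntityMutationResponse response =
                client.createEntity(new AtlasEntity.AtlasEntityWithExtInfo(table));
        System.out.println(response);
    }
}
```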
04-27-2018 04:45 PM
I found this problem was caused by my not including all of the JSON, so I had a missing "{". I am still keen to understand the problem-solving process, though.
08-24-2017 06:12 PM
OK, here are all the steps required to run Apache Atlas natively with BerkeleyDB and Elasticsearch:

1. Download Kafka from https://kafka.apache.org/downloads. Download the binary and extract it to your preferred location.
2. Kafka and Atlas also require ZooKeeper. By default, Kafka ships with an instance of ZooKeeper; if you do not have ZooKeeper installed or running, you can use that one. From your Kafka home, run: bin/zookeeper-server-start.sh config/zookeeper.properties
3. Once ZooKeeper has started, check it with: netstat -ant | grep :2181. If everything is fine, you should see: tcp6 0 0 :::2181 :::* LISTEN
4. Now start your Kafka server with: bin/kafka-server-start.sh config/server.properties
5. To check that Kafka is running, run: netstat -ant | grep :9092. You should see a result similar to the one above.
6. Now you are ready to move on to Atlas. You can either use the link provided on the website or check out a branch or tag directly from GitHub. I used the command from their website: git clone https://git-wip-us.apache.org/repos/asf/atlas.git atlas
7. Navigate into the folder: cd atlas
8. Create a new folder called libext: mkdir libext
9. Download the BerkeleyDB JE zip from http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip. You will need an Oracle account; create one to download the zip file. Copy the zip file into the libext folder you just created.
10. Run: export MAVEN_OPTS="-Xmx1536m -XX:MaxPermSize=512m"
11. Run: mvn clean install -DskipTests (MAKE SURE TO SKIP TESTS)
12. Run: mvn clean package -DskipTests -Pdist,berkeley-elasticsearch
13. Navigate to incubator-atlas/distro/target/apache-atlas-0.8-incubating-bin/apache-atlas-0.8-incubating/bin OR /home/arsalan/Development/atlas/distro/target/apache-atlas-0.9-SNAPSHOT-bin/apache-atlas-0.9-SNAPSHOT, depending on which repo you used, and run: python atlas_start.py
14. You can now navigate to localhost:21000 to check the Atlas GUI (a quick programmatic sanity check is sketched below).

Hope it helps!
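If you want a sanity check beyond opening the GUI, here is a minimal sketch that pings Atlas's version endpoint over HTTP. The /api/atlas/admin/version path and the admin/admin credentials are the defaults for a fresh local build; adjust them for your setup:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class AtlasUpCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:21000/api/atlas/admin/version");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Default credentials for a fresh local build (admin/admin).
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        System.out.println("HTTP status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON with version and build details
            }
        }
    }
}
```

A 200 response with a small JSON payload means the server is up and the web layer is answering.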