Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 3128 | 10-11-2017 09:33 PM |
|  | 2662 | 10-11-2017 07:46 PM |
|  | 2025 | 08-04-2017 01:37 PM |
|  | 1785 | 08-03-2017 03:36 PM |
|  | 1679 | 08-03-2017 12:52 PM |
07-31-2017
04:20 PM
2 Kudos
@Muhammad Imran Tariq No, Atlas requires the Titan graph database, which supports only BerkeleyDB, HBase, and Cassandra as storage backends. http://atlas.apache.org/Architecture.html http://titan.thinkaurelius.com/
07-24-2017
02:42 PM
For a comparison of compression formats, take a look at this link: http://comphadoop.weebly.com/
07-19-2017
06:16 PM
@Varun R Take a look at the articles below; they cover Tez performance tuning as well as an overview of how Tez works. https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
07-14-2017
06:43 PM
You can query the API for entities created/modified after a certain date by running a DSL query against the "createTime" attribute via REST. For example, if you would like to query for Hive tables created/modified after 2017-04-18 6:49 PM, your REST call would look like: http://localhost:21000/api/atlas/v2/search/dsl?query=createTime%3E'2017-04-18T18%3A49%3A44.000Z'&typeName=hive_table

The date format is as follows: {yyyy}-{mm}-{dd}T{hh}:{mm}:{ss}.{zzz}Z, i.e. {year}-{month}-{day}T{hours}:{minutes}:{seconds}.{milliseconds}Z, where the trailing Z denotes UTC, e.g. 2017-04-18T18:49:44.000Z. You can also use a subset of the date rather than the entire string. For example, you can query by year only (2017), full date only (2017-04-18), or date and time only (2017-04-18T18:49:44).
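If it helps, here is a minimal sketch of the same call from Python using the requests library. The endpoint and the admin/admin credentials are assumptions; adjust them for your cluster.

```python
# Minimal sketch of the DSL date query above, using the requests library.
# Endpoint and credentials are placeholders -- adjust for your environment.
import requests

ATLAS_DSL_URL = "http://localhost:21000/api/atlas/v2/search/dsl"

params = {
    "query": "createTime>'2017-04-18T18:49:44.000Z'",  # requests URL-encodes this
    "typeName": "hive_table",
}
resp = requests.get(ATLAS_DSL_URL, params=params, auth=("admin", "admin"))
resp.raise_for_status()

# Each result is an entity header; print the fully qualified table names.
for entity in resp.json().get("entities", []):
    print(entity["attributes"]["qualifiedName"])
```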
07-14-2017
06:04 PM
I'm using HDP 2.6 and I'm trying to use DSL to query Atlas for tables created after a certain date. So far, I've tried querying based on the attribute "createTime" but have been unable to figure out the date format used (milliseconds, seconds, etc.). It seems that it only accepts a year (>2017, <2018) but nothing more granular. Does anyone have any idea how to query for tables created/modified after a certain date/time?
Labels:
- Apache Atlas
07-10-2017
02:39 PM
1 Kudo
@subash sharma Which version of HDP are you using? Even though column-level lineage is advertised as available with Atlas 0.8 on the Apache page, it only became GA with HDP 2.6.1, not HDP 2.6.0. The delay was to ensure it works with the other relevant HiveQL commands beyond CTAS.
06-29-2017
05:33 PM
Adding to Sonu's response: moving to Hadoop from a BI/EDW background is certainly a very common path. Those coming from that background usually find themselves most comfortable with Hive as an entry point. Hive provides an abstraction layer on top of MapReduce/Tez and is based on a SQL-like syntax that is ANSI compliant. It also has the advantage of providing a JDBC/ODBC connector, so most industry BI tools such as Tableau, Qlik, MicroStrategy, etc. can integrate and interact with Hive (see the sketch below). This means that business analysts may continue to use the tools they are already familiar with while leveraging the power of Hadoop in the background.

I would recommend you start by looking at Hive. Once comfortable with it, you can start to explore Hive data modelling and optimization, and then branch out to the other areas that Sonu recommended. I've also seen people in the field focus their entire career/job around just Hive.

Take a look at the link below for an introduction to Hive. There are plenty of internet resources and books that you can leverage to advance your knowledge. Hortonworks also provides developer training that covers an introduction to Hive as well as other engines/tools. https://hortonworks.com/tutorial/how-to-process-data-with-apache-hive/
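As a rough illustration of that programmatic connectivity, here is a minimal sketch using the third-party PyHive package; the host, user, table, and columns are placeholders, not anything from a specific environment.

```python
# Illustrative only: connects to HiveServer2 with the third-party PyHive
# package (pip install pyhive thrift). Host, user, table, and columns
# below are placeholders.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="analyst")
cursor = conn.cursor()
cursor.execute("SELECT category, COUNT(*) FROM sales GROUP BY category")
for row in cursor.fetchall():
    print(row)
conn.close()
```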
06-28-2017
01:10 PM
1 Kudo
@Smart Data I am assuming that you are trying to create entities and lineage for HDFS files. If so, then yes, you would need to use the REST API to create the lineage. You can also use the API to create the entities themselves rather than going through Kafka. If you're using HDP 2.6.1, you can additionally create your entities through the Atlas UI, as described in the link below. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-governance/content/ch_atlas_searching_and_viewing_entities.html#atlas_manually_creating_entities

Finally, below is a step-by-step example of creating entities and lineage for an HDFS file that is picked up and processed by Spark, with the results written back to HDFS. It will give you a good idea of how the APIs may be leveraged. https://community.hortonworks.com/content/kbentry/91237/creating-custom-types-and-entities-in-atlas.html
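For reference, here is a minimal sketch of what creating a single hdfs_path entity through the v2 REST API might look like; the endpoint, credentials, and all attribute values are assumptions, and the linked article covers the full flow in detail.

```python
# Sketch of creating one hdfs_path entity via the Atlas v2 REST API.
# Endpoint, credentials, and attribute values are placeholders.
import requests

entity = {
    "entity": {
        "typeName": "hdfs_path",
        "attributes": {
            # qualifiedName must be unique; path@cluster is a common convention
            "qualifiedName": "hdfs://mycluster/data/raw/events@mycluster",
            "name": "events",
            "path": "hdfs://mycluster/data/raw/events",
            "clusterName": "mycluster",
        },
    }
}

resp = requests.post(
    "http://localhost:21000/api/atlas/v2/entity",
    json=entity,
    auth=("admin", "admin"),
)
resp.raise_for_status()
# The mutation response includes the GUID assigned to the new entity.
print(resp.json())
```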
06-28-2017
01:06 PM
@subash sharma As you are already aware, there is currently no out-of-the-box integration between HDF/NiFi and Atlas. This is a roadmap item and has been documented in the JIRA below: https://issues.apache.org/jira/browse/NIFI-3709

However, you can use the Atlas REST API to create the HDF entities and lineage yourself. Below are a couple of examples that show how this may be done: https://community.hortonworks.com/repos/66014/nifi-atlas-bridge.html https://community.hortonworks.com/repos/39432/nifi-atlas-lineage-reporter.html

As always, if you find this post helpful, don't forget to "accept" the answer.
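To make the lineage side more concrete: lineage in Atlas is expressed through a Process-type entity whose inputs and outputs reference existing entities. The hedged sketch below instantiates the base Process type directly for brevity, whereas the linked repos define proper custom NiFi types; all names and GUIDs are placeholders.

```python
# Hedged sketch: a Process-type entity whose "inputs" and "outputs"
# reference existing entities by GUID. All names/GUIDs are placeholders;
# in practice you would likely define a custom type extending Process.
import requests

process = {
    "entity": {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": "my_nifi_flow@mycluster",  # placeholder
            "name": "my_nifi_flow",
            "inputs": [{"guid": "<guid-of-input-entity>"}],
            "outputs": [{"guid": "<guid-of-output-entity>"}],
        },
    }
}

resp = requests.post(
    "http://localhost:21000/api/atlas/v2/entity",
    json=process,
    auth=("admin", "admin"),
)
resp.raise_for_status()
```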