Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Views | Posted
---|---
1769 | 10-11-2017 09:33 PM
1506 | 10-11-2017 07:46 PM
1190 | 08-04-2017 01:37 PM
1184 | 08-03-2017 03:36 PM
861 | 08-03-2017 12:52 PM
04-16-2018
07:19 PM
@Antony Shajin Lucas Yes, this is the expected behavior. Tags are independent of entities: you can create a tag and apply it to multiple entities, and if an entity is removed, the tag will still remain.
04-02-2018
02:06 PM
Are you only looking to transfer tags or everything (entities, tags, lineage)?
04-02-2018
01:56 PM
@robert cheung Deny policies take precedence over allow policies. So, in your scenario above, User 1 should not have access to data tagged as both "International" and "PII". Take a look at the flow chart below on the sequence of policy evaluation.
01-02-2018
07:39 PM
1 Kudo
@vikas aggarwal Yes, the community is working on a replacement Taxonomy/Business Catalog feature. However, we do not have a release timeframe yet.
12-29-2017
02:02 PM
@Alp Alp This is currently not possible; Atlas-Hive tracking is all or nothing. What is the business driver behind this requirement? If you can explain the business need, I'll be able to raise it with our product managers and the community for consideration in future releases.
10-13-2017
01:02 PM
1 Kudo
No, tags are associated with entities. If you want something associated with a type, just add an attribute to that type.
10-13-2017
09:57 AM
You can create a tag that accepts attributes. For example, an "Expiry_Date" tag that contains an attribute named "date". You can associate that tag directly with your table or column. At the time of association, the user will be asked to fill in the "date" value for that particular entity/instance. So, yes, this could be a workaround.
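For reference, here is a minimal sketch of the same workaround done through the REST API instead of the UI, assuming Atlas 0.8+ (v2 API) and basic auth; the host, credentials, entity GUID, and date value are placeholders.

```python
# Sketch: define a tag (classification) with an attribute, then apply it to an entity.
# Host, credentials, GUID, and the date value are placeholders for your environment.
import requests

ATLAS = "http://atlas.example.com:21000"
AUTH = ("admin", "admin")

# 1) Define an "Expiry_Date" classification with a "date" attribute.
tag_defs = {
    "classificationDefs": [{
        "name": "Expiry_Date",
        "description": "Expiry date for the tagged entity",
        "attributeDefs": [{
            "name": "date",
            "typeName": "string",
            "isOptional": False,
            "cardinality": "SINGLE",
            "isUnique": False,
            "isIndexable": False,
        }],
    }]
}
requests.post(f"{ATLAS}/api/atlas/v2/types/typedefs", json=tag_defs, auth=AUTH).raise_for_status()

# 2) Associate the tag with a table/column entity, filling in the date for that instance.
guid = "REPLACE_WITH_ENTITY_GUID"
classifications = [{"typeName": "Expiry_Date", "attributes": {"date": "2018-12-31"}}]
requests.post(
    f"{ATLAS}/api/atlas/v2/entity/guid/{guid}/classifications",
    json=classifications,
    auth=AUTH,
).raise_for_status()
```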
10-13-2017
09:13 AM
I'm afraid not at this point. The community has stopped development on this code-base and is looking at alternate paths. I'll update this post once I have more information.
10-12-2017
02:05 PM
2 Kudos
@Zakir Hossain The way to approach this isn't through Atlas, but rather through the way you ingest the data into Hive. Rather than dropping/deleting and recreating the Hive table, you can use either "Truncate" or "Insert Overwrite" to replace only the data. This way, your metadata and tags stay intact and only your data is refreshed.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TruncateTable
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries

*Keep in mind that if you have ACID enabled, you will not be able to use "Insert Overwrite"; your only option is "Truncate". From the Hive documentation:

> As of Hive 0.14, if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a transaction manager that implements ACID, then INSERT OVERWRITE will be disabled for that table. This is to avoid users unintentionally overwriting transaction history. The same functionality can be achieved by using TRUNCATE TABLE (for non-partitioned tables) or DROP PARTITION followed by INSERT INTO.
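As an illustration, here is a minimal sketch of the two options issued from Python, assuming a HiveServer2 endpoint and the PyHive client; the host, user, and table names are placeholders.

```python
# Sketch: refresh data in a Hive table without dropping it, so Atlas metadata/tags survive.
# Assumes HiveServer2 and the PyHive client; host, user, and table names are placeholders.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl_user")
cursor = conn.cursor()

# Option 1: keep the table (and its Atlas metadata/tags), drop only the data.
cursor.execute("TRUNCATE TABLE sales_db.daily_sales")

# Option 2: replace the data in one statement (not available on ACID tables).
cursor.execute(
    "INSERT OVERWRITE TABLE sales_db.daily_sales "
    "SELECT * FROM sales_db.daily_sales_staging"
)
```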
10-11-2017
09:33 PM
1 Kudo
@Héctor Pérez Arteaga Are you referring to the "Business Taxonomy" feature (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-governance/content/ch_managing_atlas_business_taxonomy.html)? If so, this is currently a tech preview feature and not recommended for production. Business Taxonomy, as it is, will not be developed further and will be replaced in a future version, so I would advise against using it if possible. Having said that, to answer your question: there is no API call to import taxonomies in bulk. You would have to use the API to add each individual term. The best path is to write a script/program that calls the API in a loop to do the bulk inserts.
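A rough sketch of that loop in Python follows. The host, credentials, taxonomy name, and term list are placeholders, and the v1 taxonomy endpoint path is an assumption based on the tech-preview catalog API, so verify it against your Atlas version.

```python
# Sketch: bulk-load taxonomy terms by looping over single-term API calls.
# The endpoint path /api/atlas/v1/taxonomies/{taxonomy}/terms/{term} is an assumption;
# host, credentials, taxonomy name, and terms are placeholders.
import requests

ATLAS = "http://atlas.example.com:21000"
AUTH = ("admin", "admin")
TAXONOMY = "Catalog"

terms = ["Finance", "Marketing", "Sales"]

for term in terms:
    url = f"{ATLAS}/api/atlas/v1/taxonomies/{TAXONOMY}/terms/{term}"
    resp = requests.post(url, json={"description": f"Term {term}"}, auth=AUTH)
    resp.raise_for_status()
    print(f"created {term}")
```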
10-11-2017
07:46 PM
2 Kudos
@Héctor Pérez Arteaga First, let's cover Entities vs. Types. Think of this the same way as the relationship between Object and Class in Object-Oriented Programming: the "Entity" (your table) is an instance of a "Type" (hive_table). As such, an Entity can only contain attributes defined in the Type. You can add metadata/attributes to an Entity (e.g. my_demo_table) using the REST API, but you must first update the "Type" (e.g. hive_table) with that particular attribute. So, for example, if you would like to add a new attribute called "Department" to a particular table, you would have to add that attribute to the "hive_table" Type first. After that you would be able to set the attribute value in your Entity "my_demo_table".
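Here is a minimal sketch of those two steps against the v2 REST API, assuming Atlas 0.8+ and basic auth; the host, credentials, and GUID are placeholders, and the partial attribute update call at the end is an assumption (alternatively, POST the full entity back to /api/atlas/v2/entity).

```python
# Sketch: add a "Department" attribute to the hive_table type, then set it on an entity.
# Host, credentials, and the GUID are placeholders for your environment.
import requests

ATLAS = "http://atlas.example.com:21000"
AUTH = ("admin", "admin")

# 1) Fetch the existing hive_table type definition and append the new attribute.
entity_def = requests.get(
    f"{ATLAS}/api/atlas/v2/types/entitydef/name/hive_table", auth=AUTH
).json()

entity_def["attributeDefs"].append({
    "name": "Department",
    "typeName": "string",
    "isOptional": True,
    "cardinality": "SINGLE",
    "isUnique": False,
    "isIndexable": False,
})

requests.put(
    f"{ATLAS}/api/atlas/v2/types/typedefs",
    json={"entityDefs": [entity_def]},
    auth=AUTH,
).raise_for_status()

# 2) Set the new attribute on my_demo_table by GUID. This partial-update call is an
#    assumption; you can instead POST the full entity back with the attribute filled in.
guid = "REPLACE_WITH_ENTITY_GUID"
requests.put(
    f"{ATLAS}/api/atlas/v2/entity/guid/{guid}",
    params={"name": "Department"},
    json="Finance",
    auth=AUTH,
).raise_for_status()
```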
09-28-2017
11:29 AM
@Riddhi Sam For an explanation of the evolution of Hadoop 1 to Hadoop 2, take a look at this article/blog post:
http://www.tomsitpro.com/articles/hadoop-2-vs-1,2-718.html

For an explanation of what to expect with Hadoop 3, read the posts below:
https://hortonworks.com/blog/data-lake-3-0-deploy-minutes-cut-tco-half/
https://hortonworks.com/blog/data-lake-3-0-part-2-multi-colored-yarn/

Also, take a look at this presentation from Dataworks Summit that covers YARN past, present, and future:
https://www.youtube.com/watch?v=PRsr1hgidQI&index=137&list=PLQ-KRsI-e9bBoDjQV_pe7L9171r2Sddsr
09-11-2017
06:13 PM
duplicated here: https://community.hortonworks.com/questions/115277/hdp-sandbox-not-able-to-connect-1270018888-or-1270-1.html
09-11-2017
06:12 PM
duplicated here: https://community.hortonworks.com/questions/115277/hdp-sandbox-not-able-to-connect-1270018888-or-1270-1.html
08-14-2017
02:46 PM
Thanks Matt. Interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.
08-09-2017
02:04 PM
I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file that contains metadata about the CSV file, such as the compression to be used (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, the different columns/schema in the CSV, as well as some other information. My question is: what is the best strategy/way, through NiFi, to use this metadata file to process the CSV file and land the data in Hive? I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion rather than the other info such as table name, storage format, and compression.
Labels:
- Apache NiFi
08-08-2017
12:48 PM
@heta desai One option is to use NiFi/HDF to get the tweets from Twitter and then post them to Kafka. Take a look at the demo in the link below to give you an idea of how that might work; the destination will be Kafka, though, rather than HDFS or Solr (which is what the example uses).
https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html

Alternatively, you can write a Java producer using the Twitter API to get the tweets and put them in Kafka, as per the examples below.
https://acadgild.com/blog/streaming-twitter-data-using-kafka/
https://www.tutorialspoint.com/apache_kafka/apache_kafka_real_time_application.htm
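For the second option, here is a minimal Python sketch of the producer side (the same idea as the Java producer in the linked examples), assuming the kafka-python client and a reachable broker; the broker address, topic name, and the get_tweets() helper are placeholders for your Twitter client code.

```python
# Sketch: push tweet JSON into a Kafka topic.
# Broker address, topic, and get_tweets() are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:6667",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def get_tweets():
    # Placeholder: replace with calls to the Twitter API (e.g. a streaming client).
    yield {"user": "example", "text": "hello kafka"}

for tweet in get_tweets():
    producer.send("tweets", tweet)

producer.flush()
```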
08-04-2017
01:37 PM
@Saurabh Currently this functionality does not exist with HDP 2.6.1. However, we are working on building it so expect to see it in a future release. The caveat is that some clients do not want to propagate tags automatically, so it will likely be an optionally enabled feature.
08-03-2017
03:36 PM
4 Kudos
@Marc Parmentier The date format is as follows: {yyyy}-{mm}-{dd}T{hh}:{mm}:{ss}.{SSS}Z, i.e. {year}-{month}-{day}T{hours}:{minutes}:{seconds}.{milliseconds} followed by a literal "Z" (UTC), e.g. 2017-04-18T18:49:44.000Z
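A minimal Python sketch for producing a timestamp in that format (ISO 8601 with milliseconds and a trailing "Z" for UTC):

```python
# Sketch: format the current UTC time as yyyy-mm-ddTHH:MM:SS.SSSZ.
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
ts = now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"  # trim microseconds to milliseconds
print(ts)  # e.g. 2017-04-18T18:49:44.000Z
```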
08-03-2017
12:52 PM
1 Kudo
@chris herssens Streaming Analytics Manager is architected to be agnostic to the underlying streaming engine, and aims to support multiple streaming substrates such as Storm, Spark Streaming, Flink, etc. As part of the first release of SAM, Apache Storm is fully supported. Support for other streaming engines, including Spark Streaming, will be added in future releases.
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_overview/content/ch_stream-analytics-overview.html
https://www.slideshare.net/harshach/streaming-analytics-manager
08-02-2017
07:28 PM
1 Kudo
No, Atlas only works with Titan, not JanusGraph. So you cannot use DynamoDB as a datastore for Atlas.
08-02-2017
12:49 PM
1 Kudo
When you follow the links to the GitHub repo, you'll see that AWS has built custom adapters to allow integration with JanusGraph, not Titan. JanusGraph is a fork of the Titan project and has some differences.
08-01-2017
05:05 PM
How are you trying to use the metadata? In most of our implementations we use the Atlas REST API (http://atlas.apache.org/api/v2/index.html) for metadata/lineage import and export. Have you considered using that? Please note that I have linked above to the new API; the legacy API (http://atlas.apache.org/api/rest.html) has been deprecated with HDP 2.6 (Atlas 0.8) and will be removed in a future version.
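As an example of the export side, here is a minimal sketch that pulls lineage for one entity through the v2 REST API, assuming basic auth; the host, credentials, and GUID are placeholders.

```python
# Sketch: fetch lineage for an entity via the Atlas v2 REST API.
# Host, credentials, and GUID are placeholders for your environment.
import requests

ATLAS = "http://atlas.example.com:21000"
AUTH = ("admin", "admin")
guid = "REPLACE_WITH_ENTITY_GUID"

lineage = requests.get(f"{ATLAS}/api/atlas/v2/lineage/{guid}", auth=AUTH).json()
print(list(lineage))  # top-level keys of the lineage response
```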
08-01-2017
05:00 PM
Please share a link to the source. There might be some confusion.
07-31-2017
07:28 PM
1 Kudo
@Doug Lennings You can use the ExecuteSQL processor in NiFi and configure it to use Impala's JDBC driver.
07-31-2017
07:14 PM
@rohith ak The HDPCD Spark Certification uses the following versions:
- HDP 2.4.0
- Spark 1.6
- Scala 2.10.5
- Python 2.7.6 (pyspark)

Details in the link below:
https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2017/05/HDCD_Spark_Data_Sheet.pdf

While it's OK to practice on a newer version, just keep in mind that there's functionality in 2.x that is not available in 1.6. I strongly suggest that you study and practice on HDP 2.4 (Spark 1.6) to avoid confusing yourself. You can download the HDP 2.4 sandbox that contains Spark 1.6 from the link below.
https://hortonworks.com/downloads/#sandbox

Scroll down to where it says "Hortonworks Data Platform Archive" and click "Expand" to get the archived versions and the download link for HDP 2.4.
07-31-2017
06:55 PM
1 Kudo
@Al John Mangahas Distcp spins off MapReduce jobs on the cluster it is running on/from. You can use the Yarn UI on that cluster to monitor the job progress and utilization. Having said that, if you are copying from a Prod cluster to a DR cluster, and are worried about resource usage, then you can actually run the Distcp job on the DR cluster and have it "pull" the data from Prod. That way, the impact in terms of resources on Prod is minimal.
07-31-2017
04:20 PM
2 Kudos
@Muhammad Imran Tariq No, Atlas requires the Titan graph database, which only supports BerkeleyDB, HBase, and Cassandra as storage backends.
http://atlas.apache.org/Architecture.html
http://titan.thinkaurelius.com/
07-24-2017
02:42 PM
For a comparison between compression formats, take a look at this link: http://comphadoop.weebly.com/