Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3369 | 10-11-2017 09:33 PM
 | 2877 | 10-11-2017 07:46 PM
 | 2150 | 08-04-2017 01:37 PM
 | 1899 | 08-03-2017 03:36 PM
 | 1802 | 08-03-2017 12:52 PM
08-14-2017
02:46 PM
Thanks Matt. Interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.
08-09-2017
02:04 PM
I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file containing metadata about the CSV file, such as the compression to be used (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, and the columns/schema of the CSV, among other information. My question is: what is the best strategy in NiFi for using this metadata file to process the CSV file and land the data in Hive? I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion, not the other info such as table name, storage format, and compression.
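To make the scenario concrete, here is a minimal sketch of what such a metadata file might contain and how it could be parsed into FlowFile-style attributes (e.g. inside a NiFi ExecuteScript processor). The element names and attribute keys are my own invention for illustration, not from any standard or from the original post:

```python
# Hypothetical metadata layout and a parser that turns it into a dict of
# attributes a NiFi flow could use to route/convert the matching CSV.
import xml.etree.ElementTree as ET

METADATA_XML = """
<dataset>
  <table>sales.transactions</table>
  <storageFormat>ORC</storageFormat>
  <compression>SNAPPY</compression>
  <columns>
    <column name="txn_id" type="bigint"/>
    <column name="amount" type="decimal(10,2)"/>
  </columns>
</dataset>
"""

def parse_metadata(xml_text):
    """Return a dict of attributes that could be set on the CSV FlowFile."""
    root = ET.fromstring(xml_text)
    return {
        "hive.table": root.findtext("table"),
        "storage.format": root.findtext("storageFormat"),
        "compression.codec": root.findtext("compression"),
        "schema.columns": [
            (c.get("name"), c.get("type"))
            for c in root.find("columns")
        ],
    }

attrs = parse_metadata(METADATA_XML)
print(attrs["hive.table"])  # sales.transactions
```

With attributes like these on the FlowFile, downstream processors could use NiFi Expression Language to pick the target table, format, and codec at runtime.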
Labels:
- Apache NiFi
08-04-2017
01:37 PM
@Saurabh Currently this functionality does not exist in HDP 2.6.1. However, we are working on building it, so expect to see it in a future release. The caveat is that some clients do not want tags propagated automatically, so it will likely be an optionally enabled feature.
08-03-2017
03:36 PM
4 Kudos
@Marc Parmentier The date format is ISO 8601 in UTC: {year}-{month}-{day}T{hours}:{minutes}:{seconds}.{milliseconds}Z, where the trailing Z denotes the UTC time zone, e.g. 2017-04-18T18:49:44.000Z.
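As a quick sketch, the same format can be produced with Python's standard library (the `%f` directive yields microseconds, so the result is trimmed to milliseconds before appending `Z`):

```python
# Format a UTC datetime as yyyy-MM-ddTHH:mm:ss.SSSZ (ISO 8601, millisecond
# precision, Z suffix for UTC).
from datetime import datetime, timezone

def to_timestamp(dt):
    """Format a UTC datetime; strftime's %f gives 6 digits, keep 3."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

dt = datetime(2017, 4, 18, 18, 49, 44, tzinfo=timezone.utc)
print(to_timestamp(dt))  # 2017-04-18T18:49:44.000Z
```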
08-03-2017
12:52 PM
1 Kudo
@chris herssens Streaming Analytics Manager is architected to be agnostic to the underlying streaming engine, and aims to support multiple streaming substrates such as Storm, Spark Streaming, Flink, etc. As part of the first release of SAM, Apache Storm is fully supported. Support for other streaming engines, including Spark Streaming, will be added in future releases. https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_overview/content/ch_stream-analytics-overview.html https://www.slideshare.net/harshach/streaming-analytics-manager
08-02-2017
07:28 PM
1 Kudo
No, Atlas only works with Titan, not JanusGraph. So you cannot use DynamoDB as a datastore for Atlas.
08-02-2017
12:49 PM
1 Kudo
When you follow the links to the GitHub repo you'll see that AWS has built custom adapters to allow integration with JanusGraph, not Titan. JanusGraph is a fork of the Titan project and has some differences.
08-01-2017
05:05 PM
How are you trying to use the metadata? In most of our implementations we use the Atlas REST API (http://atlas.apache.org/api/v2/index.html) for metadata/lineage import and export. Have you considered using that? Note that I have linked to the new v2 API above; the legacy API (http://atlas.apache.org/api/rest.html) has been deprecated as of HDP 2.6 (Atlas 0.8) and will be removed in a future version.
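As a minimal sketch of using the v2 API, the snippet below builds (but does not send) an authenticated GET request for an entity's lineage. The host, port, GUID, and credentials are placeholders; the endpoint path follows the v2 API documentation linked above:

```python
# Build a request for GET /api/atlas/v2/lineage/{guid} with HTTP Basic auth.
# Only the standard library is used; nothing is sent over the network here.
import base64
import urllib.request

def build_lineage_request(base_url, guid, user, password):
    """Return a prepared urllib Request for an entity's lineage."""
    req = urllib.request.Request(f"{base_url}/api/atlas/v2/lineage/{guid}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req

req = build_lineage_request("http://atlas-host:21000", "some-guid",
                            "admin", "admin")
print(req.full_url)  # http://atlas-host:21000/api/atlas/v2/lineage/some-guid
# Against a live Atlas server you would then do:
# with urllib.request.urlopen(req) as resp:
#     lineage_json = resp.read()
```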
08-01-2017
05:00 PM
Please share a link to the source. There might be some confusion.
07-31-2017
06:55 PM
1 Kudo
@Al John Mangahas DistCp spins off MapReduce jobs on the cluster it is run from, so you can use the YARN UI on that cluster to monitor job progress and resource utilization. That said, if you are copying from a production cluster to a DR cluster and are worried about resource usage, you can run the DistCp job on the DR cluster and have it "pull" the data from production. That way, the resource impact on production is minimal.
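A sketch of the "pull" pattern, run from a node on the DR cluster (NameNode hostnames and paths below are placeholders for your environment):

```shell
# Run DistCp on the DR cluster so the MapReduce job consumes DR resources
# and pulls from prod. -m caps the number of parallel map tasks (i.e. how
# hard prod gets read); -update copies only files that changed.
hadoop distcp -m 20 -update \
  hdfs://prod-nn.example.com:8020/data/sales \
  hdfs://dr-nn.example.com:8020/data/sales
```

Because the job runs on DR's YARN, that is also where you would watch its progress.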