Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3369 | 10-11-2017 09:33 PM
 | 2877 | 10-11-2017 07:46 PM
 | 2150 | 08-04-2017 01:37 PM
 | 1899 | 08-03-2017 03:36 PM
 | 1802 | 08-03-2017 12:52 PM
08-14-2017
02:46 PM
Thanks Matt. Interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.
08-09-2017
02:04 PM
I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file containing metadata about the CSV file, such as the compression to be used (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, and the columns/schema of the CSV, among other information. My question is: what is the best strategy in NiFi for using this metadata file to process the CSV file and land the data in Hive? I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion, not the other info such as table name, storage format, and compression.
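To make the scenario concrete, here is a minimal sketch of what such a metadata file might contain and how it could be parsed into FlowFile-style attributes (e.g. inside a NiFi ExecuteScript processor). The element names and attribute keys are my own invention for illustration, not from any standard or from the original post:

```python
# Hypothetical metadata layout and a parser that turns it into a dict of
# attributes a NiFi flow could use to route/convert the matching CSV.
import xml.etree.ElementTree as ET

METADATA_XML = """
<dataset>
  <table>sales.transactions</table>
  <storageFormat>ORC</storageFormat>
  <compression>SNAPPY</compression>
  <columns>
    <column name="txn_id" type="bigint"/>
    <column name="amount" type="decimal(10,2)"/>
  </columns>
</dataset>
"""

def parse_metadata(xml_text):
    """Return a dict of attributes that could be set on the CSV FlowFile."""
    root = ET.fromstring(xml_text)
    return {
        "hive.table": root.findtext("table"),
        "storage.format": root.findtext("storageFormat"),
        "compression.codec": root.findtext("compression"),
        "schema.columns": [
            (c.get("name"), c.get("type"))
            for c in root.find("columns")
        ],
    }

attrs = parse_metadata(METADATA_XML)
print(attrs["hive.table"])  # sales.transactions
```

With attributes like these on the FlowFile, downstream processors could use NiFi Expression Language to pick the target table, format, and codec at runtime.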
Labels:
- Apache NiFi
08-04-2017
01:37 PM
@Saurabh Currently this functionality does not exist in HDP 2.6.1. However, we are working on building it, so expect to see it in a future release. The caveat is that some clients do not want tags propagated automatically, so it will likely be an optionally enabled feature.
08-03-2017
03:36 PM
4 Kudos
@Marc Parmentier The date format is ISO 8601 in UTC: {year}-{month}-{day}T{hours}:{minutes}:{seconds}.{milliseconds}Z, where the trailing Z denotes the UTC time zone, e.g. 2017-04-18T18:49:44.000Z.
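As a quick sketch, the same format can be produced with Python's standard library (the `%f` directive yields microseconds, so the result is trimmed to milliseconds before appending `Z`):

```python
# Format a UTC datetime as yyyy-MM-ddTHH:mm:ss.SSSZ (ISO 8601, millisecond
# precision, Z suffix for UTC).
from datetime import datetime, timezone

def to_timestamp(dt):
    """Format a UTC datetime; strftime's %f gives 6 digits, keep 3."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

dt = datetime(2017, 4, 18, 18, 49, 44, tzinfo=timezone.utc)
print(to_timestamp(dt))  # 2017-04-18T18:49:44.000Z
```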
08-03-2017
12:52 PM
1 Kudo
@chris herssens Streaming Analytics Manager is architected to be agnostic to the underlying streaming engine, and aims to support multiple streaming substrates such as Storm, Spark Streaming, Flink, etc. As part of the first release of SAM, Apache Storm is fully supported. Support for other streaming engines, including Spark Streaming, will be added in future releases. https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_overview/content/ch_stream-analytics-overview.html https://www.slideshare.net/harshach/streaming-analytics-manager
08-02-2017
07:28 PM
1 Kudo
No, Atlas only works with Titan, not JanusGraph. So you cannot use DynamoDB as a datastore for Atlas.
08-02-2017
12:49 PM
1 Kudo
When you follow the links to the GitHub repo you'll see that AWS has built custom adapters to allow integration with JanusGraph, not Titan. JanusGraph is a fork of the Titan project and has some differences.
08-01-2017
05:05 PM
How are you trying to use the metadata? In most of our implementations we use the Atlas REST API (http://atlas.apache.org/api/v2/index.html) for metadata/lineage import and export. Have you considered using that? Note that I have linked to the new v2 API above; the legacy API (http://atlas.apache.org/api/rest.html) has been deprecated as of HDP 2.6 (Atlas 0.8) and will be removed in a future version.
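As a minimal sketch of using the v2 API, the snippet below builds (but does not send) an authenticated GET request for an entity's lineage. The host, port, GUID, and credentials are placeholders; the endpoint path follows the v2 API documentation linked above:

```python
# Build a request for GET /api/atlas/v2/lineage/{guid} with HTTP Basic auth.
# Only the standard library is used; nothing is sent over the network here.
import base64
import urllib.request

def build_lineage_request(base_url, guid, user, password):
    """Return a prepared urllib Request for an entity's lineage."""
    req = urllib.request.Request(f"{base_url}/api/atlas/v2/lineage/{guid}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req

req = build_lineage_request("http://atlas-host:21000", "some-guid",
                            "admin", "admin")
print(req.full_url)  # http://atlas-host:21000/api/atlas/v2/lineage/some-guid
# Against a live Atlas server you would then do:
# with urllib.request.urlopen(req) as resp:
#     lineage_json = resp.read()
```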
08-01-2017
05:00 PM
Please share a link to the source. There might be some confusion.
07-31-2017
06:55 PM
1 Kudo
@Al John Mangahas DistCp spins off MapReduce jobs on the cluster it is run from, so you can use the YARN UI on that cluster to monitor job progress and resource utilization. That said, if you are copying from a production cluster to a DR cluster and are worried about resource usage, you can run the DistCp job on the DR cluster and have it "pull" the data from production. That way, the resource impact on production is minimal.
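A sketch of the "pull" pattern, run from a node on the DR cluster (NameNode hostnames and paths below are placeholders for your environment):

```shell
# Run DistCp on the DR cluster so the MapReduce job consumes DR resources
# and pulls from prod. -m caps the number of parallel map tasks (i.e. how
# hard prod gets read); -update copies only files that changed.
hadoop distcp -m 20 -update \
  hdfs://prod-nn.example.com:8020/data/sales \
  hdfs://dr-nn.example.com:8020/data/sales
```

Because the job runs on DR's YARN, that is also where you would watch its progress.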