Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Title | Views | Posted
---|---|---
| 4044 | 10-11-2017 09:33 PM
| 3566 | 10-11-2017 07:46 PM
| 2571 | 08-04-2017 01:37 PM
| 2215 | 08-03-2017 03:36 PM
| 2242 | 08-03-2017 12:52 PM
05-23-2017
05:20 PM
@regie canada As Matt suggested below, use the SplitContent processor to split the file into multiple, smaller flow files. The "Byte Sequence" entry for splitting would be:

####################################################################### START of Request #######################################################################

After that, use the ExtractText processor, as described in my response above, to get the first 4 lines of each flow file generated by SplitContent.
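If you want to prototype the logic outside NiFi first, here is a minimal Python sketch of what the SplitContent + ExtractText combination does: split on the request delimiter, then take the first 4 lines of each chunk. The delimiter length here is an assumption for illustration; use the exact byte sequence you configure in SplitContent.

```python
# Assumed delimiter; match it to the exact "Byte Sequence" configured in SplitContent.
DELIMITER = "#" * 71 + " START of Request " + "#" * 71

def split_and_extract(content):
    """Mimic SplitContent + ExtractText: split on the delimiter,
    then return the first 4 lines of each non-empty chunk."""
    for chunk in content.split(DELIMITER):
        chunk = chunk.strip()
        if chunk:
            yield chunk.splitlines()[:4]
```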
05-23-2017
02:03 PM
Take a look at the "Create Lineage amongst data sets" section (p. 46) in the document I linked above. It includes a detailed example.
05-22-2017
01:44 PM
2 Kudos
@Dinesh Das Atlas does not currently provide out-of-the-box integration with components outside the HDP stack. You can, however, create your own entities and use the REST API (or Kafka messages) to populate them. This will, of course, require some scripting/coding on your part to push the data via REST. Here is some documentation with examples: http://atlas.apache.org/0.7.0-incubating/AtlasTechnicalUserGuide.pdf Please note that while this documentation also applies to Atlas 0.7-0.8 (in HDP 2.5-2.6), it uses APIs that have been deprecated in the latest version (HDP 2.6) and will be removed in future ones. Still, it's enough to get you started with your implementation.
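As a rough illustration only (not the exact payload; pull that from the guide above), pushing a custom entity over REST could look something like this sketch. The host, credentials, type name, and attribute names are all placeholders, and the custom type must be registered with Atlas before you can create instances of it:

```python
import json
import requests

# Placeholder endpoint; 21000 is the default Atlas port.
ATLAS_URL = "http://atlas-host:21000/api/atlas/entities"

# Hypothetical custom entity; the type "my_custom_dataset" must be
# registered with Atlas first (see the type-creation section of the guide).
entity = {
    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
    "typeName": "my_custom_dataset",
    "values": {
        "name": "orders_feed",
        "qualifiedName": "orders_feed@cluster1",
        "description": "Example entity pushed via REST",
    },
}

resp = requests.post(
    ATLAS_URL,
    auth=("admin", "admin"),  # assumed credentials
    headers={"Content-Type": "application/json"},
    data=json.dumps(entity),
)
resp.raise_for_status()
print(resp.json())
```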
05-22-2017
01:38 PM
1 Kudo
@vnandigam You are correct, Atlas does not currently provide lineage for Spark; this is something engineering and the community are working on. You can, however, create your own entities and use the REST API to populate them. Here is some documentation with examples: http://atlas.apache.org/0.7.0-incubating/AtlasTechnicalUserGuide.pdf Please note that while this documentation also applies to Atlas 0.7-0.8 (in HDP 2.5-2.6), it uses APIs that have been deprecated in that version and will be removed in future ones. Still, it's enough to get you started with your implementation. As always, if you find any responses here useful, don't forget to "accept" an answer.
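For context: lineage in Atlas comes from Process-type entities whose inputs/outputs reference existing dataset entities, so for Spark you would register your job as a custom Process subtype. Here is a hedged sketch of the payload shape; the type names, GUIDs, and reference format are placeholders that you should verify against the guide above for your Atlas version.

```python
# Hypothetical Process entity linking an input dataset to an output dataset.
# Replace the GUIDs with those returned when the dataset entities were created.
process = {
    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
    "typeName": "my_spark_job",  # custom subtype of Process
    "values": {
        "name": "daily_aggregation",
        "qualifiedName": "daily_aggregation@cluster1",
        "inputs": [{"guid": "<input-dataset-guid>", "typeName": "my_dataset"}],
        "outputs": [{"guid": "<output-dataset-guid>", "typeName": "my_dataset"}],
    },
}
# POST this to the same /api/atlas/entities endpoint as any other entity.
```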
05-19-2017
07:02 PM
4 Kudos
@regie canada You can use the ExtractText processor with a regex to pull the first 4 lines. The regex would be: (.*)\n(.*)\n(.*)\n(.*) After that, you can use the SplitText processor if you want each line to be an individual flow file, or the UpdateAttribute processor to make any kind of transformation on the 4 lines.
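If you want to sanity-check that regex before wiring it into ExtractText, here is a quick Python demonstration of what it captures (the sample content is made up):

```python
import re

sample = "line1\nline2\nline3\nline4\nline5\n"

# "." does not match newlines by default, so each group captures one line.
match = re.match(r"(.*)\n(.*)\n(.*)\n(.*)", sample)
if match:
    print(match.groups())  # ('line1', 'line2', 'line3', 'line4')
```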
05-19-2017
06:34 PM
@Robin Dong Hi Robin, can you please close/delete this question? It's a duplicate of https://community.hortonworks.com/questions/103733/anyone-have-info-on-how-mongodb-do-the-sharding-wi.html Thanks!
05-19-2017
04:11 PM
1 Kudo
@Robin Dong What repo did you use to install MongoDB? Have you seen this one: https://github.com/geniuszhe/ambari-mongodb-cluster
05-18-2017
06:08 PM
@Farzaneh Poorjabar You need to enable the "Recursive" toggle for the policy to apply to child folders.
05-18-2017
03:37 PM
1 Kudo
I have a NiFi cluster consuming messages from Kafka and sending the output to a PostgreSQL database. How do I guarantee that each message will be consumed/processed exactly once?
Labels:
- Apache NiFi
05-17-2017
08:46 PM
Thanks @mqureshi. I didn't realize ExecuteSQL used a connection pool.