Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74
11-08-2017
08:15 AM
Hi @Salda Murrah You can set the "Compression format" property to "use mime.type". This way, the processor looks for an attribute called mime.type and dynamically infers the format, and hence the decompression algorithm to use. For this to work, you need an UpdateAttribute processor to add the mime.type attribute and set its value according to your logic. Keep in mind that UpdateAttribute supports rules in its Advanced configuration, which can be useful for your use case: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-update-attribute-nar/1.4.0/org.apache.nifi.processors.attributes.UpdateAttribute/additionalDetails.html
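For illustration, here is a minimal Python sketch of the kind of decision logic you could encode as UpdateAttribute rules. The file names and the extension-to-type mapping are assumptions for the example, and the exact mime.type strings CompressContent recognizes should be checked against its documentation:

```python
# Sketch of the logic an UpdateAttribute rule set could implement:
# derive a mime.type value from the flow file's filename extension so that
# CompressContent (Compression format = "use mime.type") can pick the codec.

# Hypothetical mapping: extension -> mime.type (verify against CompressContent docs)
EXTENSION_TO_MIME = {
    ".gz": "application/gzip",
    ".bz2": "application/x-bzip2",
    ".xz": "application/x-xz",
}

def mime_type_for(filename: str) -> str:
    """Return the mime.type to set on the flow file, or "" if no rule matches."""
    for ext, mime in EXTENSION_TO_MIME.items():
        if filename.endswith(ext):
            return mime
    return ""

if __name__ == "__main__":
    for name in ["data.csv.gz", "logs.json.bz2", "plain.txt"]:
        print(name, "->", mime_type_for(name) or "(no rule matched)")
```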
11-07-2017
06:57 PM
Hi @Michael Jonsson NiFi is not packaged with HDI, which is why you cannot find it in Ambari under Add Service. You can install an HDP + HDF cluster on Azure in IaaS mode to have both platforms. You can also provision Azure VMs, install Ambari + HDF only, and use it alongside HDI; this way you have two separate clusters, HDF and HDI. You can also use Cloudbreak for easier installation. Theoretically, you should be able to manually install NiFi on HDI nodes, but this won't be supported nor managed by Ambari (no monitoring, configuration, upgrades, etc.), so it may only make sense for testing/POC. I've never tried it though.
11-07-2017
05:07 PM
2 Kudos
Hi @dhieru singh QueryDatabaseTable queries the database on the defined schedule. Even if you don't customize the processor's scheduling, there's a default schedule in the Scheduling tab. The processor is intended to run on the primary node only, to avoid ingesting the same data several times. It doesn't accept an incoming connection, so you cannot drive it dynamically from earlier parts of the flow. If you deploy it on all nodes, each node will ingest exactly the same data (data duplication).
11-07-2017
10:57 AM
1 Kudo
Hi @pranayreddy bommineni Have you seen this article that describes how to use the NiFi REST API to add a processor and then configure it? https://community.hortonworks.com/articles/87217/change-nifi-flow-using-rest-api-part-1.html Look at step 7 for configuration only.
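As a rough illustration of what those REST calls look like from a script, here is a minimal Python sketch using requests. The NiFi URL, process group id, processor type, and property name are placeholder assumptions; refer to the linked article (step 7) and your own /nifi-api for the exact payloads:

```python
# Minimal sketch: create a processor in a process group, then configure it,
# through the NiFi REST API. Host, ids, and property values are placeholders.
import requests

NIFI = "http://localhost:8080/nifi-api"        # assumption: unsecured NiFi
PROCESS_GROUP_ID = "<your-process-group-id>"   # placeholder

# 1. Create the processor (a new component starts at revision version 0).
create_payload = {
    "revision": {"version": 0},
    "component": {
        "type": "org.apache.nifi.processors.standard.GetFile",
        "name": "GetFile created via REST",
        "position": {"x": 100.0, "y": 100.0},
    },
}
resp = requests.post(f"{NIFI}/process-groups/{PROCESS_GROUP_ID}/processors",
                     json=create_payload)
resp.raise_for_status()
processor = resp.json()

# 2. Configure it: send back the current revision plus the properties to change.
update_payload = {
    "revision": processor["revision"],
    "component": {
        "id": processor["id"],
        "config": {"properties": {"Input Directory": "/tmp/in"}},
    },
}
resp = requests.put(f"{NIFI}/processors/{processor['id']}", json=update_payload)
resp.raise_for_status()
print("Configured:", resp.json()["component"]["name"])
```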
11-06-2017
02:02 PM
7 Kudos
Introduction
This is part 3 of a series of articles on Data Enrichment with NiFi:
Part 1: Data flow enrichment with LookupRecord and SimpleKV Lookup Service is available here
Part 2: Data flow enrichment with LookupAttribute and SimpleKV Lookup Service is available here
Part 3: Data flow enrichment with LookupRecord and MongoDB Lookup Service is available here
Enrichment is a common use case when working on data ingestion or flow management. Enrichment means getting data from an external source (database, file, API, etc.) to add more details, context, or information to the data being ingested. In Part 1 and Part 2 of this series, I showed how to use LookupRecord and LookupAttribute to enrich the content/metadata of a flow file with a Simple Key Value Lookup Service. Using this lookup service helped us implement an enrichment scenario without deploying any external system. This is perfect for scenarios where the reference data is not too big and doesn't evolve too much. However, managing entries in the SimpleKV Service can become cumbersome if our reference data is dynamic or large.
Fortunately, NiFi 1.4 introduced an interesting new Lookup Service with NIFI-4345: MongoDBLookupService. This lookup service can be used in NiFi to enrich data by querying a MongoDB store in realtime. With this service, your reference data can live in MongoDB and be updated by external applications. In this article, I describe how we can use this new service to implement the use case described in part 1.
Scenario
We will be using the same retail scenario described in Part 1 of this series. However, our stores reference data will be hosted in a MongoDB rather than in the SimpleKV Lookup service of NiFi.
For this example, I'll be using a hosted MongoDB (DBaaS) on mLab. I created a database "bigdata" and added a collection "stores" in which I inserted 5 documents.
Each Mongo document contains information on a store as described below:
{
"id_store" : 1,
"address_city" : "Paris",
"address" : "177 Boulevard Haussmann, 75008 Paris",
"manager" : "Jean Ricca",
"capacity" : 464600
}
The complete database looks like this:
Implementation
We will be using the exact same flow and processors used in part 1. The only difference is using a MongoDBLookupService instead of a SimpleKVLookupService with LookupRecord. The configuration of the LookupRecord processor looks like this:
Now let's see how to configure this service to query my MongoDB and get the city of each store. As you can see, I'll query MongoDB by the id_store that I read from each flow file.
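Outside of NiFi, the lookup performed by the service is essentially the following MongoDB query. This is a minimal pymongo sketch assuming the bigdata database and stores collection above; the connection URI is a placeholder:

```python
# Sketch of the query MongoDBLookupService effectively runs for each record:
# find the store whose id_store matches, and return either one field
# (Lookup Value Field) or the whole document.
from pymongo import MongoClient

client = MongoClient("mongodb://user:password@hostname:port")  # placeholder URI
stores = client["bigdata"]["stores"]

id_store = 1  # value read from the incoming record

# Lookup Value Field = address_city -> only that field is needed
print(stores.find_one({"id_store": id_store}, {"address_city": 1, "_id": 0}))
# -> {'address_city': 'Paris'}

# Empty Lookup Value Field -> the whole document is returned
print(stores.find_one({"id_store": id_store}))
```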
Data enrichment
If not already done, add a MongoDBLookupService and configure it as follows:
Mongo URI: the URI used to access your MongoDB database, in the format mongodb://user:password@hostname:port
Mongo Database Name: the name of your database. It's bigdata in my case.
Mongo Collection Name: the name of the collection to query for enrichment. It's stores in my case.
SSL Context Service and Client Auth: use your preferred security options.
Lookup Value Field: the name of the field you want the lookup service to return. For me, it's address_city since I am looking to enrich my events with the city of each store. If you don't specify a field, the whole Mongo document is returned. This is useful if you want to enrich your flow with several attributes.
Results
To verify that our enrichment is working, let's look at the content of the flow files using the data provenance feature in our global flow. As you can see, the attribute city has been added to the content of my flow file. The city Paris has been added to Store 1, which corresponds to my data in MongoDB. What happened here is that the lookup service extracted the id_store (which is 1) from my flow file, generated a query to Mongo to get the address_city field of the store having id_store 1, and added the result to the field city in my newly generated flow files. Note that if the query returns several results from Mongo, only the first document is used. By setting an empty Lookup Value Field, I can retrieve the complete document corresponding to the query { "id_store" : "1" }.
Conclusion
Lookup services in NiFi are a powerful feature for realtime data enrichment. Using the Simple Key/Value Lookup Service is straightforward for non-dynamic scenarios, and it doesn't require an external data source. For more complex scenarios, NiFi has started supporting lookups against external data sources such as MongoDB (available in NiFi 1.4) and HBase (NIFI-4346, available in NiFi 1.5).
11-04-2017
10:00 PM
@Andre Labbe If you found that this answer addressed your question,
please take a moment to click "Accept" below.
11-04-2017
09:51 PM
1 Kudo
Hi @manisha jain You can have several users working simultaneously on the UI, and it will refresh automatically. You can also organize your flow into process groups to make managing and editing your flows easier.
11-04-2017
09:19 PM
Hi @manisha jain The approach described above (Avro -> Json -> Avro) is no longer required with the new record-based processors in NiFi. You can use the UpdateRecord processor to add a new field to your flow files whatever their format is (Avro, JSON, CSV, etc.). Just define your Avro schema with the new field and use it in the Avro Writer, then use UpdateRecord to add the value, as in the example below where I add a city field with the static value Paris: And the result will look like this: Note that in my example the data is in JSON format, but the same approach works for Avro; just use the Avro reader/writer instead of the JSON ones. If you are new to record-based processors, read these two articles: https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html I hope this is helpful.
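To make the idea concrete outside of NiFi, here is a small Python sketch of what the extended writer schema and the added field look like. The base fields and sample records are assumptions for illustration; in NiFi you declare the schema in the Avro Writer and add the value with UpdateRecord:

```python
# Sketch: the writer schema declares the extra "city" field, and each record
# gets the static value "Paris" -- the same effect as adding a /city property
# with value Paris in UpdateRecord.

# Hypothetical original schema plus the new field:
writer_schema = {
    "type": "record",
    "name": "store_event",
    "fields": [
        {"name": "id_store", "type": "int"},
        {"name": "amount", "type": "double"},
        {"name": "city", "type": ["null", "string"], "default": None},  # new field
    ],
}

def add_city(record: dict, city: str = "Paris") -> dict:
    """Mimic UpdateRecord: add the new field with a static value."""
    enriched = dict(record)
    enriched["city"] = city
    return enriched

if __name__ == "__main__":
    incoming = [{"id_store": 1, "amount": 12.5}, {"id_store": 3, "amount": 7.0}]
    print([add_city(r) for r in incoming])
```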
11-02-2017
02:40 PM
@Wesley Bohannon Please find attached the template enrichlookuprecord.xml
11-02-2017
02:04 PM
1 Kudo
@pranayreddy bommineni You can add LIMIT 1 to your SQL query in ExecuteSQL (e.g., SELECT * FROM your_table LIMIT 1) so that only one row is returned for schema inference.