About gowthaman

gowthaman · ‎12-10-2018

I am planning to store a date value in the distributed cache. The date is present both in the flowfile content and as an attribute. Currently, I am using the PutDistributedMapCache processor with "Cache Entry Identifier" = ${max_date} and "Cache update strategy" = "Replace if present", where max_date is a flowfile attribute. When I read the cache with FetchDistributedMapCache, using "Cache Entry Identifier" = $.max_date and "Put Cache Value In Attribute" = maxdate, the maxdate attribute contains the entire Avro data instead of a value like "20-10-1992 00:02:00". The flowfile content is also reported as "Content Type: application/octet-stream", and I am unable to view it. Could someone explain how to put a single attribute or field into the cache and then fetch only that value, instead of the entire Avro data? Thanks in advance for your help.
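
To make the behaviour in this question easier to reason about, here is a minimal toy model in Python. It is not NiFi code. It assumes, as the PutDistributedMapCache documentation describes, that the processor stores the flowfile content as the cache value and uses the evaluated "Cache Entry Identifier" only as the key. The names put_cache and fetch_cache, the fixed key "max_date", and the stand-in Avro bytes are hypothetical.

```python
# Toy key/value model of the cache; NOT NiFi code, all names are hypothetical.
cache = {}


def put_cache(entry_identifier: str, flowfile_content: bytes) -> None:
    """Store the flowfile content under the evaluated identifier (the key)."""
    cache[entry_identifier] = flowfile_content


def fetch_cache(entry_identifier: str):
    """Return whatever value was stored under the key, or None if absent."""
    return cache.get(entry_identifier)


avro_content = b"<avro bytes>"      # stand-in for the real flowfile content
max_date = "20-10-1992 00:02:00"    # value of the max_date attribute

# Setup as described in the post: the date is the key, the Avro content is the value.
put_cache(max_date, avro_content)
fetched_as_in_post = fetch_cache(max_date)   # the Avro bytes come back

# Possible alternative (an assumption): a fixed key, with the date itself as the content.
put_cache("max_date", max_date.encode("utf-8"))
fetched_date = fetch_cache("max_date").decode("utf-8")
```

Under this model, the fetch returns the Avro data because the date was only ever used as the key. If that matches the real processors, making the date the flowfile content before the put, and using a fixed key on both sides, would return just the date.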

gowthaman · ‎11-22-2018

I am trying to join multiple tables using NiFi. The data source may be MySQL or Redshift, and possibly something else in the future. Currently, I am using the ExecuteSQL processor for this, but the output is a single flowfile, so it may not be suitable for terabytes of data. I have also tried GenerateTableFetch, but it doesn't have a join option. Here are my questions: (1) Is there any alternative to the ExecuteSQL processor? (2) Is there a way to make ExecuteSQL write its output to multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output. (3) GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger? Please share your thoughts. Thanks in advance.
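
As an illustration of the offset-based paging mentioned in the third question, here is a small Python sketch using the standard sqlite3 module. It is not what GenerateTableFetch actually emits. The tables customers and orders, the helper paged_queries, and the page size are all hypothetical. It wraps a join in a subquery and issues one LIMIT/OFFSET query per page, so each page could become its own flowfile.

```python
import sqlite3


def paged_queries(base_query: str, order_by: str, total_rows: int, page_size: int):
    """Yield one LIMIT/OFFSET query per page of the wrapped base query."""
    for offset in range(0, total_rows, page_size):
        yield (
            f"SELECT * FROM ({base_query}) AS j "
            f"ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
        )


# Hypothetical in-memory tables standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(1, 4)]
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, (i % 3) + 1, i * 10.0) for i in range(1, 11)],
)

join_sql = (
    "SELECT o.id AS order_id, c.name, o.amount "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)
total = conn.execute(f"SELECT COUNT(*) FROM ({join_sql})").fetchone()[0]

# Each page is fetched by its own query; a stable ORDER BY keeps pages disjoint.
pages = [
    conn.execute(q).fetchall()
    for q in paged_queries(join_sql, "order_id", total, page_size=4)
]
```

With plain OFFSET paging, a database generally has to produce and discard the skipped rows for every page, so later pages tend to cost more as the offset grows. How much this matters depends on the database and the available indexes.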

Last Visited	‎12-11-2018 01:32 PM

Member Since	‎11-22-2018 01:42 PM
Posts	2

Cloudera Community

NiFi put data in Distributed Cache

NiFi joins using ExecuteSQL for larger tables