Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11118 | 04-15-2020 05:01 PM |
| | 7019 | 10-15-2019 08:12 PM |
| | 3062 | 10-12-2019 08:29 PM |
| | 11244 | 09-21-2019 10:04 AM |
| | 4189 | 09-19-2019 07:11 AM |
10-19-2017
04:11 PM
Hi @xav webmaster, use the ScanContent processor. The Dictionary File it points to needs to exist on all NiFi nodes; in my case the file is in /tmp:
cat dict.txt
ford
This processor looks into the content, and if it matches a term in the dictionary file it adds a matching.term attribute to the flowfile. If the content of the flowfile has multiple lines, it looks through all of them; if the word ford is there, it adds the matching.term attribute to the flowfile. Once you have the matching.term attribute, use an UpdateAttribute processor to rename the attribute (or you can use the same attribute): add a new property
kafka_topic
${matching.term}
Configs:- after this processor the matching.term value is available as kafka_topic. Another way to achieve this case:
1. ExtractText to extract the content into attributes
2. UpdateAttribute with advanced properties to check whether the extracted attribute contains the required content, using NiFi Expression Language
3. You will get the same result as with the ScanContent processor.
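A minimal sketch of the two processors for this flow (the dictionary path is the one from this example; adjust to your nodes):

ScanContent
  Dictionary File: /tmp/dict.txt   // must be present on every NiFi node
UpdateAttribute
  kafka_topic: ${matching.term}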
10-18-2017
07:42 PM
1 Kudo
@dhieru singh, the error you shared is a generic one; it is shown whenever the processor is not able to run with the given configuration. Can you make sure all the configurations you used in the processor are correct, i.e. port information etc.? Example:- I have shared the configs of the ListHDFS processor below. This processor requires Hadoop Configuration Resources to run, but they are not mandatory properties. If you don't mention hdfs-site.xml and core-site.xml in Hadoop Configuration Resources, this processor results in the same error you shared above. In the same way you need to figure out which of the non-mandatory properties the ListenUDP processor needs; those missing properties will also result in the same error you are having now.
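For reference, a minimal ListHDFS sketch; the file and directory paths here are examples (my assumptions), adjust them to your cluster:

ListHDFS
  Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory: /user/nifi/input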
10-18-2017
03:03 PM
Hi @rakesh chow, since ExecuteScript is experimental in NiFi, you can achieve the same results with other processors as follows:
1. EvaluateJsonPath // extract all the content into attributes
2. UpdateAttribute // update the timestamp attributes with the new format (yyyy-MM-dd HH:mm:ss etc.)
3. AttributesToJSON // recreate the JSON content from the attributes you specify
EvaluateJsonPath configs:- my sample input content is
{
"ID":"1",
"CID":"1",
"DiscoveredTime":"Mon Sep 11 19:56:13 IST 2017",
"LastDiscoveredTime":"Mon Oct 09 23:38:55 IST 2017"
}
In the EvaluateJsonPath processor, extract all the flowfile content into flowfile attributes by changing the Destination property to flowfile-attribute, then add new properties by clicking the + sign at the top right corner:
ID $.ID
CID $.CID
DiscoveredTime $.DiscoveredTime
LastDiscoveredTime $.LastDiscoveredTime
(Note: in my original config the property was typed as CId with path $.CId; JsonPath is case-sensitive and the key is CID, which is why CID comes out empty in the output below.)
So I have extracted all the content of the flowfile into its attributes; in your case add all the keys of your JSON content here and this processor will keep all the content values as attributes. Config screenshot:-
UpdateAttribute processor configs:- this processor helps update attributes on the fly; here we change the DiscoveredTime and LastDiscoveredTime attributes. Add new properties:
1. DiscoveredTime
${DiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():plus(21600000):format("yyyy-MM-dd HH:mm:ss.SSS")}
In this expression we parse the timezone-aware date, convert it to a number (epoch milliseconds), add 6 hours (21600000 ms; I think you don't need the :plus(21600000)), and then format it as yyyy-MM-dd HH:mm:ss.SSS. Your expression will probably be
${DiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}
2. Add a new property for LastDiscoveredTime
${LastDiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():plus(21600000):format("yyyy-MM-dd HH:mm:ss.SSS")}
Your expression:-
${LastDiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}
Configs:-
AttributesToJSON processor:- in the Attributes List property, add all the required attributes; the processor will prepare a JSON message from the attributes you mention here:
ID,CID,DiscoveredTime,LastDiscoveredTime
Change the Destination property to flowfile-content. Configs:-
Input:-
{
"ID":"1",
"CID":"1",
"DiscoveredTime":"Mon Sep 11 19:56:13 IST 2017",
"LastDiscoveredTime":"Mon Oct 09 23:38:55 IST 2017"
} Output:- {
"ID" : "1",
"LastDiscoveredTime" : "2017-10-09 23:38:55.000",
"DiscoveredTime" : "2017-09-11 19:56:13.000",
"CID" : ""
}
In your case you need to mention all your attributes in the AttributesToJSON processor and it will prepare new JSON content from them; if you leave the Attributes List property empty, then all attributes associated with the flowfile become part of the JSON content. Flow screenshot:-
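Putting it together, a compact sketch of the three processors' key properties (values from this example; 21600000 ms = 6 h x 60 min x 60 s x 1000 ms, only needed if you want the shift):

EvaluateJsonPath
  Destination: flowfile-attribute
  ID: $.ID
  CID: $.CID
  DiscoveredTime: $.DiscoveredTime
  LastDiscoveredTime: $.LastDiscoveredTime
UpdateAttribute
  DiscoveredTime: ${DiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}
  LastDiscoveredTime: ${LastDiscoveredTime:toDate('EEE MMM dd HH:mm:ss ZZZ yyyy'):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}
AttributesToJSON
  Attributes List: ID,CID,DiscoveredTime,LastDiscoveredTime
  Destination: flowfile-content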
10-17-2017
08:08 PM
@dhieru singh
yeah, you can change the format as below: ${now():format("yyyyMM")} gives you only the year and month. In addition, you can use any combination of these patterns in EL date manipulations with now():
yyyy // year
MM // month
dd // day of month
HH // hours
mm // minutes (MM is for month)
ss // seconds
SSS // milliseconds
Examples:-
1. Year with month: ${now():format("yyyy-MM")} output:- 2017-10
2. Hour with minute: ${now():format("HH:mm")} output:- 16:17
3. Year with minutes: ${now():format("yyyymm")} output:- 201713 // 13 is the minute of the current timestamp
4. Year with day: ${now():format("yyyydd")} output:- 201717 // year with today's date
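For instance, you can combine the patterns into a full timestamp (the output shown is illustrative):
${now():format("yyyy-MM-dd HH:mm:ss")} output:- 2017-10-17 16:17:42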
10-17-2017
05:19 PM
1 Kudo
@dhieru singh in the UpdateAttribute processor add a new property by clicking the + sign.
Name of the property:- filename
Value of the property:- <whatever name you want to give the file>
If you want to append a timestamp, use EL: <new-file-name>_${now():format("yyyyMMddHHmmss")}
Example:- if my input filename is foo and I want to change the filename to bar_timestamp, then my UpdateAttribute processor's new property would be
filename bar_${now():format("yyyyMMddHHmmss")}
and the output filename would be bar_20171017132115
10-17-2017
04:16 PM
1 Kudo
@Kiem Nguyen you can use the RouteOnContent processor: change the Match Requirement property to content must contain match and add the below properties (see the sketch after this post):
1. project_1 as .*project_1.*
2. project_2 as .*project_2.*
This processor checks the content of the flowfile: if it contains project_1 it transfers the flowfile to the project_1 relationship, and if the content contains project_2 it transfers it to the project_2 relationship. Flow screenshot:- as shown in the above screenshot, you can use the project_1 and project_2 relationships and store or process the content as per your requirements.
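A minimal property sketch for this routing (names and patterns from this example):

RouteOnContent
  Match Requirement: content must contain match
  project_1: .*project_1.*
  project_2: .*project_2.*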
10-17-2017
01:51 AM
Hi @xav webmaster, I think this answer will help you for sure. Sample flow:- GenerateFlowFile --> ExtractText --> UpdateAttribute --> PublishKafka
GenerateFlowFile:- I'm using this for testing purposes; in your case you will have some other source processors.
ExtractText processor:- in this processor I'm extracting the content of the flowfile into an attribute. Ex:- for flowfile content like
adult,dog,bulldog,9,23,male,brown,4,etc
I add a new property to ExtractText that captures the content and keeps it as an attribute of the flowfile:
cnt_attr as (.*) // capture everything and add it to the flowfile as cnt_attr
Configs:- output of this processor:- every flowfile will have an attribute called cnt_attr associated with it, and we use that attribute in the
UpdateAttribute processor:- to dynamically change the topic name based on the cnt_attr attribute, we need the Advanced usage of the UpdateAttribute processor. Right-click the UpdateAttribute processor and click the Advanced button in the lower right corner. Steps (open the screenshot above in a new tab to see steps 1-4 and refer to them below):
1. As mentioned in the screenshot, change FlowFile Policy to UseOriginal.
2. Click the + sign at Rules and give the name adult_dog.
3. Click the + sign at Conditions and give our check condition: ${cnt_attr:matches('.*adult.*dog.*')}
4. Click the + sign at Actions, give the attribute name kafka_topic, and set the value of the kafka_topic attribute to adult_dog.
New rule:- for cat_dog the condition check is ${cnt_attr:matches('.*cat.*dog.*')}, and in Actions add the attribute name kafka_topic with the value cat_dog, same as steps 2, 3, 4 above.
To summarize all the steps:
step 1: we use the original flowfile,
step 2: we create a rule,
step 3: we add a condition to check whether the cnt_attr attribute satisfies the match,
step 4: if it satisfies, we add the kafka_topic attribute with the desired name.
This way we can add as many rules as we want in the same UpdateAttribute processor; as you can see in my screenshot I have added 2 rules (adult_dog, cat_dog). The processor checks which rule is satisfied and sets the kafka_topic attribute to the name mentioned in it.
PublishKafka:- use the kafka_topic attribute in the Topic Name property of the processor: ${kafka_topic}
Flow screenshot:- in this way we need only one UpdateAttribute to dynamically change the value of kafka_topic and can use the same kafka_topic attribute to publish messages to the respective topics.
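As a compact reference, the Advanced rules described above look roughly like this (names and values from this example):

FlowFile Policy: UseOriginal
Rule: adult_dog
  Condition: ${cnt_attr:matches('.*adult.*dog.*')}
  Action: kafka_topic = adult_dog
Rule: cat_dog
  Condition: ${cnt_attr:matches('.*cat.*dog.*')}
  Action: kafka_topic = cat_dog
PublishKafka Topic Name: ${kafka_topic}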
10-16-2017
10:07 PM
@Ramya Jayathirtha, say I have id, name, age columns in a foo table. Whenever we do
Hive# select name from foo; // in this case the map phase loads the file; we only selected the name column and are not doing any filtering here, so the map phase reads the name field and returns the results.
Map-side joins:- usually all joins are performed on the reducer side, but we can explicitly ask Hive to load small tables into memory and perform the join on the map side, so no reducer phase is initialized:
Hive# select /*+MAPJOIN(..)*/ ... // this kind of join loads the small table into memory and does the join in the map phase only.
Whenever we insert values into a table, loading the data also uses only a map phase:
Hive# insert into foo values(1,'abc',200);
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Table default.foo stats: [numFiles=5, numRows=5, totalSize=38, rawDataSize=33]
Simple CTAS without aggregations:- when we do CREATE TABLE AS with a simple SELECT, only a mapper phase is initialized; if we do any aggregations then a reducer phase is initialized as well.
Hive# create table foo1 stored as orc as select * from foo;
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Table default.foo1 stats: [numFiles=1, numRows=4, totalSize=XXX, rawDataSize=XXXX]
No rows affected (10.247 seconds)
Hive# select * from foo1;
+----------+------------+-----------+--+
| foo1.id | foo1.name | foo1.age |
+----------+------------+-----------+--+
| 1 | a | 10 |
| 2 | a | 11 |
| 2 | a | 10 |
| 3 | b | 10 |
| 4 | b | 10 |
| 5 | c | 10 |
+----------+------------+-----------+--+
6 rows selected (0.205 seconds)
2. If we do CTAS with a WHERE clause in it, it is still just a map phase: all the filtering in the WHERE clause is done by the mapper phase itself.
Hive# create table foo as select * from foo1 where id='1';
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Table default.foo stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (9.984 seconds)
Hive# SELECT * FROM FOO;
+---------+-----------+----------+--+
| foo.id | foo.name | foo.age |
+---------+-----------+----------+--+
| 1 | a | 10 |
+---------+-----------+----------+--+
1 row selected (0.099 seconds)
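For contrast, a CTAS that includes an aggregation does initialize a reducer phase. A sketch against the same foo1 table (foo_counts is a hypothetical table name; the exact task counters depend on your data and execution engine):

Hive# create table foo_counts stored as orc as select name, count(*) as cnt from foo1 group by name;
INFO : Map 1: 1/1 Reducer 2: 1/1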
10-16-2017
08:23 PM
@sally sally can you change the property to //localAttributes[@name='rs']? It will give you only:
<?xml version="1.0" encoding="UTF-8"?>
<localAttributes name="rs">
<start>2017-09-07</start>
<startDate>2017-02-02</startDate>
<endDate>2017-03-02</endDate>
<runAs>true</runAs>
<patch>this is patch</patch>
<makeVersion>1</makeVersion>
</localAttributes>
(or) if the localAttributes node with name="rs" always comes first, then //localAttributes[1] will give the same output.
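Assuming the property above is set on an EvaluateXPath processor (the thread does not name the processor, so this is my assumption), a minimal configuration sketch would be:

EvaluateXPath
  Destination: flowfile-content
  Return Type: nodeset
  XPath (dynamic property, e.g. matched): //localAttributes[@name='rs']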
10-16-2017
09:58 AM
@xav webmaster we can do that by using the RouteOnContent processor followed by an UpdateAttribute processor.
RouteOnContent:- change the Match Requirement strategy to content must contain match and add the properties
adult_dog as (.*adult.*dog.*)
cat_dog as (.*cat.*dog.*)
Config:- right now we are checking the contents of the flowfile; add new properties for all the cases where you need to publish the data to Kafka. Then use an
UpdateAttribute processor:- to add your desired name to the kafka_topic attribute of the flowfile. Add a new property: kafka_topic as cat_dog
Configs:-
PublishKafka:- every flowfile will have a kafka_topic attribute holding a different topic name, so make use of kafka_topic in our PublishKafka processor: change the Topic Name property to ${kafka_topic}
Configs:- Flow screenshot:-
Flow explanation:-
RouteOnContent // checks the content and transfers the flowfile to the matching relationship.
UpdateAttribute // adds the kafka_topic attribute with some name.
PublishKafka // uses the kafka_topic attribute and publishes the content to the respective topic dynamically.
This way we don't have to use any scripts and can publish to Kafka topics dynamically.
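A compact sketch of the three processors' key properties for this flow (topic names from this example; add one UpdateAttribute per route branch):

RouteOnContent
  Match Requirement: content must contain match
  adult_dog: (.*adult.*dog.*)
  cat_dog: (.*cat.*dog.*)
UpdateAttribute
  kafka_topic: adult_dog // on the adult_dog relationship
  kafka_topic: cat_dog // on the cat_dog relationship
PublishKafka
  Topic Name: ${kafka_topic}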