Member since: 06-08-2017
Posts: 1049
Kudos Received: 517
Solutions: 312
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9680 | 04-15-2020 05:01 PM
 | 5767 | 10-15-2019 08:12 PM
 | 2315 | 10-12-2019 08:29 PM
 | 9309 | 09-21-2019 10:04 AM
 | 3422 | 09-19-2019 07:11 AM
09-19-2017
02:04 PM
1 Kudo
@sally sally, I tried merging flow files with the same filename and it works as expected. If you want to merge files regardless of their size, increase the Minimum Group Size so that the processor waits for more than one flow file and merges them into one. In your case the Minimum Group Size is 10 KB: the first flow file is 72 KB, which already exceeds the group size, so it passes through as the same flow file. Your next flow file is 5 KB, which is less than the group size, so the processor waits until the group reaches 10 KB. Also make sure you use the Merged relationship to get the merged files as output.
Example: my flow is as follows. In GenerateFlowFile I am using 222 as the text, and in UpdateAttribute I update the filename to 2 if the text is 222 and to 3 if the text is 333.
MergeContent config:
1. In this processor the Minimum Group Size is 10 B, so it waits until a group (bucketed by filename) reaches 10 B, then merges those files and sends them out as one merged flow file.
2. In my case every flow file is just 3 B, so the processor waited for 4 flow files because we set 10 B as the minimum group size; once it reached 10 B it sent all the merged contents to the Merged relationship.
3. We need to connect the Merged relationship to another processor (in my case I connected it to UpdateAttribute).
Input (these flow files have filename 2):
flowfile1: 222
flowfile2: 222
flowfile3: 222
(up to this point the group size is 9 B, still below the minimum, so the processor waits for another flow file)
flowfile4: 222
Output: 222222222222
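A minimal Python sketch of the bin-by-filename and minimum-group-size behaviour described above (the function and its parameters are illustrative, not NiFi APIs):

from collections import defaultdict

def merge_by_filename(flowfiles, min_group_size=10):
    # Group flow file contents by filename and emit one merged flow file as
    # soon as a group's total size reaches min_group_size bytes.
    bins = defaultdict(list)
    merged = []
    for filename, content in flowfiles:
        bins[filename].append(content)
        if sum(len(c) for c in bins[filename]) >= min_group_size:
            merged.append((filename, "".join(bins.pop(filename))))
    return merged, dict(bins)

# Four 3-byte flow files named "2": the bin reaches 10 B only after the fourth one.
out, waiting = merge_by_filename([("2", "222")] * 4, min_group_size=10)
print(out)      # [('2', '222222222222')]
print(waiting)  # {} - nothing left waiting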
09-18-2017
07:57 PM
Hi @sally sally, you can do that by using the InvokeHTTP processor: after making the first service call, connect the Response relationship to the processor for the next service. This way the next service is only triggered once we get a response from the previous service.
Example: in my flow below, service 1 is triggered by a GenerateFlowFile processor, and I connected the Response relationship to the service 2 InvokeHTTP processor. The service 2 processor only triggers once it gets a response from the service 1 processor. Keep in mind that the response from service 1 will be overwritten by the response from service 2.
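The same chaining idea sketched outside NiFi in Python; the URLs are placeholders, not real endpoints:

import requests

def chained_calls():
    # Call service 1 first; stop here if it fails, so service 2 is only
    # triggered once service 1 has actually responded.
    resp1 = requests.get("https://example.com/service1", timeout=30)
    resp1.raise_for_status()
    # As in the flow above, the first response is effectively replaced by the
    # second one unless you save it somewhere yourself.
    resp2 = requests.post("https://example.com/service2", data=resp1.content, timeout=30)
    resp2.raise_for_status()
    return resp2.content

if __name__ == "__main__":
    print(chained_calls())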
09-18-2017
02:12 PM
2 Kudos
@swathi thukkaraju, you can use the below SerDe properties to read your data correctly:
CREATE TABLE test(a string, b string,..)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\""
)
STORED AS TEXTFILE
location 'location of csv file';
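For context, a tiny Python illustration (not part of the Hive setup itself) of why the quoteChar matters when a field contains the separator character:

import csv, io

# A row whose second field contains a comma inside quotes; OpenCSVSerde with
# separatorChar="," and quoteChar='"' splits it the same way the csv module does.
line = 'id1,"last, first",42\n'
print(next(csv.reader(io.StringIO(line))))   # ['id1', 'last, first', '42']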
09-17-2017
02:41 PM
Hi @Ben Vogan, you can make use of the below Jolt specification:
[
{
"operation": "shift",
"spec": {
"parent": {
"events": {
"*": {
"@(3,user_id)": "events[&1].user_id",
"@(4,other_root_field)": "events[&1].other_root_field",
"nested_1": "events[&1].exploded_nested_1",
"nested_2": "events[&1].exploded_nested_2",
"nested_3": "events[&1].exploded_nested_3"
}
}
}
}
}
]
Input:
{
"user_id": 123,
"other_root_field": "blah",
"parent": {
"events": [
{
"nested_1": "a",
"nested_2": "b"
},
{
"nested_3": "c",
"nested_1": "d"
}
]
}
}
Output:
{
"events" : [ {
"user_id" : 123,
"other_root_field" : "blah",
"exploded_nested_1" : "a",
"exploded_nested_2" : "b"
}, {
"user_id" : 123,
"other_root_field" : "blah",
"exploded_nested_1" : "d",
"exploded_nested_3" : "c"
} ]
}
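For reference, a small Python sketch of the same reshaping (copying the root-level fields onto each event and renaming nested_* to exploded_*), independent of Jolt:

def explode_events(doc):
    # Mirror the shift spec above: push user_id and other_root_field down into
    # every event and rename the nested_* keys to exploded_*.
    events = []
    for event in doc.get("parent", {}).get("events", []):
        out = {
            "user_id": doc["user_id"],
            "other_root_field": doc["other_root_field"],
        }
        for key, value in event.items():
            out[key.replace("nested_", "exploded_")] = value
        events.append(out)
    return {"events": events}

sample = {
    "user_id": 123,
    "other_root_field": "blah",
    "parent": {"events": [{"nested_1": "a", "nested_2": "b"},
                          {"nested_3": "c", "nested_1": "d"}]},
}
print(explode_events(sample))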
09-16-2017
12:22 PM
1 Kudo
Hi @Rohit Ravishankar, yes, this will have an impact on the flow. You are going to have a 3-node cluster and are thinking of using an RPG after the ListFile processor. Let's say M01, M02 and M03 are the 3 NiFi nodes in the cluster and M01 is the Primary Node.
1. When the ListFile processor runs and gives its output to the RPG, it is not guaranteed that the file will go to the Primary Node (M01).
2. The RPG takes care of load balancing across the NiFi cluster and distributes the flow files accordingly.
3. If you are running ExecuteStreamCommand on the Primary Node only, it will trigger the command only if the flow file is on the Primary Node at that time; in our assumption above, the processor triggers the shell script only when the flow file is on the M01 node.
4. If the RPG distributes a flow file to the M02 or M03 nodes but ExecuteStreamCommand is running on the Primary Node only, those flow files will never trigger the shell script.
09-15-2017
08:46 PM
2 Kudos
Hi @sally sally, the ListHDFS processor is designed to store its last state.
1. When you configure ListHDFS you specify a directory name in its properties. Once the processor lists all the files that exist in that directory, it stores as its state the maximum timestamp at which a file was written into HDFS. You can view the state info by clicking the View State button, and if you want to clear the state you open View State and click Clear State.
2. Once ListHDFS has saved its state, when the processor runs again (scheduled as cron or timer driven) it only checks for new files after the state timestamp.
Note: although we run ListHDFS on the primary node only, the state value is stored across all the nodes of the NiFi cluster, so even if the primary node changes there won't be any issues with duplicates.
Example:
hadoop fs -ls /user/yashu/test/
Found 1 items
-rw-r--r--   3 yash hdfs   3 2017-09-15 16:16 /user/yashu/test/part1.txt
When I configure ListHDFS to list all the files in the above directory, the state of the processor should match the time part1.txt was stored in HDFS, in our case 2017-09-15 16:16. The state is kept as Unix time in milliseconds; converting the state time to date-time format gives:
Unix time in milliseconds: 1505506613479
Timestamp: 2017-09-15 16:16:53
So once the processor has stored this state, on the next run it lists only the new files stored in the directory after the state timestamp and updates the state with the new state time (i.e. the maximum file time in the Hadoop directory).
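A quick Python check of that conversion from epoch milliseconds to a readable timestamp:

from datetime import datetime

state_millis = 1505506613479                      # value kept in the ListHDFS state
state_time = datetime.fromtimestamp(state_millis / 1000)
print(state_time.strftime("%Y-%m-%d %H:%M:%S"))   # 2017-09-15 16:16:53 (local timezone)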
09-15-2017
07:51 PM
2 Kudos
In the ReplaceText processor, set the Search Value property to ^\[(.*)\]$ so that it captures all the data except the [] into group 1, then set the Replacement Value property to $1. As a result, ReplaceText gives you all the data except the [] as the new content.
Screenshots of configs:
Input to ReplaceText:
[{"name":"Molecule Man","age":29,"secretIdentity":"Dan Jukes","powers":["Radiation resistance","Turning tiny","Radiation blast"]},{"name":"Madame Uppercut","age":39,"secretIdentity":"Jane Wilson","powers":["Million tonne punch","Damage resistance","Superhuman reflexes"]},{"name":"Eternal Flame","age":1000000,"secretIdentity":"Unknown","powers":["Immortality","Heat Immunity","Inferno","Teleportation","Interdimensional travel"]}]
Output after ReplaceText:
{"name":"Molecule Man","age":29,"secretIdentity":"Dan Jukes","powers":["Radiation resistance","Turning tiny","Radiation blast"]},{"name":"Madame Uppercut","age":39,"secretIdentity":"Jane Wilson","powers":["Million tonne punch","Damage resistance","Superhuman reflexes"]},{"name":"Eternal Flame","age":1000000,"secretIdentity":"Unknown","powers":["Immortality","Heat Immunity","Inferno","Teleportation","Interdimensional travel"]}
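The same substitution as a quick Python regex check (sample content shortened):

import re

content = '[{"name":"Molecule Man","age":29},{"name":"Madame Uppercut","age":39}]'
# Capture everything between the outer [ and ] into group 1 and keep only that.
stripped = re.sub(r"^\[(.*)\]$", r"\1", content, flags=re.DOTALL)
print(stripped)   # {"name":"Molecule Man","age":29},{"name":"Madame Uppercut","age":39}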
09-15-2017
02:06 PM
1 Kudo
Hi @sally sally, as you said, you are getting rid of the [] in the flow file content; in my case I have not removed the [], so you can keep your existing functionality. After that, use another ReplaceText processor with the Search Value property set to (?<=]})(,) or, if there are any spaces or newline characters before the comma, use (?<=]})(\s*,) instead, and set the Replacement Value property to a newline (Shift+Enter).
It replaces the , that follows } with a new line.
Input:
{
"squadName": "Super hero squad",
"homeTown": "Metro City",
"formed": 2016,
"secretBase": "Super tower",
"active": true,
"Data": {
"row": [{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": ["Radiation resistance",
"Turning tiny",
"Radiation blast"]
},
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": ["Million tonne punch",
"Damage resistance",
"Superhuman reflexes"]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": ["Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"]
}]
}}
Output:
[{"name":"Molecule Man","age":29,"secretIdentity":"Dan Jukes","powers":["Radiation resistance","Turning tiny","Radiation blast"]}
{"name":"Madame Uppercut","age":39,"secretIdentity":"Jane Wilson","powers":["Million tonne punch","Damage resistance","Superhuman reflexes"]}
{"name":"Eternal Flame","age":1000000,"secretIdentity":"Unknown","powers":["Immortality","Heat Immunity","Inferno","Teleportation","Interdimensional travel"]}]
ReplaceText configs:
In addition, if in future you expect a separate flow file for each record, make use of a SplitJson processor with the JsonPath Expression property set to $.Data.row and connect the split relationship to another processor. Since our Data.row array has 3 records, SplitJson will produce 3 different flow files.
Results after SplitJson: we have the 3 flow files above, and their contents are as follows.
flowfile1: {"name":"Eternal Flame","age":1000000,"secretIdentity":"Unknown","powers":["Immortality","Heat Immunity","Inferno","Teleportation","Interdimensional travel"]}
flowfile2: {"name":"Molecule Man","age":29,"secretIdentity":"Dan Jukes","powers":["Radiation resistance","Turning tiny","Radiation blast"]}
flowfile3: {"name":"Madame Uppercut","age":39,"secretIdentity":"Jane Wilson","powers":["Million tonne punch","Damage resistance","Superhuman reflexes"]}
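The ReplaceText step above as a plain Python regex check, inserting a newline after every ]} that is followed by a comma (with optional whitespace before it):

import re

records = '[{"name":"Molecule Man","powers":["a","b"]},{"name":"Madame Uppercut","powers":["c"]}]'
print(re.sub(r"(?<=]})(\s*,)", "\n", records))
# [{"name":"Molecule Man","powers":["a","b"]}
# {"name":"Madame Uppercut","powers":["c"]}]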
09-09-2017
10:50 PM
1 Kudo
Hi @Praveen Singh, I have another solution for your case.
1. The results from the ListDatabaseTables processor are individual flow files with db.table.name and db.table.fullname attributes associated with each flow file.
2. You can take advantage of either of those attributes in a RouteOnAttribute processor by adding a property such as ${db.table.name:in("tablename1","tablename2",.....,"tablename10")} and listing the required table names in the expression to filter them out of the ListDatabaseTables results.
Flow screenshot and RouteOnAttribute configs:
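The same membership filter as a quick Python sketch (the table names are just examples):

# Keep only the flow files whose db.table.name attribute is in the wanted set,
# mirroring ${db.table.name:in("tablename1",...,"tablename10")} in RouteOnAttribute.
wanted = {"tablename1", "tablename2", "tablename10"}
flowfiles = [{"db.table.name": "tablename1"}, {"db.table.name": "other_table"}]
matched = [ff for ff in flowfiles if ff["db.table.name"] in wanted]
print(matched)   # [{'db.table.name': 'tablename1'}]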
09-07-2017
01:42 PM
1 Kudo
Hi @Sammy Gold, you can do that by using the ReplaceText processor.
1. Follow steps 1 and 2 in the link below to extract all the contents of the CSV file into attributes:
https://community.hortonworks.com/questions/131332/nifi-convert-text-file-consisting-of-key-value-pai.html?childToView=131440#answer-131440
2. To achieve your case, follow step 3 and use all the existing attributes associated with the flow file, then use the same attribute again in the ReplaceText processor for the new field you want to add to the CSV file.
Example: in my case I want to add a calculated field to the CSV file based on the user_agent attribute, so I use the ${user_agent} attribute twice in the ReplaceText Replacement Value property: ${user_agent}|${user_agent:toUpper()}
Result: safari|SAFARI
3. Follow steps 4 and 5 to convert the new CSV file to Avro and insert the records into Hive.
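The same derived-column idea outside NiFi, as a small Python sketch (field names are illustrative):

# Build the new pipe-delimited value from the existing user_agent field,
# mirroring ${user_agent}|${user_agent:toUpper()} in ReplaceText.
def add_calculated_field(row):
    user_agent = row["user_agent"]
    return f"{user_agent}|{user_agent.upper()}"

print(add_calculated_field({"user_agent": "safari"}))   # safari|SAFARI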