Member since: 06-08-2017
Posts: 1049
Kudos Received: 517
Solutions: 312

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9891 | 04-15-2020 05:01 PM
 | 5929 | 10-15-2019 08:12 PM
 | 2413 | 10-12-2019 08:29 PM
 | 9568 | 09-21-2019 10:04 AM
 | 3505 | 09-19-2019 07:11 AM
04-27-2019
03:36 PM
Shu, thanks for the answer. But I was wondering: when do we need to schedule at the process group level? Can't we control it by scheduling the first processor in the process group? I am new to NiFi and just started exploring it, so this may be a silly question.
09-22-2017
01:57 PM
Hi @Simon Jespersen, in your EvaluateJsonPath processor you have the Path Not Found Behavior property set to "warn", i.e. it will generate a warning whenever a JSON path expression is not found, and in your CSV file some of the records won't have any data for zip. This warn message won't affect your flowfile: the flowfile still routes to the success relationship, all the available content is extracted as attributes, and attributes with no content get the value "Empty string set". If you don't want to see those warn messages on the processor, just change the Path Not Found Behavior property to "ignore" (the default), which ignores any path whose content is not found.

Example: I recreated the same WARN message you are having with the below JSON doc:

{
  "name" : "else",
  "adresse" : "route 66",
  "by" : "Hadoop City"
}

With the ignore property: feeding this JSON doc to the EvaluateJsonPath processor with "ignore" as the Path Not Found Behavior, the processor won't return any warn messages, as it ignores paths with no matching content.

With the warn property: if you change the Path Not Found Behavior to "warn", the processor returns the same warn message you are having in the question.

Both cases result in the same output: the zip attribute value is "Empty string set" and the flowfile routes to the success relationship.
10-09-2017
01:43 PM
@Shu It worked, man! Thank you so very much :) LoL. One last thing: what if my table has more columns, let's say 15-20 columns? Do we still have to hard-code all the column names in the processors? E.g., in the insert statement (in the ReplaceText processor) where we do insert into table_name values (${column1},${column2},...). Is there a way that NiFi can do this dynamically? I mean, if there are 20-25 columns in my table it would be a pain to mention all the column names in the insert statement. Can I use regex or something?
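One way to avoid the hard-coding (a sketch only; the column list and table name here are hypothetical, not taken from the flow in this thread) is to build the statement from a list of column names, for example in an ExecuteScript step or any Python/Jython helper:

```python
# Hypothetical column list; in a real flow this could come from flowfile
# attributes or from the table's metadata rather than being typed out.
columns = ["column%d" % i for i in range(1, 21)]  # column1 .. column20
table = "table_name"

# Reference each column the way the ReplaceText approach does, via
# NiFi expression-language placeholders such as ${column1}.
placeholders = ",".join("${%s}" % c for c in columns)
insert_sql = "insert into %s values (%s)" % (table, placeholders)
print(insert_sql)
# insert into table_name values (${column1},${column2},...,${column20})
```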
09-24-2017
12:23 PM
@sally sally, I think if you need to merge files based on filename then you have to use that many MergeContent processors (e.g., if there are 100 filenames you need to have 100 MergeContent processors). Can you please share more details about your merging strategy: on what basis are you merging the flowfiles, is it based on size or something else? And can you also share the configs that you are using now to merge content based on filename?
09-19-2017
08:56 PM
@sally sally, can you make use of the below Search Value property?

^<[^>]+>(.*)\<\/\?.*\>$

ReplaceText configs:

Input:

<?xml version="1.0" encoding="utf-8"?>abc</?xml version="1.0" encoding="utf-8"?>

Output:

<DailyData>abc</DailyData>
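A minimal sketch of the same substitution in Python (the Replacement Value <DailyData>$1</DailyData> is my assumption, inferred from the output shown; NiFi's ReplaceText uses Java regex, which behaves the same way here):

```python
import re

# Search Value from the post; \1 is Python's equivalent of NiFi's $1 back-reference.
search = r'^<[^>]+>(.*)\<\/\?.*\>$'
replacement = r'<DailyData>\1</DailyData>'  # assumed Replacement Value

text = '<?xml version="1.0" encoding="utf-8"?>abc</?xml version="1.0" encoding="utf-8"?>'
print(re.sub(search, replacement, text))
# -> <DailyData>abc</DailyData>
```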
09-19-2017
02:09 PM
@sally sally By setting your minimums (Min Num Entries and Min Group Size) to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the amount of time you are willing to let a bin hang around before it is merged, regardless of the number of entries in that bin or that bin's size. As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, it will force-merge the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge. Thanks, Matt
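As a toy illustration of the bin mechanics described above (a sketch only: the bin count and filenames are invented, and the real MergeContent also tracks entry counts, group sizes, and bin age):

```python
from collections import OrderedDict

MAX_BINS = 3  # stands in for MergeContent's "Maximum number of Bins" property

bins = OrderedDict()  # filename -> list of flowfiles, oldest bin first

def add_flowfile(filename, flowfile):
    """Mimic MergeContent binning: one bin per unique filename."""
    if filename not in bins and len(bins) == MAX_BINS:
        # No free bin: the oldest bin is force-merged to make room, which
        # is how you can end up with smaller merges than you intended.
        oldest, contents = bins.popitem(last=False)
        print("force-merging bin %r with %d entries" % (oldest, len(contents)))
    bins.setdefault(filename, []).append(flowfile)

for name in ["a.xml", "b.xml", "a.xml", "c.xml", "d.xml"]:
    add_flowfile(name, object())
# "d.xml" arrives while all 3 bins are occupied, so bin "a.xml" is force-merged.
```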
09-18-2017
07:57 PM
Hi @sally sally, you can do that by using the InvokeHTTP processor: once you make the first service call, connect the Response relationship to trigger the next service. This way we only trigger the next service once we get a response from the previous service. Example: in my flow below, service1 is triggered by a GenerateFlowFile processor, and I connected the Response relationship to the service2 InvokeHTTP processor. The service2 processor is only triggered when it gets a response from the service1 processor. Keep in mind that the response from service1 will be overwritten by the response of service2.
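Outside NiFi, the same "call the second service only after the first one responds" pattern looks roughly like this (a sketch; the URLs and HTTP methods are placeholders, using Python's requests library):

```python
import requests

# Placeholder endpoints standing in for service1 and service2.
SERVICE1_URL = "https://example.com/service1"
SERVICE2_URL = "https://example.com/service2"

resp1 = requests.get(SERVICE1_URL, timeout=30)
resp1.raise_for_status()  # only proceed once service1 has answered successfully

# As in the flow above, the second response replaces the first one's
# content; keep resp1 around if you still need its body.
resp2 = requests.post(SERVICE2_URL, data=resp1.content, timeout=30)
print(resp2.status_code)
```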
01-08-2018
05:39 AM
Does it support loading a gzipped CSV file? I got `FAILED: SemanticException Unable to load data to destination table. Error: The file that you are trying to load does not match the file format of the destination table.`
09-17-2017
03:08 PM
@Yash thanks for your reply. However, my problem is that I do not know the set of fields (either on the root or inside the array elements): it is always changing, and I don't want to have to update the spec every time someone adds a field. The spec should only know about parent.events and not assume the existence of any other field. I need a way to say "copy everything at the root, except for the parent field." What I've done for the moment is just implement the logic in Jython, although it is fairly slow.
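For reference, the "copy everything at the root except parent" logic might look like the following in plain Python/Jython (a sketch only; the parent and events names come from the post, while the other fields and values are invented for illustration):

```python
import json

record = json.loads("""
{
  "id": 1,
  "source": "feed-a",
  "parent": {"events": [{"type": "start"}, {"type": "stop"}]}
}
""")

# Copy every root-level field except "parent", without needing to know
# the other field names in advance.
base = {k: v for k, v in record.items() if k != "parent"}

# Emit one output record per element of parent.events, each carrying
# the copied root-level fields alongside the event.
output = [dict(base, event=e) for e in record["parent"]["events"]]
print(json.dumps(output, indent=2))
```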
09-15-2017
08:46 PM
2 Kudos
Hi @sally sally,

1. The ListHDFS processor is developed to store its last state, i.e. when you configure the ListHDFS processor you specify a directory name in the properties. Once the processor lists all the files existing in that directory, it stores as its state the maximum time at which a file got stored into HDFS. You can view the state info by clicking the View State button. If you want to clear the state, you need to go into View State and click Clear State.

2. Once the state is saved in the ListHDFS processor, if you run the processor on a cron or timer-driven schedule it only checks for new files after the state timestamp.

Note: although we run ListHDFS on the primary node only, this state value is stored across all the nodes of the NiFi cluster, so if the primary node changes there won't be any issues with duplicates.

Example:

hadoop fs -ls /user/yashu/test/
Found 1 items
-rw-r--r--   3 yash hdfs          3 2017-09-15 16:16 /user/yashu/test/part1.txt

When I configure the ListHDFS processor to list all the files in the above directory, the state of the ListHDFS processor should be the same as when part1.txt got stored into HDFS, in our case 2017-09-15 16:16. The state is kept as Unix time in milliseconds; converting the state time to date-time format:

Unix time in milliseconds: 1505506613479
Timestamp: 2017-09-15 16:16:53

So the processor has stored the state; when it runs again it will list only the new files that got stored into the directory after the state timestamp, and it will update the state with the new state time (i.e., the maximum file time created in the Hadoop directory).
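The millisecond-to-timestamp conversion above is easy to reproduce (a minimal sketch; the result depends on the local timezone, which here would have to match the cluster's):

```python
from datetime import datetime

state_millis = 1505506613479  # value taken from the ListHDFS state above
ts = datetime.fromtimestamp(state_millis / 1000.0)
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-09-15 16:16:53 in the cluster's timezone
```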